We created a very large Dgraph database using the bulk loader; it has over 12 billion edges, and the p directory is about 1.2 TB in size. We then realized that even a very simple query on an indexed predicate takes over 4 minutes.
Is that time reasonable given the data size (i.e., a limitation of Dgraph), or is there something wrong with our query or database?
HW: 64 vCPUs, 416 GB memory (n1-highmem-64 on Google Cloud Platform)
Single instance
CPU usage: around 200% (2 CPUs fully used out of 64)
Memory usage: around 19%
Disk I/O: almost none, according to the GCP instance monitor
→ It looks like no resource is exhausted.
Dgraph version: v1.0.15, Commit SHA-1: ff5ee1e2
go tool pprof output PDFs:
cpu_1.pdf (40.4 KB): 30 s CPU profile taken just after the query started
cpu_2.pdf (22.0 KB): 30 s CPU profile taken a few minutes after the query started
heap.pdf (27.1 KB): heap profile
If you need additional information, please ask.
Thank you in advance for your reply.
The has() function does not use an index. has() iterates over the data to find the nodes that have the specified edge.
In this case, I just need the 10 highest prices, so has() is essentially redundant; I added it only because a root function is required in a query. Is there another way to write a query that returns the same result?
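For reference, the slow query was shaped roughly like this (the exact query wasn't included above, so this is a reconstruction; only the predicate name price comes from the discussion):

```
{
  # has(price) iterates over every node that carries a price edge;
  # the entire result set is then sorted before taking the top 10.
  top10(func: has(price), orderdesc: price, first: 10) {
    uid
    price
  }
}
```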
Anyway, after reading the pprof results and a bit of the Dgraph source, I learned the following:
To write fast queries against a large database, we must narrow the result set sufficiently in the query's root function.
Ordered pagination does not use an index, so we also need to narrow the result set before paginating (see the sketch after this list).
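Putting both points together, here is a minimal sketch of a narrowed version of the query. It assumes price has an int index so that ge() can use it at the root, and it assumes a lower bound of 100000 that is known to still contain the top 10; both the index type and the bound are my assumptions, not something established above:

```
{
  # ge(price, 100000) uses the price index, so only the narrowed
  # set is sorted. The bound 100000 is a hypothetical cutoff that
  # must be low enough to keep at least 10 results.
  top10(func: ge(price, 100000), orderdesc: price, first: 10) {
    uid
    price
  }
}
```

The trade-off is that you need an approximate lower bound in advance; if the bound is too high, the query returns fewer than 10 results.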
I don't have a plan yet, but here is a feature request: a B-tree index for Dgraph, which should make sorted queries fast. Getting the 10 highest prices out of a B-tree index is very quick, a matter of milliseconds (maybe a couple of seconds on our data in the worst case, I don't know, but it should be really fast).