Query performance of a large database (over 12 billion edges)

We created a very large Dgraph database using the bulk loader. It has over 12 billion edges and a p directory of 1.2 TB. We found that even a very simple query on an indexed predicate takes over 4 minutes.

Is this a reasonable time for this data size (i.e., a limitation of Dgraph), or is something wrong with our query or database?

Query example

{
  q(func:has(price), first: 10, orderdesc: price){
    uid
  }
}
--> takes 247 sec

Count of predicate

{
  q(func:has(price)){
    count(uid)
  }
}

--> "count": 36479300, takes 17 sec

Schema

 <price>              : int @index(int) .
...

Other information

HW: 64 vCPUs, 416 GB memory (n1-highmem-64 on Google Cloud Platform)
Single instance
CPU usage: around 200% (only 2 of the 64 CPUs fully busy)
Mem usage: around 19%
Disk I/O: almost none, according to the GCP instance monitor.
→ No resource appears to be exhausted.

Dgraph version : v1.0.15, Commit SHA-1 : ff5ee1e2

go tool pprof output (PDFs)
cpu_1.pdf (40.4 KB) (30 s profile taken just after the query started)
cpu_2.pdf (22.0 KB) (30 s profile taken a few minutes after the query started)
heap.pdf (27.1 KB)

If you need additional information, please ask me.
Thank you for your kind reply in advance.

Please upvote/like this issue: "May be adding BTree index". This query is exactly the kind where performance could be significantly better with my proposal.


The has() function does not use an index; it iterates over the data to find the nodes that have the specified edge.

To use the index, use the indexed functions appropriate for your predicate type.
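The difference between a has() scan and an index lookup can be pictured with a toy Go model. This is only an illustration under stated assumptions, not Dgraph's storage layout: it assumes the predicate's data is a map from uid to price (so a has()-style scan must visit every entry), and that the int index is a slice of (price, uid) pairs sorted by price, so a range condition in the spirit of Dgraph's ge() can binary-search to its first match. The names hasScan, buildIndex, geIndex and indexEntry are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// indexEntry is a hypothetical index key: a price and the uid that has it.
type indexEntry struct {
	price int64
	uid   uint64
}

// hasScan models has(price): it visits every stored (uid, price) entry,
// so its cost grows with the number of nodes carrying the predicate.
func hasScan(data map[uint64]int64) (uids []uint64, visited int) {
	for uid := range data {
		visited++
		uids = append(uids, uid)
	}
	return uids, visited
}

// buildIndex models an int index: entries sorted by price, ties broken by uid.
func buildIndex(data map[uint64]int64) []indexEntry {
	idx := make([]indexEntry, 0, len(data))
	for uid, p := range data {
		idx = append(idx, indexEntry{price: p, uid: uid})
	}
	sort.Slice(idx, func(i, j int) bool {
		if idx[i].price != idx[j].price {
			return idx[i].price < idx[j].price
		}
		return idx[i].uid < idx[j].uid
	})
	return idx
}

// geIndex models a ge(price, min)-style root function served from the index:
// a binary search finds the first matching key; only matches are visited.
func geIndex(idx []indexEntry, min int64) []uint64 {
	start := sort.Search(len(idx), func(i int) bool { return idx[i].price >= min })
	uids := make([]uint64, 0, len(idx)-start)
	for _, e := range idx[start:] {
		uids = append(uids, e.uid)
	}
	return uids
}

func main() {
	data := make(map[uint64]int64)
	for uid := uint64(1); uid <= 1000; uid++ {
		data[uid] = int64(uid * 10) // prices 10, 20, ..., 10000
	}
	all, visited := hasScan(data)
	fmt.Println("has() visited", visited, "entries, returned", len(all))

	idx := buildIndex(data)
	high := geIndex(idx, 9950)
	fmt.Println("ge(price, 9950) returned", len(high), "uids:", high)
}
```

In this model the scan always touches all 1000 entries, while the index lookup touches only the handful of keys at or above the threshold.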

Thank you for your reply.

I don’t fully understand your plan, but it looks amazing, so I liked your post.
I hope your improvement makes Dgraph a more sophisticated database.

Hi dmai, thank you for your reply.

The has() function does not use an index; it iterates over the data to find the nodes that have the specified edge.

In this case, I only need the 10 highest prices, so has() is basically redundant; I added it only because a query root function is required. Is there another way to write a query that returns the same result?

Anyway, from reading the pprof results and a bit of the Dgraph source, I learned the following:

  • To write a fast query against a large database, we must narrow the result set sufficiently in the query root function.
  • Ordered pagination doesn’t use the index, so we also need to narrow the result set before pagination.

Are these correct?
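The two points listed above can be modelled with a short Go sketch. It is an illustration, not how Dgraph implements pagination: it assumes that ordering works by sorting every candidate produced by the root function and then keeping the first k, so the work grows with the number of candidates. The function narrow is a hypothetical stand-in for a selective root function such as ge(price, min); a real index would avoid scanning, but a linear filter keeps the sketch short. orderDescFirst and the threshold value are also hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// orderDescFirst models "orderdesc: price, first: k" applied to a candidate
// set: every candidate is sorted, then only the first k are kept.
// It returns the page and how many candidates had to be sorted.
func orderDescFirst(candidates []int64, k int) (page []int64, sorted int) {
	c := append([]int64(nil), candidates...) // copy so the caller's slice is untouched
	sort.Slice(c, func(i, j int) bool { return c[i] > c[j] })
	if k > len(c) {
		k = len(c)
	}
	return c[:k], len(c)
}

// narrow stands in for a selective root function such as ge(price, min).
// Here it is a plain linear filter for brevity.
func narrow(candidates []int64, min int64) []int64 {
	var out []int64
	for _, p := range candidates {
		if p >= min {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	prices := make([]int64, 0, 100000)
	for i := int64(1); i <= 100000; i++ {
		prices = append(prices, i)
	}

	// Broad root, like has(price): every candidate is sorted.
	page, n := orderDescFirst(prices, 10)
	fmt.Println("broad root sorted", n, "candidates, top:", page[0])

	// Narrowed root: only candidates above the threshold are sorted.
	page, n = orderDescFirst(narrow(prices, 99990), 10)
	fmt.Println("narrow root sorted", n, "candidates, top:", page[0])
}
```

Both calls return the same top page, but the second sorts 11 candidates instead of 100000, which is the effect the bullets describe.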

I don’t have a plan, just a feature request for Dgraph to have a B-tree index, which should make sorted queries fast. For example, getting the 10 highest prices from a B-tree index is very quick: a matter of milliseconds (maybe a couple of seconds on your data in the worst case, I don’t know, but it should be really quick).
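A minimal sketch of why an ordered index helps this particular query, assuming the index keeps (price, uid) keys in sorted order as a B-tree does. Go's standard library has no B-tree, so a sorted slice stands in for it here; the point is only that the 10 highest prices are the last 10 keys, so reading them touches k entries regardless of how many nodes exist. orderedIndex, Insert and TopKDesc are hypothetical names, and this is not Dgraph code.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is one hypothetical index key: (price, uid).
type entry struct {
	price int64
	uid   uint64
}

// orderedIndex keeps entries sorted by price, then uid, like B-tree keys.
// A sorted slice stands in for the tree: inserts here are O(n), whereas a
// real B-tree would make them O(log n).
type orderedIndex struct {
	entries []entry
}

func less(a, b entry) bool {
	if a.price != b.price {
		return a.price < b.price
	}
	return a.uid < b.uid
}

// Insert places e at its sorted position.
func (ix *orderedIndex) Insert(e entry) {
	i := sort.Search(len(ix.entries), func(i int) bool { return !less(ix.entries[i], e) })
	ix.entries = append(ix.entries, entry{})
	copy(ix.entries[i+1:], ix.entries[i:]) // shift the tail right by one
	ix.entries[i] = e
}

// TopKDesc walks backward from the largest key and stops after k entries,
// so it reads at most k entries no matter how large the index is.
func (ix *orderedIndex) TopKDesc(k int) []entry {
	out := make([]entry, 0, k)
	for i := len(ix.entries) - 1; i >= 0 && len(out) < k; i-- {
		out = append(out, ix.entries[i])
	}
	return out
}

func main() {
	ix := &orderedIndex{}
	for uid := uint64(1); uid <= 50; uid++ {
		ix.Insert(entry{price: int64((uid * 37) % 101), uid: uid})
	}
	for _, e := range ix.TopKDesc(3) {
		fmt.Println("uid", e.uid, "price", e.price)
	}
}
```

With a real B-tree, finding the rightmost leaf costs O(log n) and each further key is a step to its neighbour, which is what makes "first: 10, orderdesc" cheap in this design.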