How to improve dgraph cluster performance?

I have two Dgraph clusters, both running Dgraph v20.07 and deployed on k8s.

| | Cluster1 | Cluster2 |
|---|---|---|
| Nodes | 6 | 1 |
| Memory of each node | 32GB | 1TB |
| Hard disk | HDD | HDD |
| Groups | 2 | 3 |
| Replicas | 3 | 1 |
| Data size of each group | 8GB, 9.5GB | 961GB, 800GB, 1.1TB |

Cluster2 contains all of the data in Cluster1. When I run the same query on both, like this:

{
    xx(func: type(twitter_user)){
        count(uid)
    }
}

Both returned the same result, {"count":41652230}, but it took 9s on cluster1 and 20s on cluster2.
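For anyone who wants to reproduce the timing, here is a minimal Python sketch that sends the count query above to an Alpha over HTTP and returns both the wall-clock time and the server-reported latency. The Alpha address is an assumption (hypothetical `http://localhost:8080`). I am also assuming the v20.07 raw HTTP interface, i.e. a POST to `/query` with the `application/graphql+-` content type and a `server_latency` object under `extensions` in the response. Please check this against the docs for your version.

```python
import json
import time
import urllib.request

# The same count query as in the post above.
QUERY = """
{
    xx(func: type(twitter_user)) {
        count(uid)
    }
}
"""


def build_request(alpha_url, query):
    """Build a POST request for the Alpha /query endpoint.

    alpha_url is a placeholder such as http://localhost:8080 (assumption).
    """
    return urllib.request.Request(
        alpha_url.rstrip("/") + "/query",
        data=query.encode("utf-8"),
        headers={"Content-Type": "application/graphql+-"},
        method="POST",
    )


def timed_query(alpha_url, query=QUERY):
    """Run one query; return (wall-clock seconds, server_latency dict)."""
    start = time.perf_counter()
    with urllib.request.urlopen(build_request(alpha_url, query)) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - start
    # server_latency is expected under "extensions"; fall back to {} if absent.
    return elapsed, body.get("extensions", {}).get("server_latency", {})
```

Calling `timed_query("http://localhost:8080")` against each cluster a few times gives comparable numbers. The gap between the wall-clock time and the server-side figures hints at how much time is spent outside the Alpha (network, client).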

Cluster2 still has 580GB of free memory. In cluster1, the machine that runs the query has close to 200MB of free memory, and a node that does not run the query has close to 10GB.

I don’t understand why cluster2 is so slow. Is it because the data in a single group is too big? Should I add more groups to cluster2?

HDD might be the problem.

Dgraph uses Badger underneath, which is a KV store that takes advantage of SSD. (The main website used to say this, now it’s gone.)

Someone from the core team might be able to help speed up your existing setup.

P.S. As a general rule of thumb, think of SSD as cheaper RAM rather than as expensive storage.


I have now cleared the data in cluster2 and imported the same data as in cluster1, but the query is still twice as slow as on cluster1. The hard disk read and write speed in cluster2 is much faster than in cluster1. :joy:

Pinging our performance gurus to have a look @ibrahim @ashishgoswami


The server latency information:

| Server latency | Cluster1 | Cluster2 |
|---|---|---|
| processing_ns | 5924ms | 8437ms |
| encoding_ns | 6496ms | 12363ms |
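To see which phase accounts for the gap, the reported figures can be turned into per-phase slowdown ratios. This is only a small Python sketch using the numbers posted above. The dictionary keys and the `slowdown` helper are my own naming, not anything from Dgraph.

```python
# Latency figures as posted above, in milliseconds.
cluster1 = {"processing_ms": 5924, "encoding_ms": 6496}
cluster2 = {"processing_ms": 8437, "encoding_ms": 12363}


def slowdown(fast, slow):
    """Ratio slow/fast for each phase, rounded to two decimals."""
    return {phase: round(slow[phase] / fast[phase], 2) for phase in fast}


ratios = slowdown(cluster1, cluster2)
print(ratios)  # {'processing_ms': 1.42, 'encoding_ms': 1.9}
```

On these numbers, processing is about 1.4x slower on cluster2 and encoding about 1.9x slower, so the encoding phase accounts for most of the difference.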

Hi @Valdanito,

We need some clarification on a few points.

  1. When you say 6 nodes, we assume you mean Alpha nodes only, not including Zero nodes.
  2. For cluster2, how did you arrive at a situation where there is only a single node (again assuming it is an Alpha) and it has 3 groups?

I also have concerns about the group sizes. Assuming the first two sizes belong to cluster1, its total size would be 17.5 GB, which is far less than the total size of cluster2 (>1 TB).

@ashishgoswami Thank you for your reply.

  1. Cluster1 has 6 server machines: 6 Alphas in 2 groups.
  2. Cluster2 has only one server machine, but it has a lot of memory. I run 3 Alphas (3 groups) on it, all with the help of k8s.
  3. Later, to verify the impact of group size on performance, I changed cluster2 to two Alphas (two groups) and imported the same data as in cluster1, but the performance of cluster2 did not improve.
  4. Finally, I tried using Docker to run only one Dgraph Alpha on the cluster2 machine, and its query speed is only a little faster than that of cluster2.

The good news is that group size does not significantly reduce Dgraph's performance.
The bad news is that I still can’t find the reason why cluster2 is slower than cluster1. All I have found is that the encoding_ns of cluster2 is relatively large.