How to improve Dgraph cluster performance?

Valdanito · September 18, 2020, 1:26pm

I have two Dgraph clusters, both running Dgraph v20.07 and deployed with k8s.

	Cluster1	Cluster2
Nodes	6	1
Memory per node	32 GB	1 TB
Disk	HDD	HDD
Groups	2	3
Replicas	3	1
Data size per group	8 GB, 9.5 GB	961 GB, 800 GB, 1.1 TB

The data in Cluster2 includes all of the data in Cluster1. When I run the same query on both:

{
    xx(func: type(twitter_user)){
        count(uid)
    }
}

They returned the same result, {"count":41652230}, but it took 9s on cluster1 and 20s on cluster2.

Cluster2 still has 580GB of free memory. In cluster1, the machine that runs the query drops to about 200MB of free memory, while the nodes that do not run the query have about 10GB free.

I don’t understand why cluster2 is so slow. Is it because the data in a single group is too big? Should I add more groups to cluster2?
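To compare the two clusters outside Ratel, one option is to post the same query to each cluster's Alpha HTTP endpoint and read the server_latency block that Dgraph returns under extensions. This is a minimal sketch, not an official client: the Alpha addresses in ALPHAS are hypothetical placeholders, and the application/graphql+- content type is an assumption based on the v20.x HTTP API, so adjust both for your deployment.

```python
import json
import urllib.request

# Hypothetical Alpha HTTP addresses; replace with your own k8s services.
ALPHAS = {
    "cluster1": "http://cluster1-alpha:8080",
    "cluster2": "http://cluster2-alpha:8080",
}

# The query from the post above.
QUERY = """
{
    xx(func: type(twitter_user)) {
        count(uid)
    }
}
"""


def run_query(base_url: str, query: str, timeout: float = 120.0) -> dict:
    """POST a DQL query to an Alpha's /query endpoint and return the decoded JSON."""
    req = urllib.request.Request(
        base_url + "/query",
        data=query.encode("utf-8"),
        # Assumption: v20.x Alphas accept DQL under this content type.
        headers={"Content-Type": "application/graphql+-"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def server_latency_ms(response: dict) -> dict:
    """Convert every *_ns field of extensions.server_latency to milliseconds."""
    latency = response.get("extensions", {}).get("server_latency", {})
    return {
        key[:-3] + "_ms": value / 1e6
        for key, value in latency.items()
        if key.endswith("_ns")
    }
```

For example, calling server_latency_ms(run_query(ALPHAS["cluster2"], QUERY)) several times in a row gives per-component timings (processing, encoding, and so on) that can be compared directly between the clusters, which separates server-side cost from network and client overhead.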

abhijit-kar · September 18, 2020, 5:42pm

HDD might be the problem.

Dgraph uses Badger underneath, which is a KV store designed to take advantage of SSDs. (The main website used to say this; now it’s gone.)

Someone from core team might help with speeding up your existing setup.

P.S. As a general rule of thumb, think of SSD as cheaper RAM rather than as expensive storage.

Valdanito · September 28, 2020, 3:46am

Now I have cleared the data on cluster2 and imported the same data as cluster1, but the query is still twice as slow as on cluster1, even though the hard disk in cluster2 reads and writes much faster than the one in cluster1.

chewxy · September 28, 2020, 4:26am

Pinging our performance gurus to have a look @ibrahim @ashishgoswami

Valdanito · September 28, 2020, 10:08am

The server latency information:

Server latency	Cluster1	Cluster2
processing_ns	5924ms	8437ms
encoding_ns	6496ms	12363ms
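A quick way to read this table is to compare each latency component between the clusters. The sketch below only copies the figures from the table and does the arithmetic; the function names are illustrative, not part of any Dgraph tooling.

```python
# Server latency figures from the table above, in milliseconds.
LATENCY_MS = {
    "cluster1": {"processing": 5924, "encoding": 6496},
    "cluster2": {"processing": 8437, "encoding": 12363},
}


def slowdown(metric: str) -> float:
    """How many times slower cluster2 is than cluster1 for one latency component."""
    return LATENCY_MS["cluster2"][metric] / LATENCY_MS["cluster1"][metric]


def encoding_share(cluster: str) -> float:
    """Fraction of (processing + encoding) time spent encoding the response."""
    parts = LATENCY_MS[cluster]
    return parts["encoding"] / (parts["processing"] + parts["encoding"])


for metric in ("processing", "encoding"):
    print(f"{metric}: cluster2 / cluster1 = {slowdown(metric):.2f}x")
for cluster in LATENCY_MS:
    print(f"{cluster}: encoding share = {encoding_share(cluster):.0%}")
```

With these numbers, processing is about 1.42x slower on cluster2 while encoding is about 1.90x slower, so most of the gap sits in the encoding phase rather than in query processing.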

ashishgoswami · September 28, 2020, 12:19pm

Hi @Valdanito,

We need some clarifications about your question.

When you say 6 nodes, we assume you mean Alpha nodes only, not counting Zero nodes.
For cluster2, how did you end up with a single node (again, assuming it is an Alpha) serving 3 groups?

I also have concerns about the group sizes. Assuming the first two sizes belong to cluster1, its total size would be 17.5 GB, which is far less than the total size of cluster2 (>1 TB).
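The arithmetic behind this concern can be checked with the per-group sizes from the first post. Reading "1.1TB" as 1100 GB (decimal units) is an assumption; with binary units it would be about 1126 GB, which does not change the conclusion.

```python
# Per-group data sizes from the original post, in GB.
# Assumption: "1.1TB" is read as 1100 GB (decimal units).
CLUSTER1_GROUPS_GB = [8, 9.5]
CLUSTER2_GROUPS_GB = [961, 800, 1100]

total1 = sum(CLUSTER1_GROUPS_GB)
total2 = sum(CLUSTER2_GROUPS_GB)

print(f"cluster1: {total1} GB, cluster2: {total2} GB (~{total2 / 1000:.2f} TB)")
print(f"cluster2 holds about {total2 / total1:.0f}x as much data as cluster1")
```

Under that reading, cluster2 holds roughly 2.86 TB against 17.5 GB, more than 160 times as much data, so the first comparison was not between equivalent datasets.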

Valdanito · September 29, 2020, 1:48am

@ashishgoswami Thank you for your reply.

Cluster1 has 6 server machines running 6 Alphas in 2 groups.
Cluster2 has only one server machine, but with a lot of memory. I ran 3 Alphas (3 groups) on it, all managed by k8s.
Later, to check how group size affects performance, I changed cluster2 to two Alphas (two groups) and imported the same data as cluster1, but cluster2's performance did not improve.
Finally, I tried using Docker to run a single Dgraph Alpha on the cluster2 machine, and its query speed was only slightly faster than cluster2's.

Now, the good news is that group size does not significantly affect Dgraph's performance.
The bad news is that I still can’t find why cluster2 performs worse than cluster1. The only difference I have found is that cluster2's encoding_ns is much larger (12363ms vs 6496ms).
