What am I doing wrong that my write performance is so bad?

Hi everyone!

We are migrating our production environment to Dgraph.
We have a cluster with a replication factor of 3 on r5d.2xlarge AWS instances (64 GB RAM, 8 cores, SSD) - 3 Alphas and 3 Zeros - one of each per node.

When I run this mutation, it takes about 27s to respond (from the JS client, and also with curl from localhost).

This is my schema.

I also tried splitting it into several serial requests; each request takes about 1s, but altogether they still finish in about 28s.

What can I change to get better write speed?
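Roughly, the mutation is sent like this from the JS client (dgraph-js over gRPC); the payload below is only a placeholder, not my real data:

const dgraph = require("dgraph-js");
const grpc = require("grpc");

async function runMutation() {
  // Connect to a local Alpha on the default gRPC port.
  const stub = new dgraph.DgraphClientStub("localhost:9080", grpc.credentials.createInsecure());
  const client = new dgraph.DgraphClient(stub);

  const txn = client.newTxn();
  try {
    const mu = new dgraph.Mutation();
    mu.setSetJson({ name: "example node" }); // placeholder data, not the real payload
    mu.setCommitNow(true);                   // commit in the same request

    const start = Date.now();
    const response = await txn.mutate(mu);
    console.log("mutation round trip:", Date.now() - start, "ms");
    console.log("server latency:", response.getLatency());
  } finally {
    await txn.discard(); // no-op if the mutation already committed
    stub.close();
  }
}

runMutation().catch(console.error);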

What version of Dgraph are you running, and what are your Zero and Alpha configs? Are all these instances in the same region?

My cluster runs in the same region across 3 availability zones: eu-west-1a, eu-west-1b, and eu-west-1c.

I used version 1.0.10 and also 1.0.11-rc4.

The command for the Zeros is:

docker run -d --name=zero --hostname=1.zero.weave.local --restart unless-stopped -v /dgraph/zero/w:/dgraph/zw -p 6080:6080 -p 5080:5080 dgraph/dgraph:v1.0.11-rc4 dgraph zero --my=1.zero.weave.local:5080 --idx 1 --replicas 3

and for the Alphas:
docker run -d --name=alpha --hostname=1.alpha.weave.local --restart unless-stopped -v /dgraph/export:/dgraph/export -v /dgraph/alpha/w:/dgraph/w -v /dgraph/alpha/p:/dgraph/p -p 8080:8080 -p 9080:9080 dgraph/dgraph:v1.0.11-rc4 dgraph alpha --export=/dgraph/export --my=1.alpha.weave.local:7080 --lru_mb 21504 --zero 1.zero.weave.local:5080 -p /dgraph/p --idx=1 -w /dgraph/w --badger.vlog=disk --whitelist 172.17.0.0:172.20.0.0 --bindall=true --custom_tokenizers=/dgraph/plugins/nfd.so

I ran a cluster and added your schema and mutation (thanks for sharing). I also compiled your nfd tokenizer, which you recently shared in a GitHub issue, for the nfd index.

I ran a 1 Zero / 1 Alpha cluster locally with v1.0.10 and v1.0.11-rc4, ran the schema update, and then the mutation. Both the schema update (2 secs) and the mutation (1 sec) finished quickly.

Is there a reason to set --badger.vlog=disk when you said your setup uses SSDs?

Can you clarify what you mean here? What happened for the 1s request and for the 28s one? Is 28s the total time taken to run the several requests in serial?

My test env has only 16 GB of RAM and there was an issue with replication… but when I run it with this command:

docker run -d --name=alpha --hostname=1.alpha.weave.local --restart unless-stopped -v /dgraph/export:/dgraph/export -v /dgraph/alpha/w:/dgraph/w  -v /dgraph/alpha/p:/dgraph/p -p 8080:8080 -p 9080:9080 dgraph/dgraph:v1.0.11-rc4 dgraph alpha --export=/dgraph/export --my=1.alpha.weave.local:7080 --lru_mb 21504 --zero 1.zero.weave.local:5080 -p /dgraph/p --idx=1 -w /dgraph/w --badger.vlog=mmap --badger.tables=ram --whitelist 172.17.0.0:172.20.0.0 --bindall=true --custom_tokenizers=/dgraph/plugins/nfd.so

the response time was the same.

Yes, 28s is the total time for all the requests in serial.

Can you share the server_latency numbers in your responses? There are three of them: parsing_ns, processing_ns, and encoding_ns.

How can I get them in the JavaScript client?

BTW, I tried converting my JSON to an RDF file to check how fast it would go with dgraph live, but it is even slower.
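The invocation was along these lines (flag names as in the v1.0 docs; the Zero endpoint here is just the default, so check dgraph live --help on your version):

dgraph live -r mutation.rdf.gz --zero localhost:5080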

Processing mutation.rdf.gz
[    2s] Txns: 0 RDFs: 0 RDFs/sec:     0 Aborts: 0
[    4s] Txns: 0 RDFs: 0 RDFs/sec:     0 Aborts: 0
[    6s] Txns: 1 RDFs: 1000 RDFs/sec:   167 Aborts: 0
[    8s] Txns: 1 RDFs: 1000 RDFs/sec:   125 Aborts: 0
[   10s] Txns: 1 RDFs: 1000 RDFs/sec:   100 Aborts: 1
[   12s] Txns: 1 RDFs: 1000 RDFs/sec:    83 Aborts: 1
[   14s] Txns: 1 RDFs: 1000 RDFs/sec:    71 Aborts: 2
[   16s] Txns: 1 RDFs: 1000 RDFs/sec:    62 Aborts: 2
[   18s] Txns: 1 RDFs: 1000 RDFs/sec:    56 Aborts: 3
[   20s] Txns: 1 RDFs: 1000 RDFs/sec:    50 Aborts: 4
[   22s] Txns: 1 RDFs: 1000 RDFs/sec:    45 Aborts: 4
[   24s] Txns: 1 RDFs: 1000 RDFs/sec:    42 Aborts: 5
[   26s] Txns: 1 RDFs: 1000 RDFs/sec:    38 Aborts: 5
[   28s] Txns: 1 RDFs: 1000 RDFs/sec:    36 Aborts: 6
[   30s] Txns: 1 RDFs: 1000 RDFs/sec:    33 Aborts: 6
[   32s] Txns: 2 RDFs: 2000 RDFs/sec:    62 Aborts: 6
[   34s] Txns: 2 RDFs: 2000 RDFs/sec:    59 Aborts: 6
[   36s] Txns: 2 RDFs: 2000 RDFs/sec:    56 Aborts: 7
[   38s] Txns: 2 RDFs: 2000 RDFs/sec:    53 Aborts: 7
[   40s] Txns: 2 RDFs: 2000 RDFs/sec:    50 Aborts: 7
[   42s] Txns: 2 RDFs: 2000 RDFs/sec:    48 Aborts: 8
[   44s] Txns: 2 RDFs: 2000 RDFs/sec:    45 Aborts: 9
[   46s] Txns: 2 RDFs: 2000 RDFs/sec:    43 Aborts: 9
[   48s] Txns: 2 RDFs: 2000 RDFs/sec:    42 Aborts: 10
[   50s] Txns: 2 RDFs: 2000 RDFs/sec:    40 Aborts: 10
[   52s] Txns: 2 RDFs: 2000 RDFs/sec:    38 Aborts: 11
[   54s] Txns: 2 RDFs: 2000 RDFs/sec:    37 Aborts: 11
[   56s] Txns: 3 RDFs: 3000 RDFs/sec:    54 Aborts: 11
[   58s] Txns: 3 RDFs: 3000 RDFs/sec:    52 Aborts: 11
[  1m0s] Txns: 3 RDFs: 3000 RDFs/sec:    50 Aborts: 12
[  1m2s] Txns: 3 RDFs: 3000 RDFs/sec:    48 Aborts: 13
[  1m4s] Txns: 3 RDFs: 3000 RDFs/sec:    47 Aborts: 13
[  1m6s] Txns: 3 RDFs: 3000 RDFs/sec:    45 Aborts: 14
[  1m8s] Txns: 3 RDFs: 3000 RDFs/sec:    44 Aborts: 14
[ 1m10s] Txns: 3 RDFs: 3000 RDFs/sec:    43 Aborts: 15
[ 1m12s] Txns: 3 RDFs: 3000 RDFs/sec:    42 Aborts: 15
[ 1m14s] Txns: 3 RDFs: 3000 RDFs/sec:    41 Aborts: 15
[ 1m16s] Txns: 4 RDFs: 4000 RDFs/sec:    53 Aborts: 15
[ 1m18s] Txns: 4 RDFs: 4000 RDFs/sec:    51 Aborts: 16
[ 1m20s] Txns: 4 RDFs: 4000 RDFs/sec:    50 Aborts: 16
[ 1m22s] Txns: 4 RDFs: 4000 RDFs/sec:    49 Aborts: 17
[ 1m24s] Txns: 4 RDFs: 4000 RDFs/sec:    48 Aborts: 17
[ 1m26s] Txns: 4 RDFs: 4000 RDFs/sec:    47 Aborts: 18
[ 1m28s] Txns: 5 RDFs: 4537 RDFs/sec:    52 Aborts: 18
[ 1m30s] Txns: 5 RDFs: 4537 RDFs/sec:    50 Aborts: 18
[ 1m32s] Txns: 5 RDFs: 4537 RDFs/sec:    49 Aborts: 19
[ 1m34s] Txns: 5 RDFs: 4537 RDFs/sec:    48 Aborts: 19
[ 1m36s] Txns: 5 RDFs: 4537 RDFs/sec:    47 Aborts: 20
[ 1m38s] Txns: 5 RDFs: 4537 RDFs/sec:    46 Aborts: 20
[ 1m40s] Txns: 6 RDFs: 5537 RDFs/sec:    55 Aborts: 20
[ 1m42s] Txns: 6 RDFs: 5537 RDFs/sec:    54 Aborts: 20
[ 1m44s] Txns: 6 RDFs: 5537 RDFs/sec:    53 Aborts: 20
[ 1m46s] Txns: 6 RDFs: 5537 RDFs/sec:    52 Aborts: 21
[ 1m48s] Txns: 6 RDFs: 5537 RDFs/sec:    51 Aborts: 21
Number of TXs run         : 7                                                                       
Number of RDFs processed  : 6537
Time spent                : 1m48.420856484s
RDFs processed per second : 60

I do not know whether it matters, but my dataset already has about 125,000,000 nodes.
CPU and RAM usage look fine, though.

You can get server latency numbers from the Response#extensions.server_latency field.
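In the gRPC JS client the same numbers hang off the response object via getLatency(). A small helper, assuming the getter names generated from the api.Latency proto fields (parsing_ns, processing_ns, encoding_ns); verify them against your dgraph-js version:

// Print the server-side latency breakdown from a dgraph-js mutate/query response.
function printServerLatency(response) {
  const latency = response.getLatency();
  if (!latency) {
    console.log("no server latency attached to this response");
    return;
  }
  console.log("parsing_ns   :", latency.getParsingNs());
  console.log("processing_ns:", latency.getProcessingNs());
  console.log("encoding_ns  :", latency.getEncodingNs());
}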

Response#extensions is nil for me, but .getLatency() returns this: 14097299, 27136539745

I tried a cluster of 3 nodes with a replication factor of 1, but the write speed is the same - about 30s. Why? Why doesn't sharding help?

I took another look at your schema. A number of predicates have a count index, which is expensive for writes. If the count indexes aren't required for your use case, your mutation response times should improve significantly without them.
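As an illustration (the predicate name below is hypothetical, since your schema isn't reproduced in this thread), re-applying a predicate's schema without @count via the JS client looks like this:

const dgraph = require("dgraph-js");
const grpc = require("grpc");

async function dropCountIndex() {
  const stub = new dgraph.DgraphClientStub("localhost:9080", grpc.credentials.createInsecure());
  const client = new dgraph.DgraphClient(stub);

  const op = new dgraph.Operation();
  // Before: follows: uid @reverse @count .  (every write also maintains count postings)
  // After:  keep only the indexes your queries actually use.
  op.setSchema("follows: uid @reverse .");
  await client.alter(op);

  stub.close();
}

dropCountIndex().catch(console.error);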
