What am I doing wrong that my write performance is so bad?

Hi everyone!

We are migrating our production environment to Dgraph.
We have a cluster with a replication factor of 3 on r5d.2xlarge AWS instances (64 GB RAM, 8 cores, SSD) - 3 Alphas and 3 Zeros - one of each per node.

When I run this mutation, it takes about 27s to respond (from the JS client, and also with curl from localhost).

This is my schema.

I also tried splitting it into several serial requests; each request takes about 1s, but altogether they still finish in about 28s.

What can I change to get better write speed?
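Roughly, the mutation is sent like this from the JS client (dgraph-js over gRPC); the payload below is only a placeholder, not my real data:

const dgraph = require("dgraph-js");
const grpc = require("grpc");

async function runMutation() {
  // Connect to a local Alpha on the default gRPC port.
  const stub = new dgraph.DgraphClientStub("localhost:9080", grpc.credentials.createInsecure());
  const client = new dgraph.DgraphClient(stub);

  const txn = client.newTxn();
  try {
    const mu = new dgraph.Mutation();
    mu.setSetJson({ name: "example node" }); // placeholder data, not the real payload
    mu.setCommitNow(true);                   // commit in the same request

    const start = Date.now();
    const response = await txn.mutate(mu);
    console.log("mutation round trip:", Date.now() - start, "ms");
    console.log("server latency:", response.getLatency());
  } finally {
    await txn.discard(); // no-op if the mutation already committed
    stub.close();
  }
}

runMutation().catch(console.error);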

What version of Dgraph are you running, and what are your Zero and Alpha configs? Are all these instances in the same region?

My cluster runs in the same region across 3 availability zones: eu-west-1a, eu-west-1b, and eu-west-1c.

I used version 1.0.10 and also 1.0.11-rc4.

The command for the Zeros is:

docker run -d --name=zero --hostname=1.zero.weave.local --restart unless-stopped -v /dgraph/zero/w:/dgraph/zw -p 6080:6080 -p 5080:5080 dgraph/dgraph:v1.0.11-rc4 dgraph zero --my=1.zero.weave.local:5080 --idx 1 --replicas 3

and for the Alphas:
docker run -d --name=alpha --hostname=1.alpha.weave.local --restart unless-stopped -v /dgraph/export:/dgraph/export -v /dgraph/alpha/w:/dgraph/w -v /dgraph/alpha/p:/dgraph/p -p 8080:8080 -p 9080:9080 dgraph/dgraph:v1.0.11-rc4 dgraph alpha --export=/dgraph/export --my=1.alpha.weave.local:7080 --lru_mb 21504 --zero 1.zero.weave.local:5080 -p /dgraph/p --idx=1 -w /dgraph/w --badger.vlog=disk --whitelist 172.17.0.0:172.20.0.0 --bindall=true --custom_tokenizers=/dgraph/plugins/nfd.so

I ran a cluster and added your schema and mutation (thanks for sharing). I also compiled your nfd tokenizer, which you recently shared in a GitHub issue, for the nfd index.

I ran a 1 Zero / 1 Alpha cluster locally with v1.0.10 and v1.0.11-rc4, ran the schema update, and then the mutation. Both the schema update (2 secs) and the mutation (1 sec) finished quickly.

Is there a reason to set --badger.vlog=disk when you said your setup uses SSDs?

Can you clarify what you mean here? What happened for the 1s request and for the 28s one? Is 28s the total time taken to run the several requests in serial?

My test env has only 16 GB of RAM and there was an issue with replication… but when I run it with this command:

docker run -d --name=alpha --hostname=1.alpha.weave.local --restart unless-stopped -v /dgraph/export:/dgraph/export -v /dgraph/alpha/w:/dgraph/w  -v /dgraph/alpha/p:/dgraph/p -p 8080:8080 -p 9080:9080 dgraph/dgraph:v1.0.11-rc4 dgraph alpha --export=/dgraph/export --my=1.alpha.weave.local:7080 --lru_mb 21504 --zero 1.zero.weave.local:5080 -p /dgraph/p --idx=1 -w /dgraph/w --badger.vlog=mmap --badger.tables=ram --whitelist 172.17.0.0:172.20.0.0 --bindall=true --custom_tokenizers=/dgraph/plugins/nfd.so

the response time was the same.

Yes, 28s is the total time for all the requests in serial.

Can you share the server_latency numbers in your responses? There are three of them: parsing_ns, processing_ns, and encoding_ns.

How can I get them in the JavaScript client?

BTW, I tried converting my JSON to an RDF file to check how fast it would go with dgraph live, but it is even slower.
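The invocation was along these lines (flag names as in the v1.0 docs; the Zero endpoint here is just the default, so check dgraph live --help on your version):

dgraph live -r mutation.rdf.gz --zero localhost:5080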

Processing mutation.rdf.gz
[    2s] Txns: 0 RDFs: 0 RDFs/sec:     0 Aborts: 0
[    4s] Txns: 0 RDFs: 0 RDFs/sec:     0 Aborts: 0
[    6s] Txns: 1 RDFs: 1000 RDFs/sec:   167 Aborts: 0
[    8s] Txns: 1 RDFs: 1000 RDFs/sec:   125 Aborts: 0
[   10s] Txns: 1 RDFs: 1000 RDFs/sec:   100 Aborts: 1
[   12s] Txns: 1 RDFs: 1000 RDFs/sec:    83 Aborts: 1
[   14s] Txns: 1 RDFs: 1000 RDFs/sec:    71 Aborts: 2
[   16s] Txns: 1 RDFs: 1000 RDFs/sec:    62 Aborts: 2
[   18s] Txns: 1 RDFs: 1000 RDFs/sec:    56 Aborts: 3
[   20s] Txns: 1 RDFs: 1000 RDFs/sec:    50 Aborts: 4
[   22s] Txns: 1 RDFs: 1000 RDFs/sec:    45 Aborts: 4
[   24s] Txns: 1 RDFs: 1000 RDFs/sec:    42 Aborts: 5
[   26s] Txns: 1 RDFs: 1000 RDFs/sec:    38 Aborts: 5
[   28s] Txns: 1 RDFs: 1000 RDFs/sec:    36 Aborts: 6
[   30s] Txns: 1 RDFs: 1000 RDFs/sec:    33 Aborts: 6
[   32s] Txns: 2 RDFs: 2000 RDFs/sec:    62 Aborts: 6
[   34s] Txns: 2 RDFs: 2000 RDFs/sec:    59 Aborts: 6
[   36s] Txns: 2 RDFs: 2000 RDFs/sec:    56 Aborts: 7
[   38s] Txns: 2 RDFs: 2000 RDFs/sec:    53 Aborts: 7
[   40s] Txns: 2 RDFs: 2000 RDFs/sec:    50 Aborts: 7
[   42s] Txns: 2 RDFs: 2000 RDFs/sec:    48 Aborts: 8
[   44s] Txns: 2 RDFs: 2000 RDFs/sec:    45 Aborts: 9
[   46s] Txns: 2 RDFs: 2000 RDFs/sec:    43 Aborts: 9
[   48s] Txns: 2 RDFs: 2000 RDFs/sec:    42 Aborts: 10
[   50s] Txns: 2 RDFs: 2000 RDFs/sec:    40 Aborts: 10
[   52s] Txns: 2 RDFs: 2000 RDFs/sec:    38 Aborts: 11
[   54s] Txns: 2 RDFs: 2000 RDFs/sec:    37 Aborts: 11
[   56s] Txns: 3 RDFs: 3000 RDFs/sec:    54 Aborts: 11
[   58s] Txns: 3 RDFs: 3000 RDFs/sec:    52 Aborts: 11
[  1m0s] Txns: 3 RDFs: 3000 RDFs/sec:    50 Aborts: 12
[  1m2s] Txns: 3 RDFs: 3000 RDFs/sec:    48 Aborts: 13
[  1m4s] Txns: 3 RDFs: 3000 RDFs/sec:    47 Aborts: 13
[  1m6s] Txns: 3 RDFs: 3000 RDFs/sec:    45 Aborts: 14
[  1m8s] Txns: 3 RDFs: 3000 RDFs/sec:    44 Aborts: 14
[ 1m10s] Txns: 3 RDFs: 3000 RDFs/sec:    43 Aborts: 15
[ 1m12s] Txns: 3 RDFs: 3000 RDFs/sec:    42 Aborts: 15
[ 1m14s] Txns: 3 RDFs: 3000 RDFs/sec:    41 Aborts: 15
[ 1m16s] Txns: 4 RDFs: 4000 RDFs/sec:    53 Aborts: 15
[ 1m18s] Txns: 4 RDFs: 4000 RDFs/sec:    51 Aborts: 16
[ 1m20s] Txns: 4 RDFs: 4000 RDFs/sec:    50 Aborts: 16
[ 1m22s] Txns: 4 RDFs: 4000 RDFs/sec:    49 Aborts: 17
[ 1m24s] Txns: 4 RDFs: 4000 RDFs/sec:    48 Aborts: 17
[ 1m26s] Txns: 4 RDFs: 4000 RDFs/sec:    47 Aborts: 18
[ 1m28s] Txns: 5 RDFs: 4537 RDFs/sec:    52 Aborts: 18
[ 1m30s] Txns: 5 RDFs: 4537 RDFs/sec:    50 Aborts: 18
[ 1m32s] Txns: 5 RDFs: 4537 RDFs/sec:    49 Aborts: 19
[ 1m34s] Txns: 5 RDFs: 4537 RDFs/sec:    48 Aborts: 19
[ 1m36s] Txns: 5 RDFs: 4537 RDFs/sec:    47 Aborts: 20
[ 1m38s] Txns: 5 RDFs: 4537 RDFs/sec:    46 Aborts: 20
[ 1m40s] Txns: 6 RDFs: 5537 RDFs/sec:    55 Aborts: 20
[ 1m42s] Txns: 6 RDFs: 5537 RDFs/sec:    54 Aborts: 20
[ 1m44s] Txns: 6 RDFs: 5537 RDFs/sec:    53 Aborts: 20
[ 1m46s] Txns: 6 RDFs: 5537 RDFs/sec:    52 Aborts: 21
[ 1m48s] Txns: 6 RDFs: 5537 RDFs/sec:    51 Aborts: 21
Number of TXs run         : 7                                                                       
Number of RDFs processed  : 6537
Time spent                : 1m48.420856484s
RDFs processed per second : 60

I do not know whether it matters, but my dataset already has about 125,000,000 nodes.
CPU and RAM usage look fine, though.

You can get server latency numbers from the Response#extensions.server_latency field.
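In the gRPC JS client the same numbers hang off the response object via getLatency(). A small helper, assuming the getter names generated from the api.Latency proto fields (parsing_ns, processing_ns, encoding_ns); verify them against your dgraph-js version:

// Print the server-side latency breakdown from a dgraph-js mutate/query response.
function printServerLatency(response) {
  const latency = response.getLatency();
  if (!latency) {
    console.log("no server latency attached to this response");
    return;
  }
  console.log("parsing_ns   :", latency.getParsingNs());
  console.log("processing_ns:", latency.getProcessingNs());
  console.log("encoding_ns  :", latency.getEncodingNs());
}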

Response#extensions is nil for me, but .getLatency() returns this: 14097299, 27136539745

I tried a cluster of 3 nodes with a replication factor of 1, but the write speed is the same - about 30s. Why? Why doesn't sharding help?

I took another look at your schema. A number of predicates have a count index, which is expensive for writes. If the count indexes aren't required for your use case, your mutation response times should improve significantly without them.
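As an illustration (the predicate name below is hypothetical, since your schema isn't reproduced in this thread), re-applying a predicate's schema without @count via the JS client looks like this:

const dgraph = require("dgraph-js");
const grpc = require("grpc");

async function dropCountIndex() {
  const stub = new dgraph.DgraphClientStub("localhost:9080", grpc.credentials.createInsecure());
  const client = new dgraph.DgraphClient(stub);

  const op = new dgraph.Operation();
  // Before: follows: uid @reverse @count .  (every write also maintains count postings)
  // After:  keep only the indexes your queries actually use.
  op.setSchema("follows: uid @reverse .");
  await client.alter(op);

  stub.close();
}

dropCountIndex().catch(console.error);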
