Very slow and aborted Upserts

We read records from Kafka (~50 million events) and want to upsert them into Dgraph.
Our approach is this:
We use the Java client (dgraph4j) and send JSON upserts (up to 500 customers per request):

query {
  cust0 as var(func: eq(Customer.id, "ABC12"))
  cust1 as var(func: eq(Customer.id, "XYZ12"))
}
[ {
	"uid": "uid(cust0)",
	"cust.id": "ABC12",
	"cust.name": "Horst",
	"cust.lastname": "Müller",
	"dgraph.type": "Customer",
	"Customer.address": {
		"uid": "uid(cust0)",
		"Address.plz": "98755",
		"Address.ort": "London",
		"Address.strasse": "Main Street 1",
		"dgraph.type": "Address"
	},
	"Customer.contactdata": {
		"uid": "uid(cust0)",
		"Contactdata.text": "123",
		"Contactdata.typ": "CELLPHONE",
		"dgraph.type": "Contactdata"
	}
}, 
{
	"uid": "uid(cust1)",
	"Customer.id": "ABC12",
	"Customer.vorname": "Horst",
	"Customer.nachname": "Müller",
	"dgraph.type": "Customer",
	"Customer.adressen": {
		"uid": "uid(cust1)",
		"Address.plz": "12345",
		"Address.ort": "Paris",
		"Address.strasse": "Rue 1",
		"dgraph.type": "Address"
	},
	"Customer.contactdata": {
		"uid": "uid(cust1)",
		"Contactdata.text": "123",
		"Contactdata.typ": "MOBILE",
		"dgraph.type": "Contactdata"
	}
}]
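
For context, a simplified sketch of how such a query block and JSON array could be assembled per batch (illustrative only — the helper names and the use of Jackson are placeholders, not our exact production code; the address and contact-data predicates are omitted here):

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;

class UpsertBatchBuilder {

  // One "custN as var(...)" block per customer in the batch.
  static String buildUpsertQuery(List<String> customerIds) {
    StringBuilder q = new StringBuilder("query {\n");
    for (int i = 0; i < customerIds.size(); i++) {
      q.append(String.format("  cust%d as var(func: eq(Customer.id, \"%s\"))%n", i, customerIds.get(i)));
    }
    return q.append("}").toString();
  }

  // One JSON object per customer, referencing the matching uid(custN) variable.
  static String buildMutationJson(List<String> customerIds) throws Exception {
    List<Map<String, Object>> records = new ArrayList<>();
    for (int i = 0; i < customerIds.size(); i++) {
      Map<String, Object> cust = new LinkedHashMap<>();
      cust.put("uid", "uid(cust" + i + ")");
      cust.put("Customer.id", customerIds.get(i));
      cust.put("dgraph.type", "Customer");
      // ... name, address and contact-data predicates as in the example above
      records.add(cust);
    }
    return new ObjectMapper().writeValueAsString(records); // Jackson used here for serialization
  }
}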

We send the Request like this (query and mutation from above):

import com.google.protobuf.ByteString;
import io.dgraph.AsyncTransaction;
import io.dgraph.DgraphProto.Mutation;
import io.dgraph.DgraphProto.Request;

Request request = Request.newBuilder()
        .setQuery(query)
        .addMutations(Mutation.newBuilder()
            .setSetJson(ByteString.copyFromUtf8(mutationJson))
            .build())
        .setCommitNow(true)
        .build();

AsyncTransaction aTxn = asyncClient.newTransaction();
try {
  aTxn.doRequest(request).join(); // wait for the commit-now upsert to finish
} finally {
  aTxn.discard();
}

We have up to 500 upsert records per Tx, and we have limited our application to sending only 2 parallel Txs to Dgraph.
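
A minimal sketch of how such a 2-Tx limit could look around the async client (illustrative only, using a plain java.util.concurrent.Semaphore; not our exact code):

import java.util.concurrent.Semaphore;

import io.dgraph.AsyncTransaction;
import io.dgraph.DgraphAsyncClient;
import io.dgraph.DgraphProto.Request;

class BoundedUpserter {
  private final DgraphAsyncClient asyncClient;
  private final Semaphore inFlight = new Semaphore(2); // at most 2 Txs in flight

  BoundedUpserter(DgraphAsyncClient asyncClient) {
    this.asyncClient = asyncClient;
  }

  void submit(Request request) throws InterruptedException {
    inFlight.acquire(); // blocks the producer side while 2 Txs are pending
    AsyncTransaction txn = asyncClient.newTransaction();
    txn.doRequest(request).whenComplete((response, error) -> {
      txn.discard();      // same cleanup as in the snippet above
      inFlight.release(); // allow the next batch
    });
  }
}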

At the start we see average timings of about 20 ms per record, but the Txs get slower and slower and finally abort.

We use the official standalone Helm chart (dgraph-single) with no modifications.

Any ideas how we could improve the throughput? We need to insert 50 million customers (the live and bulk loaders are not options for us), and we are already facing problems with fewer than 50k customers.

This is an example of the latency after ~50 Txs:

Statistics for 500 records:
parsing_ns: 1464296520
processing_ns: 125619898671
encoding_ns: 38998
assign_timestamp_ns: 62248811
total_ns: 127348347613
avg: 254 ms/req

For comparison: at the beginning the numbers were much better:

Statistics for 500 records:
parsing_ns: 472924261
processing_ns: 18814226338
encoding_ns: 53539
assign_timestamp_ns: 82250874
total_ns: 19375671982
avg: 38 ms/req
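
The field names above match the Latency object that Dgraph returns with each response; a simplified sketch of how such a per-batch summary could be collected with the Java client (illustrative, not our exact code — the recordsInBatch bookkeeping is just an assumption about how the 500-record windows are counted):

import io.dgraph.DgraphProto.Latency;
import io.dgraph.DgraphProto.Response;

// Accumulates the server-side latency that Dgraph reports with every response.
class LatencyStats {
  private long parsingNs, processingNs, encodingNs, assignTimestampNs, totalNs;
  private long records;

  void record(Response response, int recordsInBatch) {
    Latency l = response.getLatency();
    parsingNs         += l.getParsingNs();
    processingNs      += l.getProcessingNs();
    encodingNs        += l.getEncodingNs();
    assignTimestampNs += l.getAssignTimestampNs();
    totalNs           += l.getTotalNs();
    records           += recordsInBatch;
  }

  void print() {
    System.out.printf("Statistics for %d records:%n", records);
    System.out.printf("parsing_ns: %d%n", parsingNs);
    System.out.printf("processing_ns: %d%n", processingNs);
    System.out.printf("encoding_ns: %d%n", encodingNs);
    System.out.printf("assign_timestamp_ns: %d%n", assignTimestampNs);
    System.out.printf("total_ns: %d%n", totalNs);
    // e.g. 127348347613 ns / 500 records ≈ 254 ms
    System.out.printf("avg: %d ms/req%n", totalNs / records / 1_000_000);
  }
}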

Our monitoring tools are showing >450k application goroutines in dgraph-alpha at peak! Alpha used up to 6.4 GB of heap.

Hey @Frank, which version of Dgraph are you using?

v20.11.3

Our monitoring tools also show that the “GO managed memory” (poolname=Stack, pid=27) increases from 7 MB to 1.9 GB and then, after the Txs are aborted, falls back to 7 MB.

Also:
go runtime system call count: 10k
Go to C (cgo) call count: 9k
Parked worker threads: 4 (instead of 136 during times with no update Txs)
Global goroutine run queue size: 9k