Which is the best batch upsert?

I want to upsert in batches and have tried the following methods so far.

The data set has 100,000 nodes and 240,000 edges.

Machine: one 12-core CPU with 16 GB of memory, running 1 Alpha and 1 Zero.

  1. First query all existing data, both nodes and edges. That is about 340,000 items and takes 2.5 minutes. The query is built by assembling query blocks (query1…query100000). I then compare the result with the latest data to decide which old relationships to delete and which new ones to add, compare node content to decide which nodes to update, and finally apply the node updates, the edge deletions, and the new edges, about 400,000 statements in total. (Overall this takes 7.5 minutes, while a live import takes 5 minutes. If old data exists, it takes 2.5 minutes to query all the data and 0 to 5 minutes to create or delete.)

  2. Use an upsert statement directly (query1…query100000 plus many mutations) to upload and update all node content and edges, then return all edges, compare them, and delete the old ones.
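
Both methods end with the same comparison step: working out which existing edges are stale and which new edges are missing. Here is a rough Python sketch of that diff, assuming each edge can be represented as a (source, predicate, target) tuple. The tuple shape, the `diff_edges` name, and the sample predicate are only illustrative and are not part of Dgraph:

```python
from typing import Iterable, Set, Tuple

# Hypothetical edge shape: (source id, predicate name, target id).
Edge = Tuple[str, str, str]


def diff_edges(
    existing: Iterable[Edge], desired: Iterable[Edge]
) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (edges to add, edges to delete) so that existing becomes desired."""
    existing_set = set(existing)
    desired_set = set(desired)
    to_add = desired_set - existing_set      # in the new data only
    to_delete = existing_set - desired_set   # in the stored data only
    return to_add, to_delete


if __name__ == "__main__":
    stored = [("a", "connects", "b"), ("a", "connects", "c")]
    latest = [("a", "connects", "c"), ("a", "connects", "d")]
    add, delete = diff_edges(stored, latest)
    print("add:", sorted(add))
    print("delete:", sorted(delete))
```

Edges present on both sides drop out of both sets, so only the real changes need to be turned into mutation statements.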


  1. Which method has the best performance?

  2. In the first method, which queries the full data set, is it possible to query in parallel? When I run the queries in parallel I get an out-of-memory error, so I can only fetch 1,000 nodes and their relationships per query.

  3. Is there room for optimization in query1…query100000?
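
To make question 2 concrete, this is roughly the batching I mean, written as a Python sketch: split the ids into chunks of 1,000, build one multi-block query per chunk, and cap how many queries are in flight. `fetch` is a hypothetical placeholder for whatever client call actually sends the query, the `expand(_all_)` body is an assumption about the query shape, and `max_workers` is a guess that would need tuning. I have not verified that this avoids the out-of-memory problem:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of ids with at most `size` items each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_query(ids: Sequence[int], predicate: str = "subnet.default.id") -> str:
    """Assemble one query text with a named block per id (query1, query2, ...)."""
    blocks = [
        f"  query{n}(func: eq({predicate}, {node_id})) {{\n    expand(_all_)\n  }}"
        for n, node_id in enumerate(ids, start=1)
    ]
    return "{\n" + "\n".join(blocks) + "\n}"


def run_bounded(
    queries: Sequence[str], fetch: Callable[[str], T], max_workers: int = 4
) -> List[T]:
    """Run fetch on every query with at most max_workers running concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order in the returned results.
        return list(pool.map(fetch, queries))


if __name__ == "__main__":
    all_ids = list(range(1, 2501))
    queries = [build_query(batch) for batch in chunked(all_ids, 1000)]
    # Stand-in for a real client call: just report the size of each query text.
    sizes = run_bounded(queries, fetch=len, max_workers=2)
    print(len(queries), "queries built, text sizes:", sizes)
```

The thread pool only limits concurrency on the client side; how much memory each query costs on the Alpha still depends on the chunk size.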

pprof.dgraph.samples.cpu.007.pb (505.9 KB)

The file above is the CPU profile (pprof) of the batch query.

Query example:
query13(func: eq(subnet.default.id, 1)) {
expand(_all_) {

I removed the tracing calls (span.Annotatef) from the query code. A batch query now takes 80 s, down from 330 s before.

This is not optimized for the context you mentioned.