Many small mutations vs one large. Best usage patterns

igrekun · September 17, 2021, 9:34am

Hi!

Let’s say my app consumes data from an external API which returns up to around 5k entities per request.

The general structure of an entity in the API response looks like this:

{
  "uuid": "1234",
  "parent_uuid": "4321",
  "name": "SomeName",
  ...
}

Saving those in Dgraph calls for an upsert.

It is trivial to write an upsert mutation for a single entity, e.g.:

upsert {
  query {
    t as var(func: eq(xid, $uuid)) {uid}
    p as var(func: eq(xid, $parent_uuid)) {uid}
  }

  mutation {
    set {
      uid(t) <name> "SomeName" .
    }
  }

  mutation @if(gt(len(p),0)) {
    set {
      uid(p) <link> uid(t) .
    }
  }
}

Saving 5k of those entities would require running this block 5k times.

The other way (and that’s how I do it right now) is to first run a query that fetches all existing target and parent nodes, and then generate one huge N-Quad mutation, replacing blank nodes with the matched uids where necessary and adding RDF triples for the links.
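
For readers following along, the query-first approach described above could be sketched roughly like this in Python. This is only an illustration under assumptions: `build_nquads` and the `existing` map (xid to uid, as returned by the pre-query) are hypothetical names, the `xid`, `name` and `link` predicates are taken from the upsert above, and `json.dumps` is used on the assumption that its string escaping is acceptable to Dgraph's N-Quad parser. Writing `xid` on new nodes is an addition that the single-entity upsert above does not show.

```python
import json


def build_nquads(entities, existing):
    """Render one N-Quad mutation body for a batch of API entities.

    entities: list of dicts with "uuid", "name" and optional "parent_uuid".
    existing: dict mapping xid -> Dgraph uid (e.g. "0x2a") from the pre-query.
    """
    # Entities missing from Dgraph get a blank node; an index keeps labels simple.
    blank = {}
    for i, e in enumerate(entities):
        if e["uuid"] not in existing:
            blank.setdefault(e["uuid"], f"_:n{i}")

    def ref(xid):
        # Known uid first, then a blank node from this batch, else None.
        if xid in existing:
            return f"<{existing[xid]}>"
        return blank.get(xid)

    lines = []
    for e in entities:
        subject = ref(e["uuid"])
        if e["uuid"] in blank:
            # New node: store the external id so later runs can find it.
            lines.append(f"{subject} <xid> {json.dumps(e['uuid'])} .")
        lines.append(f"{subject} <name> {json.dumps(e['name'])} .")
        parent = ref(e.get("parent_uuid"))
        if parent is not None:
            # Link only when the parent is known or created in this batch.
            lines.append(f"{parent} <link> {subject} .")
    return "\n".join(lines)
```

The query and the mutation would need to run in the same transaction; otherwise a concurrent writer could create the same xid in between.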

Which approach is more idiomatic for Dgraph? Which is more performant?

Does Dgraph optimize 5k runs of the upsert above if they happen within a single transaction?

iluminae · September 17, 2021, 3:00pm

I have found that batching in the thousands is pretty performant. I use batches of 1000 in my ingestion pipeline. It is a little awkward to generate a bunch of unique variables in the query portion and use them below in the N-Quads, but once that is solved, it’s fine to have a knob for arbitrarily sized batches that you can tune for your use case.
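
The batch-size knob mentioned here could be as small as a chunking helper. A minimal Python sketch, where `chunked` is a hypothetical name and the default of 1000 simply mirrors the batch size quoted in this post:

```python
from itertools import islice


def chunked(items, size=1000):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

With 5k entities per API response and the default size, this yields five batches, each of which would become one request.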

igrekun · September 17, 2021, 3:05pm

By batching, do you mean code-generating the query portion of the request with thousands of variables and sending that to Dgraph as a single transaction?

iluminae · September 17, 2021, 3:07pm

Yeah, I currently have 1000 variables being resolved in the query{…} section and use them all in the set{…} section below. I hash the unique id of each thing to make its variable name, and prefix the hash since a variable cannot start with a number.
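
A rough Python sketch of the variable-hashing scheme described here, not the poster's actual code. `var_name` and `build_batch_upsert` are hypothetical names, the `v_` prefix and the truncated SHA-1 are arbitrary choices, and the predicates come from the upsert in the first post. It relies on the assumption that `uid(v)` with an empty variable creates a new node, and it writes `xid` on every entity so that later runs can match it.

```python
import hashlib
import json


def var_name(xid):
    # Prefix the hash: a query variable cannot start with a digit.
    return "v_" + hashlib.sha1(xid.encode("utf-8")).hexdigest()[:16]


def build_batch_upsert(entities):
    """Return one upsert block (as text) covering every entity in the batch."""
    in_batch = {e["uuid"] for e in entities}
    xids = set(in_batch)
    xids.update(e["parent_uuid"] for e in entities if e.get("parent_uuid"))

    # One variable per distinct xid, resolved in the query section.
    query = [f"    {var_name(x)} as var(func: eq(xid, {json.dumps(x)}))"
             for x in sorted(xids)]

    sets, conditional = [], []
    for e in entities:
        t = var_name(e["uuid"])
        sets.append(f"      uid({t}) <xid> {json.dumps(e['uuid'])} .")
        sets.append(f"      uid({t}) <name> {json.dumps(e['name'])} .")
        parent = e.get("parent_uuid")
        if not parent:
            continue
        p = var_name(parent)
        link = f"uid({p}) <link> uid({t}) ."
        if parent in in_batch:
            # Parent is upserted in this same set block, so link directly.
            sets.append(f"      {link}")
        else:
            # Parent is outside the batch: link only if it already exists.
            conditional.append(
                f"  mutation @if(gt(len({p}), 0)) {{\n"
                f"    set {{\n      {link}\n    }}\n  }}")

    parts = ["upsert {", "  query {", *query, "  }",
             "  mutation {", "    set {", *sets, "    }", "  }",
             *conditional, "}"]
    return "\n".join(parts)
```

Hashing keeps variable names valid whatever characters the ids contain; truncating to 16 hex digits keeps the request smaller while making collisions within a batch very unlikely.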

igrekun · September 17, 2021, 3:09pm

Thanks, will try that!

zhaojiangkun · April 20, 2023, 9:20am

Hi, I wonder what the maximum size of {…} is.
Or what is the limit of {…} tied to?
Or how many keys can be used in {…} within one transaction?
