Hi!
Let’s say my app consumes data from an external API that returns up to around 5k entities per request.
The general structure of an entity in the API response looks like this:
```
{
  "uuid": "1234",
  "parent_uuid": "4321",
  "name": "SomeName",
  ...
}
```
Saving those in Dgraph calls for an upsert.
Writing an upsert mutation for a single entity is trivial, e.g.:
```
upsert {
  query {
    t as var(func: eq(xid, $uuid)) { uid }
    p as var(func: eq(xid, $parent_uuid)) { uid }
  }

  mutation {
    set {
      uid(t) <name> "SomeName" .
    }
  }

  mutation @if(gt(len(p), 0)) {
    set {
      uid(p) <link> uid(t) .
    }
  }
}
```
Saving 5k of those entities would require running this block 5k times.
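One variation I’ve considered is generating a single upsert block that covers a whole batch, since an upsert block can contain multiple conditional mutations. A rough sketch of what I mean (`build_batch_upsert` is my own hypothetical helper that just numbers the query variables, not any Dgraph API):

```python
# Sketch: generate ONE upsert block for a whole batch of entities by
# numbering the query variables (t0/p0, t1/p1, ...), mirroring the
# single-entity block above. build_batch_upsert is a hypothetical helper.

def build_batch_upsert(entities):
    queries, mutations = [], []
    for i, e in enumerate(entities):
        # One target/parent variable pair per entity.
        queries.append(f'    t{i} as var(func: eq(xid, "{e["uuid"]}")) {{ uid }}')
        queries.append(f'    p{i} as var(func: eq(xid, "{e["parent_uuid"]}")) {{ uid }}')
        mutations.append(
            f'  mutation {{\n    set {{\n      uid(t{i}) <name> "{e["name"]}" .\n    }}\n  }}'
        )
        # Only add the edge when the parent node actually matched.
        mutations.append(
            f'  mutation @if(gt(len(p{i}), 0)) {{\n'
            f'    set {{\n      uid(p{i}) <link> uid(t{i}) .\n    }}\n  }}'
        )
    return (
        "upsert {\n  query {\n"
        + "\n".join(queries)
        + "\n  }\n"
        + "\n".join(mutations)
        + "\n}"
    )
```

I don’t know whether one block with 10k variables is something Dgraph handles gracefully, which is part of my question.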
The other way (and that’s how I do it right now) is to first run a query fetching all target and parent nodes that already exist, then generate one huge N-Quad mutation, replacing blank nodes with matched uids where necessary and adding RDF triples for the links.
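Concretely, that second approach looks roughly like this (`build_nquads` is a simplified, hypothetical version of my generator; it assumes the existing uids were fetched into an xid → uid dict beforehand):

```python
# Sketch of the bulk N-Quad approach. uid_by_xid maps xid -> uid for nodes
# already in Dgraph (fetched by a prior query); anything not in the map
# becomes a blank node. build_nquads is a hypothetical helper.

def build_nquads(entities, uid_by_xid):
    in_batch = {e["uuid"] for e in entities}

    def ref(xid):
        # Existing node -> <0x..>, new node -> a blank node reused across lines.
        uid = uid_by_xid.get(xid)
        return f"<{uid}>" if uid else f"_:{xid}"

    lines = []
    for e in entities:
        subject = ref(e["uuid"])
        lines.append(f'{subject} <xid> "{e["uuid"]}" .')
        lines.append(f'{subject} <name> "{e["name"]}" .')
        parent = e.get("parent_uuid")
        # Emit the link only when the parent is resolvable: either already
        # in Dgraph or part of this same batch.
        if parent and (parent in uid_by_xid or parent in in_batch):
            lines.append(f"{ref(parent)} <link> {subject} .")
    return "\n".join(lines)
```

The whole string then goes out as one `set` mutation in a single transaction.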
Which approach is better in terms of Dgraph ways of doing things? Which is more performant?
Does Dgraph optimize 5k runs of the upsert above if they happen within a single transaction?