Managing large upserts

Hi,

We’re currently using Neo4j, but there are a lot of things about Dgraph that make us want to switch. However, we’re stuck on finding an efficient way to manage large (> 1 million node) upserts. We have a streaming system that ingests incremental updates to existing data. Each update keys off one node and then mutates the subgraph of nodes connected to that key node based on the data content.

I see how to do upserts with transactions, but all of the data needs to be loaded into the client, so the performance isn’t good enough for our use case. Is there any way to do upserts server-side? To make this a bit more concrete, here’s the (heavily simplified) Cypher version of what we’re doing today, which we’d like to replicate in Dgraph:

Index:

CREATE CONSTRAINT ON (n:Person) ASSERT n.id IS UNIQUE
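
For comparison, our best guess at the Dgraph schema equivalent would be something like the following (predicate names are just our mapping of the Neo4j properties; as we understand it, the `@upsert` directive enables index-conflict checking at commit time, which is the closest thing to the uniqueness constraint):

```
id: int @index(int) @upsert .
last_login: datetime .
created: datetime .
```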

Query:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///(unknown)' AS data
MERGE (n:Person {id: toInteger(data.id) })
  ON MATCH SET n.last_login = data.last_login
  ON CREATE SET n.last_login = data.last_login, n.created = timestamp()
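
From reading the docs, Dgraph’s upsert block seems like the closest analogue to this MERGE, though we’re not sure it runs server-side at the scale we need. Here’s a rough per-record sketch of what we imagine it would look like (the `id` and timestamp values are placeholders; `v` binds the uid of any matching node, and the two `@if` conditions mirror ON MATCH / ON CREATE — note there doesn’t seem to be a server-side equivalent of `timestamp()`, so `created` would have to be set by the client):

```
upsert {
  query {
    q(func: eq(id, 123)) {
      v as uid
    }
  }
  mutation @if(gt(len(v), 0)) {
    set {
      uid(v) <last_login> "2020-01-01T00:00:00Z" .
    }
  }
  mutation @if(eq(len(v), 0)) {
    set {
      _:new <id> "123" .
      _:new <last_login> "2020-01-01T00:00:00Z" .
      _:new <created> "2020-01-01T00:00:00Z" .
    }
  }
}
```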

We’re looking for a processing rate of roughly 50,000 nodes/second, which is what we currently get out of Neo4j. Is there any way to do something similar with Dgraph?

Thanks for the help!

