UPSERT versus MERGE - Loading Dgraph without creating upsert blocks for every record in the file

liveload
mutation

(yyyguy) #1

I have been using Dgraph recently to import CSV and JSON data. I am trying to ensure that I do not have duplicate values for some key reference data (e.g. currency values). I have multiple files that contain the CSV or JSON data.

With Neo4j, I can do upserts using a Cypher command called MERGE. It uses a constraint model that must be defined prior to loading the data (see below). Based on that constraint, any value that already exists will not be loaded.

// CURRENCY
CREATE CONSTRAINT ON (currency:Currency) ASSERT currency.currency_code IS UNIQUE;

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///Currency-Rates.csv' AS line
MERGE (currency:Currency {currency_code: line.Currency_Code});

With Dgraph, using an upsert block looks like this:

upsert {
  query {
    v as var(func: eq(currency.code, "CAD"))
  }

  mutation {
    set {
      uid(v) <currency.code> "CAD" .
      uid(v) <currency.name> "Canadian Dollar" .
    }
  }
}

So Dgraph does an upsert based on the “uid”. The upsert block runs the query to determine whether a matching “uid” exists; if it does, the mutation updates that node, and if it does not, the mutation creates a new one.

So my question - Is there a way to use an input file and load Dgraph without creating upsert blocks for every record in the file?


(Aman Mangal) #2

You could try using live loader. It would be a little faster given that it keeps the UIDs on the client side.

In general, I do not see much difference between the two approaches. With upsert, you will have to write code to read the file, construct the queries, and execute the transactions, for which you can use any language you prefer.
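
As a rough sketch of what that code could look like, the Python below reads a CSV and builds one upsert block per row, following the shape of the block shown earlier in the thread. The Currency_Name column and the helper names are assumptions for illustration; only Currency_Code appears in the original example. It only produces the request text; sending each block to Dgraph through a client library or the HTTP API is left out.

```python
import csv
import io

# Template mirroring the upsert block from the first post.
# Doubled braces are literal braces for str.format.
UPSERT_TEMPLATE = """upsert {{
  query {{
    v as var(func: eq(currency.code, {code}))
  }}

  mutation {{
    set {{
      uid(v) <currency.code> {code} .
      uid(v) <currency.name> {name} .
    }}
  }}
}}"""


def quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def upsert_blocks(csv_file):
    """Yield one upsert block per CSV row (column names are assumed)."""
    for row in csv.DictReader(csv_file):
        yield UPSERT_TEMPLATE.format(
            code=quote(row["Currency_Code"]),
            name=quote(row["Currency_Name"]),
        )


# Small in-memory sample standing in for a real file such as Currency-Rates.csv.
sample = io.StringIO(
    "Currency_Code,Currency_Name\n"
    "CAD,Canadian Dollar\n"
    "USD,US Dollar\n"
)
blocks = list(upsert_blocks(sample))
print(blocks[0])
```

Each generated block still has to run as its own request, so this removes the hand-writing of blocks rather than the per-record round trip.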


(yyyguy) #3

I have been using live loader. In my experience, it doesn’t prevent me from loading the same information multiple times. Maybe it does within a single input file, but I can load the same file multiple times without live loader complaining about the duplicate records I have loaded.

From my perspective, this is fundamental to making technologies approachable: how can I make a tool as straightforward as possible to use? Initial loading of data is one thing. There are also incremental adds and changes to the data. With how Neo4j has developed this functionality, I don’t have to create separate sets of logic (one for the initial load, and another for adds and updates). The upsert functionality is built into MERGE.

In my experience with the two graph databases, I have been able to get up to speed more quickly with Neo4j. Loading data and building relationships has been a challenging exercise with Dgraph. However, I prefer the direction that Dgraph is going, so I want to stick with it.


(Michel Conrado) #4

Well, for this you can use the flag -x, --xidmap string ("Directory to store xid to uid mapping"). With this flag, the live loader keeps a folder that maps blank nodes to UIDs, so you can run it as many times as you need and it will prevent this from happening, as long as you always use the same blank nodes.
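
One way to keep the blank nodes the same, as a sketch: derive the blank node label from the record's key when converting the CSV to RDF, so the same currency gets the same label across files and runs. The _:currency_ prefix, the Currency_Name column, and the function names below are assumptions for illustration, not anything Dgraph requires.

```python
import csv
import io
import re


def blank_node(code):
    """Build a deterministic blank node label from the currency code."""
    # Replace anything outside letters and digits so the label stays simple.
    safe = re.sub(r"[^A-Za-z0-9]", "_", code)
    return f"_:currency_{safe}"


def quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_nquads(csv_file):
    """Yield RDF lines for each CSV row (column names are assumed)."""
    for row in csv.DictReader(csv_file):
        node = blank_node(row["Currency_Code"])
        yield f'{node} <currency.code> {quote(row["Currency_Code"])} .'
        yield f'{node} <currency.name> {quote(row["Currency_Name"])} .'


# Small in-memory sample standing in for a real input file.
sample = io.StringIO(
    "Currency_Code,Currency_Name\n"
    "CAD,Canadian Dollar\n"
    "USD,US Dollar\n"
)
lines = list(to_nquads(sample))
print("\n".join(lines))
```

The resulting lines can be written to an RDF file and loaded with live loader, passing the same -x directory on every run so that a label such as _:currency_CAD keeps resolving to the same UID.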


(yyyguy) #5

Thanks @MichelDiz. That is a helpful tip. I will use that. Cheers.

Edited: BTW, that simple parameter (-x xid) works great!


(Aman Mangal) #6

Thank you for your feedback; we will take this into account going forward. Our goal has generally been to provide users with basic tools and features that allow them to build more complex and specific ones themselves. Upsert is along similar lines and allows you to do a lot more with a few lines of code.