Loading 50GB CSV data

Hi, we are exploring Dgraph and need to load a 50GB CSV file into it.
Is there any utility to load CSV directly? Any inputs?


Here is a way to do it: Get started with Dgraph

You could also export it to JSON and import it via the bulk loader or live loader.
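For what it's worth, here is a minimal sketch of that CSV to JSON step in Python. It streams row by row, so a 50GB file never has to fit in memory. Assumptions on my side: the CSV has a header row, one column (the hypothetical `id_column` argument) identifies each row, and every other column becomes a predicate named after its header. The function name `csv_to_json` is made up for this example. The output is a JSON array of objects whose `uid` is a blank node such as `_:1`, which is the shape I would expect the loaders to accept, but please check it against the Dgraph docs for your version.

```python
import csv
import json


def csv_to_json(src, dst, id_column):
    """Stream CSV rows from src into a JSON array written to dst.

    src and dst are open text file objects. Returns the number of rows written.
    """
    reader = csv.DictReader(src)
    dst.write("[\n")
    count = 0
    for row in reader:
        # Use the id column as a blank-node label so repeated ids map to one node.
        node = {"uid": f"_:{row[id_column]}"}
        # Every remaining column becomes a predicate on that node.
        node.update({k: v for k, v in row.items() if k != id_column})
        if count > 0:
            dst.write(",\n")
        dst.write(json.dumps(node))
        count += 1
    dst.write("\n]\n")
    return count
```

You would call it with two open files, for example `csv_to_json(open("data.csv", newline=""), open("data.json", "w"), "id")`, and then point the loader at the resulting JSON file. All values stay strings here; type conversion would depend on your schema.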

The ability to load CSV data is one of the basic pieces of functionality I look for when researching graph database solutions. I was a bit surprised when I couldn’t find something to do that “out of the box”.

I overcame this by using NiFi to build a CSV to JSON FlowFile, and was able to load my Dgraph instance with a 25GB CSV file. It worked great.

From my perspective, as Dgraph matures, pre-built functions for this (like Neo4j has) will be a necessity to make it more accessible and approachable.

@yyyguy: You have to be careful when importing CSV data from a relational database, because missing values often show up as “NULL” strings. I have not tried the CSV parser mentioned above, but I assume that “NULL” would be interpreted as a string and thus end up in the graph. If you take a look at the Neo4j demo data set here, you can see these NULL values.

In my opinion, missing values should not be imported. I think this is one of the strengths of graph data over relational tables. For example, when a customer does not have a fax number (who uses a fax number today is another question… :slight_smile:), there is simply no such predicate for that customer.
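To make that concrete, here is a small Python sketch of filtering a parsed CSV row before it is turned into a node. The helper name `drop_missing` and the default markers (`""` and `"NULL"`) are my own assumptions; adjust them to whatever your database export writes for missing values.

```python
def drop_missing(row, null_markers=("", "NULL")):
    """Return a copy of row without columns whose value counts as missing.

    row maps column names to string values (or None for short CSV rows).
    A value is missing if it is None or, after stripping whitespace,
    equals one of null_markers. The comparison is case-sensitive.
    """
    return {
        key: value
        for key, value in row.items()
        if value is not None and value.strip() not in null_markers
    }
```

With this, a customer row whose `fax` column is "NULL" simply ends up without a `fax` key, so no predicate gets created for it.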

@graphpivot I agree with your thoughts. I have dealt with the issue of not loading NULL values at the NiFi level. That way I can take a database-agnostic approach. I can then focus on the data load.

Again, thanks for your thoughts.

@yyyguy I know this was a while ago, but I’m looking at doing something similar with NiFi and was curious about the details of how you loaded the JSON file into Dgraph. For example, did you invoke the live loader or use a POST step?