Loading 50GB CSV data


(Shoukat Ghouse) #1

Hi, we are exploring Dgraph and need to load a 50GB CSV file into it.
Is there any utility to load CSV directly? Any inputs?


(Michel Conrado) #2

Nope, there is no built-in CSV loader.

Here is a way to do it: https://docs.dgraph.io/howto/#loading-csv-data

You could also convert it to JSON and import it via the Bulk Loader or the Live Loader.
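For reference, here is a minimal Python sketch of that conversion (the column name "id" and the file names are placeholders for your own data; for a 50GB file you would probably want to split the output into several smaller JSON files):

```python
# Minimal sketch: stream a CSV file into Dgraph-style JSON for the Live/Bulk Loader.
# Column names ("id", ...) and file names are placeholders for your own schema.
import csv
import json

def csv_to_json(csv_path, json_path):
    with open(csv_path, newline="") as src, open(json_path, "w") as dst:
        dst.write("[\n")
        first = True
        for row in csv.DictReader(src):
            # Blank-node uid so the loader can link rows to other data consistently.
            node = {"uid": "_:row-{}".format(row["id"])}
            node.update(row)
            if not first:
                dst.write(",\n")
            json.dump(node, dst)
            first = False
        dst.write("\n]\n")

csv_to_json("customers.csv", "customers.json")
```

The resulting file can then be loaded with something like `dgraph live -f customers.json`, or with the Bulk Loader for an initial offline import.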


(yyyguy) #3

The ability to load CSV data is one of the basic pieces of functionality I look for when researching graph database solutions. I was a bit surprised when I couldn’t find something to do that “out of the box”.

I overcame this by using NiFi and building a CSV to JSON FlowFile, and was able to load my dgraph instance with a 25GB CSV file. It worked great.

From my perspective, as Dgraph matures, pre-built functions like this will be a necessity to make it more accessible and approachable (as Neo4j is).


(graphpivot) #4

@yyyguy: You have to be careful when importing CSV data from a relational database, because missing values often come through as literal “NULL” strings. I have not tried the CSV parser mentioned above, but I assume “NULL” would be interpreted as a string and imported into the graph as such. If you take a look at the Neo4j demo dataset here you can see these NULL values.

In my opinion, missing values should not be imported at all - I think this is one of the strengths of graph data over relational tables. For example, when a customer does not have a fax number (it’s another question who still uses a fax number these days… :slight_smile: ), there is simply no predicate for that customer.
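A rough sketch of how that could be handled during the CSV-to-JSON conversion (the NULL markers and the “customer”/“fax” names are just assumptions about what a typical SQL export looks like):

```python
import csv
import io
import json

NULL_TOKENS = {"", "NULL", "null", "\\N"}  # common missing-value markers in SQL exports

def row_to_node(row, row_id):
    # Emit a predicate only when the cell has a real value; "NULL" cells get no predicate.
    node = {"uid": "_:customer-{}".format(row_id)}
    for key, value in row.items():
        if value is not None and value.strip() not in NULL_TOKENS:
            node[key] = value
    return node

sample = io.StringIO("name,fax\nAlice,NULL\nBob,+1-555-0100\n")
nodes = [row_to_node(r, i) for i, r in enumerate(csv.DictReader(sample))]
print(json.dumps(nodes, indent=2))  # Alice ends up with no "fax" predicate at all
```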


(yyyguy) #5

@graphpivot I agree with your thoughts. I have dealt with the issue of not loading NULL values at the NiFi level. That way I can take a database-agnostic approach. I can then focus on the data load.

Again, thanks for your thoughts.