[Feature request] Support data import from CSV file

Moved from GitHub dgraph/4920

Posted by marvin-hansen:

Experience Report

A buddy asked me about a quick & simple way to set up a DB that can be integrated into an online system. So I suggested Dgraph. Setup was easy, the GraphQL schema was easy, but data import didn’t work out.

What you wanted to do

I wanted to load a ~15 GB data sample from a ~200 GB data set, all in CSV format.
Because Dgraph doesn’t support CSV, the first attempt was to convert CSV into RDF.

Converting ~15 GB into RDF, well, good luck. We got plenty of out-of-memory errors with various tools.

Next best: convert to the JSON mutation format, so let’s do some Python scripting. We had to split the sample file into disjoint files to make it manageable…

https://docs.dgraph.io/mutations/#json-mutation-format
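
For anyone attempting the same conversion, here is a minimal sketch of the streaming approach. It reads the CSV lazily and emits one JSON mutation document per batch of rows, so memory use depends on the batch size rather than the file size. The function name `csv_to_json_batches`, the `id` column, and the sample data are assumptions for illustration only, not part of any Dgraph tooling. Every non-empty column is simply turned into a string predicate of the same name.

```python
import csv
import io
import json
from itertools import islice


def csv_to_json_batches(csv_file, batch_size=1000, id_column="id"):
    """Yield one JSON mutation string per batch of rows, reading the CSV lazily.

    Hypothetical helper: assumes the CSV has a header row and a column
    (id_column) whose value is unique per row.
    """
    reader = csv.DictReader(csv_file)
    while True:
        # Pull at most batch_size rows; the rest of the file stays unread.
        rows = list(islice(reader, batch_size))
        if not rows:
            break
        nodes = []
        for row in rows:
            # Use the id column as a blank-node label for the new node.
            node = {"uid": f"_:{row[id_column]}"}
            for key, value in row.items():
                # Skip the id column and empty cells.
                if key and key != id_column and value:
                    node[key] = value
            nodes.append(node)
        yield json.dumps({"set": nodes})


# Small in-memory sample standing in for a real file opened with open(..., newline="").
sample = io.StringIO("id,name,city\n1,Alice,Berlin\n2,Bob,\n3,Carol,Oslo\n")
batches = list(csv_to_json_batches(sample, batch_size=2))
```

Each yielded string can be written to its own file or sent as a separate mutation. Note that this only covers flat rows; relationships between rows would still need a mapping you define yourself.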

What you actually did

We selected a different DB because converting the data simply took way too long and, quite frankly, doesn’t scale well to >100 GB. Nobody has time to script every damn data import from a standard file format supported virtually everywhere.

Please support the existing standards out there to make everyone’s life better.

Why that wasn’t great, with examples

It’s self-evident.

What would be a truly great solution?

Support import from CSV files…

A simple command-line would be great.

Truly great would be a simple UI console that can import, export, and query data. For example, HeidiSQL does just that. Nothing fancy, just import, export, query.

Any external references to support your case

https://www.heidisql.com/#featurelist


Has there been any update on this? I’m looking to import data into Dgraph and haven’t been able to find a solution yet.

I can’t imagine CSV importing becoming a priority given there are already two supported import formats.

In past imports of large amounts of data that I’ve done, converting from various other formats (neither RDF nor JSON) was non-trivial from a machine-spec perspective. In other words, we provisioned a high-memory VM in order to transform the data to RDF.


The big problem with CSV is that it has no convention for graphs. CSV relationships in Neo4j are custom made; CSV was not made for that. You would need to create a standardized convention for everyone to follow. But how would we get all CSV users to standardize the way we need?

We would also need to maintain a convention for exports, which would not be compatible with any other tool that uses CSV. Our JSON export isn’t compatible with any tool out there either, but JSON is easy to sanitize.

It is easier to convert CSV to JSON or something similar. You can use OpenRefine (https://openrefine.org/); it’s a very good tool and has a Dgraph-compatible export. You just need to sanitize the data well, which you can do in OpenRefine itself.
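
To make the convention problem concrete, here is a rough sketch of what one such custom convention could look like: an edge-list CSV with `source`, `predicate`, and `target` columns, turned into RDF triples with blank nodes. The column names and the `edges_csv_to_rdf` helper are made up for this example; no tool agrees on them, which is exactly the issue. Values are used as-is, so they would need sanitizing first.

```python
import csv
import io


def edges_csv_to_rdf(csv_file):
    """Yield one RDF triple per row of a (source, predicate, target) edge list.

    Hypothetical convention: the three column names are an assumption,
    and values are not escaped or validated here.
    """
    for row in csv.DictReader(csv_file):
        yield f"_:{row['source']} <{row['predicate']}> _:{row['target']} ."


# In-memory sample standing in for a real edge-list file.
edges = io.StringIO("source,predicate,target\nalice,follows,bob\nbob,follows,carol\n")
triples = list(edges_csv_to_rdf(edges))
```

A different team could just as reasonably put edges in a column of the node file, or use one file per relationship type, and none of those layouts would import into the others without another script.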


Okay, I understand. Thank you both @matthewmcneely @MichelDiz for your responses!
