Loading CSV Data - Howto

diggy · August 28, 2020, 5:13pm

Dgraph mutations are accepted in RDF N-Quad and JSON formats. To load CSV-formatted data into Dgraph, first convert the dataset into one of the accepted formats and then load the resulting dataset into Dgraph. This section demonstrates converting CSV into JSON.

There are many tools available to convert CSV to JSON. For example, you can run d3-dsv’s csv2json tool on a file named names.csv with the contents shown below:

Name,URL
Dgraph,https://github.com/dgraph-io/dgraph
Badger,https://github.com/dgraph-io/badger

$ csv2json names.csv --out names.json
$ cat names.json | jq '.'
[
  {
    "Name": "Dgraph",
    "URL": "https://github.com/dgraph-io/dgraph"
  },
  {
    "Name": "Badger",
    "URL": "https://github.com/dgraph-io/badger"
  }
]
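If csv2json is not available, the same conversion can be sketched with Python's standard library. This is a minimal sketch, not part of the original how-to: the function name csv_to_records is hypothetical, and the CSV is assumed to have a header row, as names.csv does. All values come out as strings, matching the JSON shown above.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Parse CSV text with a header row into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Sample data copied from names.csv above.
NAMES_CSV = """Name,URL
Dgraph,https://github.com/dgraph-io/dgraph
Badger,https://github.com/dgraph-io/badger
"""

records = csv_to_records(NAMES_CSV)
print(json.dumps(records, indent=2))
```

To work from a file instead of an inline string, read its text first and pass it to the same function.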

This JSON follows the JSON Mutation Format and can be loaded into Dgraph via the programmatic clients. Because the uid field is omitted, each JSON object in the list above will be assigned a new, unique UID.

The Ratel UI (and HTTP clients) expect JSON data to be stored within the "set" key. You can use jq to transform the JSON into the correct format:

$ cat names.json | jq '{ set: . }'

{
  "set": [
    {
      "Name": "Dgraph",
      "URL": "https://github.com/dgraph-io/dgraph"
    },
    {
      "Name": "Badger",
      "URL": "https://github.com/dgraph-io/badger"
    }
  ]
}
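The jq step above only wraps the list under a "set" key. As a rough Python equivalent (a sketch; to_set_mutation is a hypothetical helper name, and the records are copied from the names.json output above):

```python
import json


def to_set_mutation(records):
    """Wrap a list of JSON objects under the "set" key, as Ratel and HTTP clients expect."""
    return {"set": list(records)}


# Records copied from the names.json output above.
records = [
    {"Name": "Dgraph", "URL": "https://github.com/dgraph-io/dgraph"},
    {"Name": "Badger", "URL": "https://github.com/dgraph-io/badger"},
]

mutation = to_set_mutation(records)
print(json.dumps(mutation, indent=2))
```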

Let’s say you have CSV data in a file named connects.csv that connects nodes together. Here, the connects field should be of uid type.

uid,connects
_:a,_:b
_:a,_:c
_:c,_:d
_:d,_:a

Note: To reuse existing integer IDs from a CSV file as UIDs in Dgraph, use Dgraph Zero’s assign endpoint before loading the data to allocate a range of UIDs that can be safely assigned.
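As a hedged illustration of that note, the sketch below only builds the request URL and does not contact a server. It assumes Zero's HTTP address is localhost:6080 and that the assign endpoint takes the query parameters what=uids and num; check both against the documentation for your Dgraph version. The helper name assign_uids_url is hypothetical.

```python
from urllib.parse import urlencode


def assign_uids_url(zero_http_addr, num):
    """Build the URL that asks Dgraph Zero to allocate `num` UIDs.

    Assumption: the endpoint is /assign with what=uids and num=<count>.
    """
    if num < 1:
        raise ValueError("num must be a positive integer")
    query = urlencode({"what": "uids", "num": num})
    return f"http://{zero_http_addr}/assign?{query}"


# Assumption: Zero's HTTP port is reachable at localhost:6080.
print(assign_uids_url("localhost:6080", 1000))
```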

To get the correct JSON format, convert the CSV into JSON and use jq to transform it so that the value of each connects edge is a node object with its own uid:

$ csv2json connects.csv | jq '[ .[] | { uid: .uid, connects: { uid: .connects } } ]'

[
  {
    "uid": "_:a",
    "connects": {
      "uid": "_:b"
    }
  },
  {
    "uid": "_:a",
    "connects": {
      "uid": "_:c"
    }
  },
  {
    "uid": "_:c",
    "connects": {
      "uid": "_:d"
    }
  },
  {
    "uid": "_:d",
    "connects": {
      "uid": "_:a"
    }
  }
]

You can modify the jq transformation to output the mutation format accepted by Ratel UI and HTTP clients:

$ csv2json connects.csv | jq '{ set: [ .[] | { uid: .uid, connects: { uid: .connects } } ] }'

{
  "set": [
    {
      "uid": "_:a",
      "connects": {
        "uid": "_:b"
      }
    },
    {
      "uid": "_:a",
      "connects": {
        "uid": "_:c"
      }
    },
    {
      "uid": "_:c",
      "connects": {
        "uid": "_:d"
      }
    },
    {
      "uid": "_:d",
      "connects": {
        "uid": "_:a"
      }
    }
  ]
}
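The same edge transformation can be sketched in Python without csv2json or jq. This is an illustrative sketch under the assumption that connects.csv has exactly the uid and connects columns shown earlier; connects_to_edges is a hypothetical helper name.

```python
import csv
import io
import json

# Sample data copied from connects.csv above.
CONNECTS_CSV = """uid,connects
_:a,_:b
_:a,_:c
_:c,_:d
_:d,_:a
"""


def connects_to_edges(csv_text):
    """Turn each CSV row into a node whose connects edge points at another node by uid."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [{"uid": row["uid"], "connects": {"uid": row["connects"]}} for row in rows]


edges = connects_to_edges(CONNECTS_CSV)  # list form, for programmatic clients
mutation = {"set": edges}                # wrapped form, for Ratel and HTTP clients
print(json.dumps(mutation, indent=2))
```

Nesting the target as { "uid": ... } rather than a plain string is what makes connects an edge to another node instead of a string value.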

This is a companion discussion topic for the original entry at https://dgraph.io/docs/howto/loading-csv-data/
