Dgraph mutations are accepted in RDF N-Quad and JSON formats. To load
CSV-formatted data into Dgraph, first convert the dataset into one of the
accepted formats and then load the resulting dataset into Dgraph. This section
demonstrates converting CSV into JSON. Many tools can convert CSV to JSON. For
example, you can use d3-dsv’s csv2json tool on a file such as names.csv, shown
below:
Name,URL
Dgraph,https://github.com/dgraph-io/dgraph
Badger,https://github.com/dgraph-io/badger
$ csv2json names.csv --out names.json
$ cat names.json | jq '.'
[
  {
    "Name": "Dgraph",
    "URL": "https://github.com/dgraph-io/dgraph"
  },
  {
    "Name": "Badger",
    "URL": "https://github.com/dgraph-io/badger"
  }
]
This JSON follows the JSON Mutation Format and can be loaded into Dgraph via the
programmatic clients.
Note that each JSON object in the list above will be assigned a unique UID
because the uid field is omitted.
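If csv2json is not available, the same conversion can be sketched with Python's standard library. This is a minimal illustration rather than part of the Dgraph tooling: the function name csv_to_json_records and the inline NAMES_CSV string are assumptions made for the example, standing in for the contents of names.csv.

```python
import csv
import io
import json


def csv_to_json_records(csv_text):
    """Parse CSV text with a header row into a list of dicts, one per data row.

    Hypothetical helper for illustration; it is not part of any Dgraph client.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Inline stand-in for the contents of names.csv.
NAMES_CSV = """Name,URL
Dgraph,https://github.com/dgraph-io/dgraph
Badger,https://github.com/dgraph-io/badger
"""

records = csv_to_json_records(NAMES_CSV)

# No "uid" key is added, so each object would get a new UID when loaded.
print(json.dumps(records, indent=2))
```

All values stay strings here, which matches what a plain CSV-to-JSON conversion produces; any type conversion would be an extra step.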
The Ratel UI (and HTTP clients) expect JSON data to be stored within the "set"
key. You can use jq to transform the JSON into the correct format:
$ cat names.json | jq '{ set: . }'
{
  "set": [
    {
      "Name": "Dgraph",
      "URL": "https://github.com/dgraph-io/dgraph"
    },
    {
      "Name": "Badger",
      "URL": "https://github.com/dgraph-io/badger"
    }
  ]
}
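The jq filter above only nests the list under a "set" key, so the same step is a one-line wrap in most languages. A minimal Python sketch, assuming the records have already been parsed into a list of dicts (the helper name wrap_in_set is hypothetical):

```python
import json


def wrap_in_set(records):
    """Nest a list of JSON objects under the "set" key.

    Hypothetical helper mirroring the jq filter '{ set: . }'.
    """
    return {"set": records}


# Records as produced by the CSV-to-JSON conversion of names.csv.
records = [
    {"Name": "Dgraph", "URL": "https://github.com/dgraph-io/dgraph"},
    {"Name": "Badger", "URL": "https://github.com/dgraph-io/badger"},
]

mutation = wrap_in_set(records)
print(json.dumps(mutation, indent=2))
```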
Let’s say you have CSV data in a file named connects.csv that’s connecting nodes
together. Here, the connects field should be of uid type.
uid,connects
_:a,_:b
_:a,_:c
_:c,_:d
_:d,_:a
To get the correct JSON format, you can convert the CSV into JSON and use jq
to transform it into the correct format, where the connects edge points to a
node uid:
$ csv2json connects.csv | jq '[ .[] | { uid: .uid, connects: { uid: .connects } } ]'
[
  {
    "uid": "_:a",
    "connects": {
      "uid": "_:b"
    }
  },
  {
    "uid": "_:a",
    "connects": {
      "uid": "_:c"
    }
  },
  {
    "uid": "_:c",
    "connects": {
      "uid": "_:d"
    }
  },
  {
    "uid": "_:d",
    "connects": {
      "uid": "_:a"
    }
  }
]
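The reshaping done by the jq filter can also be sketched in Python: each CSV row becomes an object whose connects value is itself an object holding a uid, which is how the JSON above represents an edge to another node. The function name rows_to_edges and the inline CONNECTS_CSV string are assumptions for this example, standing in for connects.csv.

```python
import csv
import io
import json

# Inline stand-in for the contents of connects.csv.
CONNECTS_CSV = """uid,connects
_:a,_:b
_:a,_:c
_:c,_:d
_:d,_:a
"""


def rows_to_edges(csv_text):
    """Turn each (uid, connects) row into a node object with a nested uid edge.

    Hypothetical helper; the output shape follows the jq example above.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"uid": row["uid"], "connects": {"uid": row["connects"]}}
        for row in reader
    ]


edges = rows_to_edges(CONNECTS_CSV)
print(json.dumps(edges, indent=2))
```

Keeping the identifiers such as _:a unchanged preserves the links between rows, since the same identifier appears in both the uid and connects columns.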
You can modify the jq transformation to output the mutation format accepted by
Ratel UI and HTTP clients:
$ csv2json connects.csv | jq '{ set: [ .[] | {uid: .uid, connects: { uid: .connects } } ] }'
{
  "set": [
    {
      "uid": "_:a",
      "connects": {
        "uid": "_:b"
      }
    },
    {
      "uid": "_:a",
      "connects": {
        "uid": "_:c"
      }
    },
    {
      "uid": "_:c",
      "connects": {
        "uid": "_:d"
      }
    },
    {
      "uid": "_:d",
      "connects": {
        "uid": "_:a"
      }
    }
  ]
}
This is a companion discussion topic for the original entry at https://dgraph.io/docs/howto/loading-csv-data/