I’m bulk-loading data into Dgraph from an external source, as is. There’s a literal that uniquely identifies each node. Let’s call it “name”. Each node has a “name”. Some nodes (10% of the total) also have a literal “targetName” that specifies the “name” of another node, to which I need to create a “target” edge.
This is how I do it right now:
1. Bulk load the data.
2. Query for all nodes and return their "name"s and uids.
3. Externally create a uid to “name” hash.
4. Query for all nodes (with a has(targetName) filter).
5. Create the N-Quad for each mutation, one at a time (using the hash, “name” and “targetName”).
6. Live-load the generated RDF.
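For reference, the map-building and N-Quad generation steps above can be sketched roughly as follows. This is only an illustration: the function name `build_target_nquads` and the input shape (a list of dicts with `uid`, `name` and an optional `targetName`, as a query returning those three fields might produce) are assumptions, not something Dgraph prescribes.

```python
def build_target_nquads(nodes):
    """Turn queried nodes into <target> edges expressed as N-Quads.

    `nodes` is assumed to look like:
        [{"uid": "0x1", "name": "a", "targetName": "b"}, {"uid": "0x2", "name": "b"}]
    """
    # Lookup from the unique "name" literal to the uid Dgraph assigned.
    uid_by_name = {node["name"]: node["uid"] for node in nodes}

    nquads = []
    for node in nodes:
        target_name = node.get("targetName")
        if target_name is None:
            continue  # most nodes have no targetName
        target_uid = uid_by_name.get(target_name)
        if target_uid is None:
            continue  # targetName points at a node that was never loaded
        nquads.append(f'<{node["uid"]}> <target> <{target_uid}> .')
    return nquads
```

The cost in this workflow is not the loop itself but the full scan needed to fetch every uid before it can run.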
Currently, Step 2 is killing my efficiency by orders of magnitude, and it feels redundant and suboptimal. Is there native Dgraph functionality that can help me achieve this?
Given that I’m maintaining the uniqueness of “name” externally anyway, being able to use it as an XID for live-load mutations would be tremendously helpful, but I’m not sure whether XIDs serve that purpose.
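One possible way to use “name” as an external identifier is to derive the RDF blank-node label from it, so the “target” edge can be written in the same file as the node data. Within a single bulk or live load run, identical blank-node labels resolve to the same uid; whether that mapping carries over between separate runs depends on loader options that should be checked in the docs. The sketch below assumes the input is a list of dicts with `name` and an optional `targetName`; the helper names and the hex-encoded label scheme are hypothetical choices, and the literal escaping via `json.dumps` is an approximation of N-Quads string escaping.

```python
import json


def blank_node(name):
    # Hex-encode the name so the label is always a valid, collision-free
    # blank-node identifier, whatever characters the name contains.
    return "_:n" + name.encode("utf-8").hex()


def to_rdf(records):
    """Emit N-Quads where each node is keyed by a label derived from its name."""
    lines = []
    for rec in records:
        subject = blank_node(rec["name"])
        literal = json.dumps(rec["name"], ensure_ascii=False)
        lines.append(f"{subject} <name> {literal} .")
        if "targetName" in rec:
            # Same derivation for the target, so both labels meet on one uid.
            lines.append(f"{subject} <target> {blank_node(rec['targetName'])} .")
    return lines
```

With this layout no uid lookup is needed, because the loader does the name-to-uid resolution itself.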
Given this, you could create an upsert mutation for each x <friend> y as below. It will try to match both nodes by name and, if either is not found, create the missing “x” and/or “y” node. “john” and “steve” can be replaced by the unique identifiers you are using.
upsert {
  # john friend steve
  # note: eq() needs an index on <name> (for example, hash)
  query {
    findX(func: eq(name, "john")) {
      x as uid
    }
    findY(func: eq(name, "steve")) {
      y as uid
    }
  }
  mutation {
    set {
      # set types
      uid(x) <dgraph.type> "Person" .
      uid(y) <dgraph.type> "Person" .
      # set relation friend
      uid(x) <friend> uid(y) .
      # set attributes
      uid(x) <name> "john" .
      uid(y) <name> "steve" .
    }
  }
}
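Since there would be one such block per (“name”, “targetName”) pair, the blocks could be generated rather than written by hand. Here is a minimal sketch, assuming the edge is called `target` as in the question; the template and the function `upsert_for` are illustrative names, and quoting via `json.dumps` is an approximation that should be checked against names containing unusual characters.

```python
import json

# Doubled braces are literal braces; {src} and {dst} are filled in per pair.
UPSERT_TEMPLATE = """upsert {{
  query {{
    findX(func: eq(name, {src})) {{ x as uid }}
    findY(func: eq(name, {dst})) {{ y as uid }}
  }}
  mutation {{
    set {{
      uid(x) <name> {src} .
      uid(y) <name> {dst} .
      uid(x) <target> uid(y) .
    }}
  }}
}}"""


def upsert_for(name, target_name):
    """Return the text of one upsert block linking name -> target_name."""
    return UPSERT_TEMPLATE.format(
        src=json.dumps(name, ensure_ascii=False),
        dst=json.dumps(target_name, ensure_ascii=False),
    )


if __name__ == "__main__":
    print(upsert_for("john", "steve"))
```

Each generated block is sent as its own request, so this trades the up-front uid scan for two indexed lookups per edge.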
You could load data directly through curl commands, as mentioned here. You could also use the Ratel UI, which is definitely more user-friendly.