For the sake of example, I have two very basic JSON files. One holds data of type Person, which contains a name and the name of the company the person works for; the other holds data of type Company, which contains just the company's name.
My schema is:

type Person {
  name: String
  company_name: String
  works_for: [Company]
}

type Company {
  company_name: String
}
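
Concretely, the two files contain records along these lines (the file names and values here are just placeholders):

people.json:

[
  { "name": "Alice", "company_name": "Acme Corp" },
  { "name": "Bob", "company_name": "Acme Corp" }
]

companies.json:

[
  { "company_name": "Acme Corp" }
]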
I’m using Bulk Loader to instantiate a graph, and just went through a similar exercise. In my example, I have Author and Book entities, with an “authored” edge. The .json for the authored edge looks something like this (the uids are blank nodes; the names here are illustrative):
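
[
  {
    "uid": "_:author1",
    "authored": [
      { "uid": "_:book1" },
      { "uid": "_:book2" }
    ]
  }
]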
You have to remove “company_name” from your Person records so the loader doesn’t confuse it with the Company predicate; the works_for edge carries that relationship instead.
Live Loader has some options to do automatic upserts, but you have to use a flag to record the XIDs and also keep tracking them across loads. (XIDs are external identifiers; in Dgraph’s case, blank nodes are treated as XIDs during the load.)
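
For example, with Live Loader something like the following keeps the mapping on disk, so a later load reuses the same uids for the same blank nodes (a sketch, assuming a recent Dgraph release; the file and directory names are placeholders):

# --xidmap persists the blank-node (XID) -> uid mapping between runs
dgraph live -f people.json -s dgraph.schema --xidmap xids/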
Thanks for the help. Using your suggestion, I now have a Python script that upserts the edges node by node. It’ll do for now, but it isn’t really scalable. I guess that to bulk-load a lot of data with performance like Loading close to 1M edges/sec into Dgraph - Dgraph Blog, it needs to be in one big RDF file with the edges predefined?
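
By “predefined” I mean the edges written inline with the nodes in the same file, roughly (values are made up):

_:alice <name> "Alice" .
_:alice <dgraph.type> "Person" .
_:acme <company_name> "Acme Corp" .
_:acme <dgraph.type> "Company" .
_:alice <works_for> _:acme .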
There is no limit on the size or the number of files; it depends on the resources available. And no, the edges don’t need to be predefined; you can connect them later.
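
For example, you could load the nodes in one pass and the edges in a later pass; as long as both runs share the same --xidmap directory, the blank nodes resolve to the same uids (file names are placeholders):

nodes.rdf, loaded first:

_:alice <name> "Alice" .
_:acme <company_name> "Acme Corp" .

edges.rdf, loaded later with the same xidmap:

_:alice <works_for> _:acme .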