Loading Edges into Dgraph database

What I want to do

For the sake of an example, I have two very basic JSON files. One holds the Person data type, which contains a name and the name of the company the person works for. The other holds the Company data type, which contains only the company's name.

My schema is

type Person {
  name
  company_name
  works_for: [Company]
}
type Company {
  company_name
}

And the example JSON files are:

Person.json

[
  {
     "name":"Example name",
     "company_name":"Acme"
  }
]

Company.json

[
  {
     "company_name":"Acme"
  }
]

I can load these in with the Live Loader very easily.

I now want to create an edge “works_for” from each person to the company whose company_name matches the person's. How best to do this in a scalable manner?

I’m using Bulk Loader to instantiate a graph, and just went through a similar exercise. In my example, I have Author and Book entities, with an “authored” edge. The .json for the authored edge looks like this:

[
  {
    "uid": "_:8d61c26e-a959-4383-bd6d-1cc922368688",
    "authored": {
      "uid": "_:0ad095ee-ef9a-4c10-af84-170da2d3c604"
    }
  }
]

In my scenario, I’m loading everything at once, so I use blank nodes in the uid field to associate each specific author with the specific book.
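Applied to the Person/Company example from the question, the same blank-node idea could be sketched as a small preprocessing step in Python. This is only a sketch under assumptions: the helper names (blank_node, link_people_to_companies) and the hash-based label scheme are made up for illustration, and it assumes person names and company names are unique enough to serve as keys.

```python
import hashlib
import json


def blank_node(kind, key):
    """Return a deterministic blank-node label (hypothetical naming scheme)."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
    return f"_:{kind}_{digest}"


def link_people_to_companies(people, companies):
    """Build one list of nodes in which each person points at its company."""
    nodes = []
    known = set()
    for company in companies:
        name = company["company_name"]
        known.add(name)
        nodes.append({
            "uid": blank_node("company", name),
            "dgraph.type": "Company",
            "company_name": name,
        })
    for person in people:
        node = {
            "uid": blank_node("person", person["name"]),
            "dgraph.type": "Person",
            "name": person["name"],
        }
        employer = person.get("company_name")
        # Only emit the edge when the company is actually present in the data.
        if employer in known:
            node["works_for"] = [{"uid": blank_node("company", employer)}]
        nodes.append(node)
    return nodes


people = [{"name": "Example name", "company_name": "Acme"}]
companies = [{"company_name": "Acme"}]
linked = link_people_to_companies(people, companies)
print(json.dumps(linked, indent=2))
```

The output is a single JSON document in the same uid/blank-node shape as the authored example, so the person's works_for entry and the company node share one label.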

Some additional details here:

You can also use an Upsert Block. (There is a JSON version, which I believe works only via HTTP/cURL):

upsert {
  query {
    q(func: eq(company_name, "Acme")) {
      v as uid #Find the company
    }
  }

  mutation {
    set {
      _:NewUser  <name> "Example name" .
      _:NewUser  <dgraph.type> "Person" .
      _:NewUser  <works_for> uid(v) .
    }
  }
}

See https://dgraph.io/docs/mutations/upsert-block/#sidebar

You have to remove “company_name” from your Person nodes. Otherwise the query above matches them as well as the Company node, since both would carry that predicate.
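For a script that issues one upsert per person, the block above can be generated from the input data. The following is a minimal sketch, not an official client: build_upsert and rdf_literal are hypothetical helpers, and it assumes that JSON-style string escaping is acceptable for the quoted literals. Sending the resulting text to Dgraph is left out.

```python
import json


def rdf_literal(value):
    """Quote a string as a literal (assumption: JSON-style escaping is accepted)."""
    return json.dumps(value, ensure_ascii=False)


def build_upsert(person_name, company_name):
    """Return an upsert block that creates a Person linked to an existing company."""
    return (
        "upsert {\n"
        "  query {\n"
        f"    q(func: eq(company_name, {rdf_literal(company_name)})) {{\n"
        "      v as uid\n"
        "    }\n"
        "  }\n"
        "\n"
        "  mutation {\n"
        "    set {\n"
        f"      _:NewUser <name> {rdf_literal(person_name)} .\n"
        '      _:NewUser <dgraph.type> "Person" .\n'
        "      _:NewUser <works_for> uid(v) .\n"
        "    }\n"
        "  }\n"
        "}\n"
    )


block = build_upsert("Example name", "Acme")
print(block)
```

Note that the generated mutation does not write company_name onto the person, in line with the advice above.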

Live Loader has some options for automatic upserts, but you have to use a flag to record the XIDs and also keep track of them. (XIDs are external identifiers. In Dgraph, blank nodes are treated as XIDs during the load.)

Thanks for the help. Using your suggestion, I now have a Python script that upserts the edges node by node. It’ll do for now, but it isn’t really scalable. I guess that to bulk load a lot of data with performance like that in “Loading close to 1M edges/sec into Dgraph” (Dgraph Blog), it needs to be in one big RDF file with the edges predefined?

There is no limit on the size or the number of files. It depends on the resources available. And no, the edges don’t need to be predefined. You can connect them later.
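As a rough illustration of connecting edges later, an edges-only RDF file could be written as a stream, one triple per line. This is a sketch under assumptions: the hash-based blank-node labels and both function names are hypothetical, and the labels only resolve to the nodes loaded earlier if the loader recorded the XID mapping as described above and the earlier load used the same labels.

```python
import hashlib
import io


def blank_node(kind, key):
    """Return a deterministic blank-node label (hypothetical naming scheme)."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
    return f"_:{kind}_{digest}"


def write_works_for_edges(pairs, out):
    """Write one works_for triple per (person_name, company_name) pair."""
    count = 0
    for person_name, company_name in pairs:
        subject = blank_node("person", person_name)
        target = blank_node("company", company_name)
        out.write(f"{subject} <works_for> {target} .\n")
        count += 1
    return count


# A StringIO stands in for a real file opened for writing.
buffer = io.StringIO()
written = write_works_for_edges([("Example name", "Acme")], buffer)
print(buffer.getvalue(), end="")
```

Because the pairs are consumed one at a time, the same function can write to a real file without holding the whole data set in memory.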