Improve Loaders: Add feature to continue a previous load

Moved from GitHub dgraph/3279

Posted by MichelDiz:

What you wanted to do

Continue a dataset load from where it stopped, with Live Load or Bulk Load, after it has been interrupted for any reason.

Why that wasn’t great, with examples

When a load is interrupted and I run it again, it starts from scratch. This is not the desired result. Let’s avoid spending time rewriting something that is already in the DB.

MichelDiz commented:

IMPORTANT

This issue is not just about duplicate nodes caused by retrying a load. You can avoid duplicate nodes by using the --xidmap flag.

e.g.:

./dgraph live -f test.rdf,other.rdf.gz -s test.schema --xidmap ./xd

Every time you reuse the xidmap mapping files, all previously mapped blank nodes are automatically resolved to the UIDs they were already mapped to.
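To make the idea of reusing a mapping concrete, here is a toy sketch in Python. It is not how Dgraph implements --xidmap; the `XidMap` class, the JSON file, and the sequential UIDs are all assumptions made for illustration. It only shows why a blank node seen in an earlier run gets the same UID again instead of a new one.

```python
import json
import os


class XidMap:
    """Toy persistent map from blank-node names to UIDs (illustration only)."""

    def __init__(self, path):
        self.path = path
        self.mapping = {}
        # Reload the mapping left behind by a previous run, if there is one.
        if os.path.exists(path):
            with open(path) as f:
                self.mapping = json.load(f)

    def assign(self, xid):
        # Reuse the UID if this blank node was already mapped;
        # otherwise hand out the next sequential UID (hypothetical scheme).
        if xid not in self.mapping:
            self.mapping[xid] = len(self.mapping) + 1
        return self.mapping[xid]

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.mapping, f)
```

A second `XidMap` opened on the same file returns the UID the first one assigned, which is why a retried load does not duplicate nodes, even though it still has to walk the whole input again.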

However, the load always starts from scratch, even though the blank nodes have already been mapped. The point of this issue is to create a “checkpoint” feature, to avoid spending days rewriting something that is already in the DB.
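One way such a checkpoint could work, as a minimal sketch in Python and not a description of anything the loaders do today: record how many input lines have been committed in a small sidecar file, and skip that many lines on the next run. The names `resumable_load`, `load_checkpoint`, `save_checkpoint`, and the `commit` callback are hypothetical; `commit` stands in for whatever actually writes a batch to the DB.

```python
import os


def load_checkpoint(path):
    """Return the number of lines already committed, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return int(f.read().strip() or 0)


def save_checkpoint(path, count):
    # Write to a temporary file and rename it, so an interrupt
    # cannot leave a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(count))
    os.replace(tmp, path)


def resumable_load(lines, checkpoint_path, commit, batch_size=1000):
    """Send lines to commit() in batches, skipping lines a previous run committed."""
    done = load_checkpoint(checkpoint_path)
    batch = []
    for index, line in enumerate(lines):
        if index < done:
            continue  # already in the DB according to the checkpoint
        batch.append(line)
        if len(batch) == batch_size:
            commit(batch)
            done += len(batch)
            save_checkpoint(checkpoint_path, done)
            batch = []
    if batch:  # final partial batch
        commit(batch)
        done += len(batch)
        save_checkpoint(checkpoint_path, done)
    return done
```

In this sketch the checkpoint is saved only after a batch is committed, so an interrupt between the two steps replays at most one batch on the next run. That replay is where reusing the xidmap files would still matter, since the replayed blank nodes would resolve to the UIDs they were already mapped to.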