Improve Loaders: Add feature to continue a previous load

diggy · April 10, 2019, 8:36pm

Moved from GitHub dgraph/3279

What you wanted to do

Continue a dataset load from the point where it stopped. A Live Loader or Bulk Loader run may be interrupted for any number of reasons.

Why that wasn’t great, with examples

When a load is interrupted and I run it again, it starts from scratch. This is not the desired result. Let’s avoid spending time rewriting data that is already in the DB.

diggy · April 11, 2019, 4:42pm

MichelDiz commented:

IMPORTANT

This issue is not just about duplicate nodes caused by retrying a load. You can avoid duplicate nodes by using the --xidmap flag.

e.g.:

./dgraph live -f test.rdf,other.rdf.gz -s test.schema --xidmap ./xd

Every time you reuse the xidmap mapping files, each previously mapped blank node is automatically written to the UID it was already mapped to.

However, the load always starts from scratch, even though the blank nodes have already been mapped. This issue only asks for a “checkpoint” feature, to avoid spending days rewriting data that is already in the DB.
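
To make the request concrete, here is a minimal sketch of what a checkpoint could look like. It is not how Dgraph's loaders work and not a proposed implementation. It assumes a single-threaded loader that sends input lines in fixed-size batches, and every name in it (`load_with_checkpoint`, `write_batch`, the `lines_done` field, the checkpoint file) is hypothetical.

```python
import json
import os


def read_checkpoint(path):
    """Return how many input lines were already committed (0 if no checkpoint)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["lines_done"]


def save_checkpoint(path, lines_done):
    """Write to a temporary file and rename it, so a partial file is never read."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"lines_done": lines_done}, f)
    os.replace(tmp, path)


def load_with_checkpoint(lines, checkpoint_path, write_batch, batch_size=1000):
    """Send `lines` in batches, skipping everything the checkpoint marks as done.

    `write_batch` is a hypothetical callable that commits one batch to the DB
    and raises if the commit fails.
    """
    done = read_checkpoint(checkpoint_path)
    batch = []
    for index, line in enumerate(lines):
        if index < done:
            continue  # already in the DB according to the checkpoint
        batch.append(line)
        if len(batch) == batch_size:
            write_batch(batch)
            done = index + 1
            save_checkpoint(checkpoint_path, done)  # only after a successful commit
            batch = []
    if batch:  # final, possibly shorter batch
        write_batch(batch)
        done += len(batch)
        save_checkpoint(checkpoint_path, done)
    return done
```

If the process dies between a commit and its checkpoint write, the sketch resends that one batch on the next run, so it relies on the same blank nodes resolving to the same UIDs (which is what reusing the xidmap files gives you). A loader that sends batches concurrently would also need to track the last contiguous committed position instead of a single counter.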

