Questions about importing data

03B037 · March 18, 2021, 11:31am

While importing data, the wait became too long, so I stopped the import manually. Part of the data had already been loaded, though, which affected the integrity of my original data. Could you please tell me how to solve this kind of problem?

MichelDiz · March 18, 2021, 4:01pm

If your dataset already has fixed UIDs, this problem won’t happen again. Or you could use an upsert. But if this is the first ingestion of your dataset, it is recommended that you use the Bulk Loader instead. It is faster.

03B037 · March 19, 2021, 1:38am

“If your dataset already has the UID fixed this problem won’t happen again.”

Sorry, I don’t quite understand. I think I’ll have to use the Live Loader under certain conditions, and there’s a high probability that the import will be aborted by something like a network outage or a power outage.

MichelDiz · March 19, 2021, 2:08am

Okay, I’m not sure what you didn’t understand. What is your level of experience with Dgraph? Are you a pro Dgraph user, or new? That way I can either write a didactic answer or just point you to the way to go.

Dgraph doesn’t have any “start from this point” feature. There are “advanced” ways of doing it that you have to master.

For example, XID:

dgraph bulk -h | grep xid
      --store_xids                       Generate an xid edge for each node.
      --xidmap string                    Directory to store xid to uid mapping

It is a complex topic, but once you get it you can go really far in mastering Dgraph.

You can also use the XID technique in the Live Loader after the bulk load. You just need to keep blank nodes unique across your whole environment.

dgraph live -h | grep xid
  -U, --upsertPredicate string       run in upsertPredicate mode. the value would be used to store blank nodes as an xid
  -x, --xidmap string                Directory to store xid to uid mapping
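As a rough sketch of how the `--xidmap` flag above might be used for a load that can be re-run after an interruption: the idea is to point the Live Loader at a directory that survives between runs, so that the same blank node should resolve to the same UID each time instead of creating a new node. The file names, addresses, and directory below are placeholders, not values from this thread, and the script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: every path and address here is an assumed placeholder.
DATA_FILE="${DATA_FILE:-data.rdf.gz}"
SCHEMA_FILE="${SCHEMA_FILE:-data.schema}"
ALPHA_ADDR="${ALPHA_ADDR:-localhost:9080}"
ZERO_ADDR="${ZERO_ADDR:-localhost:5080}"
XIDMAP_DIR="${XIDMAP_DIR:-./xidmap}"

# Store the command as positional parameters so it can be printed or executed.
# --xidmap names a directory that is kept between runs; reusing it on a re-run
# is what is expected to keep blank nodes mapped to the UIDs assigned earlier.
set -- dgraph live \
  --files "$DATA_FILE" \
  --schema "$SCHEMA_FILE" \
  --alpha "$ALPHA_ADDR" \
  --zero "$ZERO_ADDR" \
  --xidmap "$XIDMAP_DIR"

# DRY_RUN=1 (the default here) only prints the command for review.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$*"
else
  mkdir -p "$XIDMAP_DIR"
  "$@"
fi
```

If the import is aborted, running the same script again with the same `XIDMAP_DIR` is the point of the exercise. Deleting that directory between runs would discard the mapping, and the blank nodes would then be treated as new.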

Also, there is the new upsert predicate option, a new feature that creates a bunch of upsert blocks based on your data. It is a new approach, but also a valid one.
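A minimal sketch of what an upsert-predicate load could look like, based only on the `-U, --upsertPredicate` help text quoted in this thread. The predicate name `xid`, the file names, and the address are assumptions for illustration. Since upserts look nodes up by value, the predicate most likely needs an index in the schema; treat that as an assumption to verify against the Dgraph documentation for your version. The script prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: the predicate name, paths, and address are assumed placeholders.
DATA_FILE="${DATA_FILE:-data.rdf.gz}"
SCHEMA_FILE="${SCHEMA_FILE:-data.schema}"
ALPHA_ADDR="${ALPHA_ADDR:-localhost:9080}"
UPSERT_PRED="${UPSERT_PRED:-xid}"

# Per the help text, the blank node name is stored as the value of this
# predicate, so the data itself carries the identifier used to find the node.
set -- dgraph live \
  --files "$DATA_FILE" \
  --schema "$SCHEMA_FILE" \
  --alpha "$ALPHA_ADDR" \
  --upsertPredicate "$UPSERT_PRED"

# DRY_RUN=1 (the default here) only prints the command for review.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$*"
else
  "$@"
fi
```

Compared with an xidmap directory, this keeps the identifier inside the database rather than on the loading machine, at the cost of a lookup for each node.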

The main suggestion I gave you before was to have UIDs in your data. But I don’t know whether your data comes from Dgraph, from another DB, or from somewhere else. Dealing with the UIDs yourself is a hard job, but it is a way.

This is the way.

03B037 · March 19, 2021, 2:17am

I am a new user. Thank you very much for your patience in answering. I think I have understood what you said.

MichelDiz · March 19, 2021, 2:21am

To be short and a bit “redundant”: Dgraph is UID-based, so the way to avoid duplicates is to work in sync with this design.
