Questions about importing data

While importing data, the wait was too long, so I stopped the import manually. However, part of the data had already been loaded, which affected the integrity of my original data. Could you please tell me how I should solve this kind of problem?

If your dataset already has fixed UIDs, this problem won’t happen again. Alternatively, you could use an upsert. But if this is the first ingestion of your dataset, it is recommended that you use the Bulk Loader instead; it is faster.

If your dataset already has fixed UIDs, this problem won’t happen again.

Sorry, I don’t quite understand. I think I’ll have to use the Live Loader under certain conditions, but there’s a high probability that the import will be aborted by something like a network outage or a power outage.

Okay, I’m not sure which part you didn’t understand. What is your level of experience with Dgraph? Are you an advanced Dgraph user, or new? That way I can either give a more didactic answer or just point to the way to do it.

Dgraph doesn’t have any “start from this point” feature. There are “advanced” ways of doing it that you have to master.

For example, XIDs (external IDs):

dgraph bulk -h | grep xid
      --store_xids                       Generate an xid edge for each node.
      --xidmap string                    Directory to store xid to uid mapping

It is a complex topic, but once you get it you can go really far in mastering Dgraph.

You can also use the XID technique in the Live Loader after the bulk load. You just need to keep your blank node names unique and consistent across your whole environment.

dgraph live -h | grep xid
  -U, --upsertPredicate string       run in upsertPredicate mode. the value would be used to store blank nodes as an xid
  -x, --xidmap string                Directory to store xid to uid mapping
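
To make the "unique blank nodes" idea concrete, here is a minimal sketch, assuming your source rows carry a stable key of their own. The row layout, the `name` predicate and the helper names are made up for illustration; they are not part of Dgraph. The point is only that the same source key always produces the same blank node label, so a re-run that reuses the same xidmap directory can resolve it to the same node instead of creating a new one.

```python
import hashlib


def blank_node(source_key: str) -> str:
    """Derive a deterministic blank-node label from a source-system key."""
    digest = hashlib.sha1(source_key.encode("utf-8")).hexdigest()[:16]
    return f"_:n{digest}"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for an N-Quads string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_nquads(rows):
    """Turn rows (hypothetical layout: {'id': ..., 'name': ...}) into N-Quads."""
    lines = []
    for row in rows:
        subject = blank_node(row["id"])
        lines.append(f'{subject} <name> "{escape_literal(row["name"])}" .')
    return lines


rows = [{"id": "user-1", "name": "Alice"}, {"id": "user-2", "name": "Bob"}]
first_run = to_nquads(rows)
second_run = to_nquads(rows)  # e.g. regenerated after an aborted import
```

Both runs yield identical lines, which is what lets the xid to uid mapping do its job when the file is loaded again.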

There is also the new upsert predicate option, a feature that creates a bunch of upsert blocks based on your data. It is a newer approach, but also valid.
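
For reference, this is roughly what a hand-written upsert block looks like, built as a string in Python. It is only a sketch of the general pattern, not the exact text the loader generates. The `xid` and `name` predicates are assumptions, and `eq(xid, ...)` assumes the `xid` predicate has a suitable index in your schema.

```python
def upsert_block(xid: str, name: str) -> str:
    """Build a DQL upsert block that finds a node by xid or creates it.

    Hypothetical predicates: `xid` (assumed indexed) and `name`.
    The inputs are assumed to contain no double quotes.
    """
    return (
        "upsert {\n"
        f'  query {{ n as var(func: eq(xid, "{xid}")) }}\n'
        "  mutation {\n"
        "    set {\n"
        f'      uid(n) <xid> "{xid}" .\n'
        f'      uid(n) <name> "{name}" .\n'
        "    }\n"
        "  }\n"
        "}"
    )


block = upsert_block("user-1", "Alice")
print(block)
```

Because the mutation is keyed on `uid(n)`, running the same block twice updates the existing node rather than adding a second one.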

The main suggestion I gave you before was to have UIDs in your data. But I don’t know whether your data comes from Dgraph, from another DB, or from somewhere else. Dealing with UIDs yourself is hard work, but it is a way.
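
A minimal sketch of what "UIDs in your data" means in practice, assuming the UIDs were already issued by Dgraph (for example, the data was exported from it). The record layout and the `name` predicate are hypothetical. Each triple names its subject by explicit UID, so loading the file a second time writes the same triples onto the same nodes.

```python
def uid_nquad(uid: int, predicate: str, value: str) -> str:
    """Format one N-Quad whose subject is an explicit UID such as <0x1>.

    The value is assumed to contain no double quotes or backslashes.
    """
    return f'<{hex(uid)}> <{predicate}> "{value}" .'


# Hypothetical records that already carry the UID assigned by Dgraph.
records = [(1, "Alice"), (2, "Bob")]
nquads = [uid_nquad(uid, "name", name) for uid, name in records]
print("\n".join(nquads))
```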

This is the way.

I am a new user. Thank you very much for your patience in answering; I think I have understood what you said.

To be short and a bit “redundant”: Dgraph is UID-based, so the way to avoid duplicates is to work in sync with this design.