Bulk loader -x option

aamrtv · March 11, 2020, 5:08pm

I’m trying to load initial data into Dgraph with the bulk loader and afterwards add new nodes or modify existing ones with the live loader. The problem is that if the live loader uploads nodes that were already uploaded by the bulk loader, it creates duplicate nodes with new uids. I don’t want duplicate nodes. I need those nodes either to be skipped entirely (if they bring no new edges for bulk-loaded nodes) or to modify the existing nodes (if they do bring new edges for them).

If I use only the live loader, this is trivial: I add -x dirname to all my dgraph live commands and get an xid directory named dirname, so subsequent live loads don’t create duplicate nodes with new uids. The issue with the bulk loader is that its -x option does not create a folder for xids. Therefore, when I live load nodes that were uploaded by the bulk loader before, I get duplicate nodes with new uids.

How do I prevent duplicate nodes when I use bulk load first and live load afterwards?

MichelDiz · March 11, 2020, 6:23pm

Thank you for reporting this. I have created an issue for it, and it is now in the backlog.

github.com/dgraph-io/dgraph

Support the --xidmap option in Bulkload*

opened 06:22PM - 11 Mar 20 UTC

closed 04:38AM - 08 Apr 20 UTC

MichelDiz

kind/bug status/accepted area/bulk-loader

## Update

Previously it was believed that the --store_xids flag had the same behavior as `--xidmap`. However, it is not true. But the `--xidmap` flag is needed in Bulkloader.

### What version of Dgraph are you using?

```
Dgraph version   : v2.0.0-beta1
Dgraph SHA-256   : 178663a98a3d59879a3d5c42928c89eb5f83afc2bfc0093272941e7a53515847
Commit SHA-1     : 6fac5d7c4
Commit timestamp : 2020-01-30 14:45:54 +1100
Branch           : HEAD
Go version       : go1.13.7
```

### Have you tried reproducing the issue with the latest release?

Yes.

### Steps to reproduce the issue (command/config used to run Dgraph).

Update: Ignore these steps, go to my last comment below.

```
dgraph bulk --store_xids -f ./agrovoc_2019-11-04_lod.nt --format=rdf -s schema.sch
```

```
dgraph bulk -x -f ./agrovoc_2019-11-04_lod.nt --format=rdf -s schema.sch
```

### Expected behavior and actual result.

It should create an XID folder to be used in later imports via Live loader.

Origin of this issue: https://discuss.dgraph.io/t/bulk-loader-x-option/6115

dmai · March 11, 2020, 7:31pm

The --store_xids flag for bulk loader writes xid edges into your database. This is different from the --xidmap flag for live loader, which writes out the xid-uid mapping to a separate directory.
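To make the distinction above concrete, here is a minimal Python sketch. It is a toy model only, not how Dgraph is implemented: `ToyGraph`, `load_store_xids`, and `load_with_xidmap` are hypothetical names invented for this illustration. The first loader writes an xid edge into the graph but never consults it, so a repeated xid gets a second uid. The second loader keeps the xid-to-uid mapping outside the graph and reuses it across loads.

```python
from itertools import count


class ToyGraph:
    """Toy stand-in for a graph store (not Dgraph's real data model)."""

    def __init__(self):
        self.nodes = set()     # uids of the nodes in the graph
        self.xid_edges = {}    # uid -> xid, written only in store_xids mode
        self._uids = count(1)  # simple uid allocator

    def new_node(self):
        uid = next(self._uids)
        self.nodes.add(uid)
        return uid


def load_store_xids(graph, xids):
    """Models --store_xids: every input node gets a fresh uid and an xid edge
    stored in the graph. Nothing checks whether the xid was seen before."""
    for xid in xids:
        uid = graph.new_node()
        graph.xid_edges[uid] = xid


def load_with_xidmap(graph, xids, xidmap):
    """Models --xidmap: the xid -> uid mapping lives outside the graph
    (a dict here, a directory in the real tool) and is reused on every load."""
    for xid in xids:
        if xid not in xidmap:
            xidmap[xid] = graph.new_node()


# Two consecutive loads that both mention "bob".
a = ToyGraph()
load_store_xids(a, ["alice", "bob"])
load_store_xids(a, ["bob", "carol"])  # "bob" receives a second uid

b = ToyGraph()
shared_map = {}
load_with_xidmap(b, ["alice", "bob"], shared_map)
load_with_xidmap(b, ["bob", "carol"], shared_map)  # "bob" keeps its uid

print(len(a.nodes), len(b.nodes))  # prints: 4 3
```

In this model the stored xid edges still record which uids share an xid, but only the external map prevents the duplicate from being created in the first place.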

aamrtv · March 12, 2020, 5:43am

Okay. How do I use the fact that xid edges are stored in the database to avoid duplicate nodes appearing after the bulk load, which is followed by >= 1 live loads? The docs don’t give much insight into that.

What is the purpose of writing xid directly to the database?

aamrtv · March 17, 2020, 3:44pm

I am sorry, still interested in the answer, so I reply to bring the topic to the top. Feel free to
How can I use the xid edges stored in Dgraph after a bulk load to avoid duplicates in future dgraph live loads?

MichelDiz · March 27, 2020, 4:06am

I have found the answer. Take a look at the last comment on the issue I opened.

Cheers.

Anurag · April 9, 2020, 11:00am

@aamrtv thanks for raising this. It has been resolved in the current master. You can use the --xidmap dirname flag during a bulk load to save xids in a directory named dirname.
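The workflow this enables (a bulk load followed by live loads that share one xid directory) can be sketched in Python as below. This is an illustration under stated assumptions, not Dgraph's implementation: `run_load`, `load_map`, `save_map`, and the `xidmap.json` file name are hypothetical, and the real loaders use their own on-disk format rather than JSON.

```python
import json
import tempfile
from pathlib import Path

MAP_FILE = "xidmap.json"  # hypothetical file name, chosen for this sketch only


def load_map(dirname):
    """Read the persisted xid -> uid map, or start empty on the first run."""
    path = Path(dirname) / MAP_FILE
    return json.loads(path.read_text()) if path.exists() else {}


def save_map(dirname, xidmap):
    """Persist the map so that the next loader run can reuse it."""
    Path(dirname).mkdir(parents=True, exist_ok=True)
    (Path(dirname) / MAP_FILE).write_text(json.dumps(xidmap))


def run_load(dirname, xids):
    """One loader run: reuse known uids, assign new ones, then save the map.
    Returns the uid used for each xid in this run."""
    xidmap = load_map(dirname)
    next_uid = max(xidmap.values(), default=0) + 1
    assigned = {}
    for xid in xids:
        if xid not in xidmap:
            xidmap[xid] = next_uid
            next_uid += 1
        assigned[xid] = xidmap[xid]
    save_map(dirname, xidmap)
    return assigned


with tempfile.TemporaryDirectory() as d:
    bulk = run_load(d, ["_:alice", "_:bob"])  # stands in for the bulk load
    live = run_load(d, ["_:bob", "_:carol"])  # stands in for a later live load

print(bulk["_:bob"] == live["_:bob"])  # prints: True
```

The point of the sketch is that both runs must be given the same directory. If the second run starts from an empty map, it assigns fresh uids and the duplicates reappear.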

system · May 9, 2020, 11:00am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
