Bulk loader: same blank nodes in different RDF files

I am trying to build a user behavior graph from a large number of RDF files. I have created a blank node for each user, like <_:user123> <ctr_id> "user123" .
The problem is that <_:user123> may occur in multiple files (because I store the data by date):
given <_:user123> <ctr_id> "user123" . in file 1,
<_:user123> <gender> "male" . in file 2,
<_:user123> <click> <_:banner1> . in file 3, and so on.
So I want to ask: how can I make sure Dgraph merges those identical blank nodes into a single node?
I know dgraph live --xidmap can handle this easily, but dgraph bulk --store_xids does not seem to work: Dgraph creates two separate UIDs for the same blank node. Since the files are quite big (over 200 GB), I cannot just use dgraph live, so is there any other way to handle this?

Hi @jokk33 Welcome to Dgraph!

Which version are you using? There was a PR that removed the -x shorthand from the bulk loader. Also, this PR adds support for xidmap in the bulk loader.

--store_xids and --xidmap are different flags for different purposes. See this discussion: Support the --xidmap option in Bulkload* · Issue #4917 · dgraph-io/dgraph · GitHub


Hey @jokk33, as @Anurag said, --store-xids and --xidmap are two different flags. --store-xids stores an xid predicate on each newly inserted node whose value is the blank node ID passed in the RDF; it does not check whether a node with the same blank node ID already exists.
You are correct that --xidmap is the right way to handle this, and it has recently been added to the bulk loader by @Anurag as well.
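To illustrate the idea, here is a small conceptual sketch (not Dgraph's actual implementation) of what an xidmap provides: a persistent mapping from blank node labels to internal UIDs, so the same label always resolves to the same UID, even when it appears in different files.

```python
class XidMap:
    """Toy model of an xidmap: blank node label -> UID, assigned once."""

    def __init__(self):
        self.mapping = {}   # label -> uid
        self.next_uid = 1

    def uid_for(self, label):
        # Reuse the existing UID if the label was seen before;
        # otherwise assign a fresh one.
        if label not in self.mapping:
            self.mapping[label] = self.next_uid
            self.next_uid += 1
        return self.mapping[label]

xm = XidMap()
uid_a = xm.uid_for("_:user123")  # first file
uid_b = xm.uid_for("_:user123")  # later file, same label
assert uid_a == uid_b            # both resolve to one node
```

Without such a shared map, each load run would mint a fresh UID for every blank node label it encounters, which is exactly the duplication you observed.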

Were you running the bulk loader with all the files at once? If that's the case, it should have assigned the same UID to the same blank node ID. If not, please try running the bulk loader with all the files passed to it in a single run. It should work as expected.
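For example, a single bulk-load run over a directory containing all the RDF files might look like this. The paths are placeholders, and flag availability can vary by version, so check `dgraph bulk --help` on your build:

```shell
# Pass the whole RDF directory (file1.rdf, file2.rdf, ...) in ONE run,
# so blank node labels are resolved against a single map.
# --xidmap persists the blank-node-to-UID mapping on disk.
dgraph bulk \
  -f /data/rdf \
  -s /data/schema.txt \
  --xidmap /data/xidmap \
  -z localhost:5080
```

Splitting the load into one `dgraph bulk` invocation per file is what produces separate UIDs for the same blank node label.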


Thanks a lot, that's really helpful! I am using the Dgraph v20.03.3 Docker image.
I already saw that issue and I will try --xidmap in dgraph bulk soon.
You are doing a great job, and I will continue to use Dgraph, since it handles big data so well. :grin:
(After all, JanusGraph is heavy, and a Neo4j HA cluster costs a lot.)


Thanks a lot, I will work on it! :wink:
