I am trying to build a user behavior graph from a large number of RDF files. To do this, I have already created a blank node for each user, like <_:user123> <ctr_id> "user123" .
The problem is that <_:user123> may occur in other files (because I stored the data by date). Given
<_:user123> <ctr_id> "user123" .
in file 1,
<_:user123> <gender> "male" .
in file 2,
<_:user123> <click> <_:banner 1> .
in file 3, and so on.
So my question is: how can I make sure the graph merges these identical blank nodes into one node?
I know that in dgraph live, --xidmap handles this easily, but in dgraph bulk, --store_xids does not seem to work: Dgraph creates two separate uids for the same blank node. Because the files are quite big (over 200GB), I cannot just use dgraph live. Is there any way to handle this?
Hi @jokk33, welcome to Dgraph!
Which version are you using? There was a PR which removed the -x shorthand in the bulk loader. That PR also adds support for xidmap in the bulk loader.
store_xids and xidmap are different flags for different purposes. See this discussion: Support the --xidmap option in Bulkload* · Issue #4917 · dgraph-io/dgraph · GitHub
Hey @jokk33, as @Anurag said, --store_xids and --xidmap are two different flags. --store_xids creates an xid edge (xid is the name of the edge) from the newly inserted node to the blank node ID passed in the RDF. It doesn't check whether a node with the same blank node ID already exists.
You are correct that --xidmap is the right way to handle this, and @Anurag has recently added it to the bulk loader as well.
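To make the difference concrete, here is a toy Python model of the idea behind an xid map. It is only a sketch and not Dgraph's actual implementation: the XidMap class, the load function, and the itertools counter standing in for Zero's uid allocation are all hypothetical names made up for illustration.

```python
from itertools import count


class XidMap:
    """Toy xid -> uid map (illustrative only, not Dgraph's real data structure)."""

    def __init__(self, lease):
        self._lease = lease  # shared uid allocator, loosely playing the role of Zero
        self._uids = {}      # blank node label -> assigned uid

    def uid_for(self, xid):
        # Reuse the uid if this label was seen before, otherwise allocate a new one.
        if xid not in self._uids:
            self._uids[xid] = next(self._lease)
        return self._uids[xid]


def load(lines, xidmap):
    """Resolve the subject of each 'subject predicate object .' line to a uid."""
    resolved = []
    for line in lines:
        subject, predicate, obj = line.rstrip(" .").split(" ", 2)
        resolved.append((xidmap.uid_for(subject), predicate, obj))
    return resolved


file1 = ['<_:user123> <ctr_id> "user123" .']
file2 = ['<_:user123> <gender> "male" .']

lease = count(1)

# One map shared across both files: the blank node resolves to a single uid.
shared = XidMap(lease)
merged = load(file1, shared) + load(file2, shared)

# A fresh map per file (like unrelated loads with no shared mapping):
# the same label ends up with two different uids.
split = load(file1, XidMap(lease)) + load(file2, XidMap(lease))
```

In this model, merged holds two triples with the same uid, while split holds two triples with different uids, which matches the symptom described in the question.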
Were you running the bulk loader with all the files? If that's the case, it should have assigned the same uid to the same blank node ID. If not, please try running the bulk loader with all the files passed to it. It should work as expected.
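As a rough sketch of "passing all the files" in a single run, the Python helper below only assembles the command line. The file names, schema path, and Zero address are placeholders. It assumes -f accepts a comma-separated list of files, and --xidmap is included only if you pass a directory, since that flag exists only in bulk loader versions that contain the PR mentioned above. Please check dgraph bulk --help for your version before relying on it.

```python
def bulk_command(rdf_files, schema_path, zero_addr="localhost:5080", xidmap_dir=None):
    """Build a single 'dgraph bulk' invocation covering every RDF file.

    All arguments are placeholders for illustration; verify the flags
    against 'dgraph bulk --help' for the version you are running.
    """
    files = list(rdf_files)
    if not files:
        raise ValueError("expected at least one RDF file")
    cmd = [
        "dgraph", "bulk",
        "-f", ",".join(files),   # assumption: comma-separated list of data files
        "-s", str(schema_path),
        "--zero", zero_addr,
    ]
    if xidmap_dir is not None:
        # Only valid on versions where the bulk loader supports --xidmap.
        cmd += ["--xidmap", str(xidmap_dir)]
    return cmd


# Hypothetical per-day files, all loaded in one run.
cmd = bulk_command(["day1.rdf", "day2.rdf", "day3.rdf"], "schema.txt")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run if you want to launch the load from a script.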
Thanks a lot, that's really helpful! I am using Dgraph v20.03.3 in Docker.
I already saw that issue, and I will try --xidmap in dgraph bulk soon.
You are doing a great job and I will continue to use Dgraph, since it handles big data so well.
(After all, JanusGraph is heavy and a Neo4j HA cluster costs a lot.)
Thanks a lot, I will work on it!