Dgraph live - way to prevent duplication of data?

artooro · January 12, 2018, 2:30am

I’d like to write an RDF file that I can load into Dgraph with dgraph live -r myfile.rdf.gz any number of times without it inserting duplicates of any of the triples.

From what I’ve read and tested so far, there doesn’t seem to be a way to do this. So as a last-ditch effort I’m posting here to see what you guys think, or whether there are alternative suggestions.

The use case is that our data is collected asynchronously and sent to the graph via events. I’d like each event to generate an RDF file that gets fed to dgraph live, but from what I’m seeing I’ll actually have to write code that runs a transaction and a manual query for each predicate.

On a side note, also not sure what to query to find out which node is the “lead” node. Querying http://127.0.0.1:8080/state doesn’t return anything. It would be nice to have a list of HTTP endpoints in the docs, or maybe I missed it.

pawan · January 12, 2018, 2:41am

Hey @artooro

That’s right. There is currently no way for Dgraph live to do this. You’d have to add an index on the predicate that stores your xid, query it, and use the uid it returns.
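A minimal sketch of that query-then-insert flow, in Python. The FakeGraph class is a hypothetical in-memory stand-in, not Dgraph's client API, and the names find_uid, create_node and upsert are made up for illustration. It only shows the shape of the logic: look the xid up first, and create a node only when the lookup comes back empty.

```python
from typing import Dict, Optional, Set, Tuple


class FakeGraph:
    """In-memory stand-in for a graph store (hypothetical, not a Dgraph client)."""

    def __init__(self) -> None:
        self.xid_index: Dict[str, str] = {}            # indexed xid -> uid
        self.triples: Set[Tuple[str, str, str]] = set()  # (uid, predicate, value)
        self._next_uid = 1

    def find_uid(self, xid: str) -> Optional[str]:
        # Stands in for a query against the indexed xid predicate.
        return self.xid_index.get(xid)

    def create_node(self, xid: str) -> str:
        # Stands in for a mutation that creates a node and records its xid.
        uid = hex(self._next_uid)
        self._next_uid += 1
        self.xid_index[xid] = uid
        return uid


def upsert(graph: FakeGraph, xid: str, predicate: str, value: str) -> str:
    """Look the xid up first; create the node only when it is missing."""
    uid = graph.find_uid(xid)
    if uid is None:
        uid = graph.create_node(xid)
    # Triples live in a set, so re-adding the same one is a no-op.
    graph.triples.add((uid, predicate, value))
    return uid
```

Calling upsert twice with the same xid returns the same uid both times and leaves a single node behind. Against a real server the lookup and the write would presumably need to happen inside one transaction, otherwise two concurrent writers could both miss the xid and both create a node.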

The /state endpoint is present on Zero. So considering that Zero is serving HTTP on port 6080 (assuming you started with offset -2000), you could go to http://127.0.0.1:6080/state.

llonchj · January 12, 2018, 2:42am

Arthur,

Consider using an xidmap directory when invoking dgraph live: dgraph live --xidmap .....

artooro · January 12, 2018, 2:45am

Gotcha, I’ll build my own query+insert tool then.

So I have Zero running on port 5080 (Kubernetes HA setup), and if I run curl http://127.0.0.1:5080/state it returns just a ? character. Not sure what the question mark means.

artooro · January 12, 2018, 2:48am

@llonchj that’s an interesting idea. So I could store a map on the local file system and look up uids and do a search/replace maybe…
Or because of concurrency might be better to just write my own script.
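A rough sketch of that lookup and search/replace idea, in Python. It assumes the xid -> uid map is kept in a plain JSON file, which is an illustration only and not how dgraph live stores its xidmap. It also assumes the RDF refers to nodes as blank nodes of the form _:xid. The function names are hypothetical.

```python
import json
import re
from pathlib import Path
from typing import Dict

# Blank-node labels such as _:john (letters, digits, underscore, hyphen).
BLANK_NODE = re.compile(r"_:([A-Za-z0-9_-]+)")


def load_map(path: Path) -> Dict[str, str]:
    """Read the saved xid -> uid map, or start empty on the first run."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_map(path: Path, mapping: Dict[str, str]) -> None:
    """Write the map back to disk so the next run can reuse it."""
    path.write_text(json.dumps(mapping, indent=2, sort_keys=True))


def replace_known(rdf: str, mapping: Dict[str, str]) -> str:
    """Swap blank nodes whose xid is already mapped for <uid>; leave the rest."""
    def swap(match: "re.Match[str]") -> str:
        uid = mapping.get(match.group(1))
        return f"<{uid}>" if uid else match.group(0)

    return BLANK_NODE.sub(swap, rdf)
```

With a map of {"john": "0x1"}, a line like _:mark <friend> _:john . becomes _:mark <friend> <0x1> ., so only mark would be created as a new node. The sketch leaves out how uids for newly created nodes get back into the map, and the regex would also rewrite a _:name that happens to sit inside a string literal. A local file like this is also not safe when several loaders run at once, which is the concurrency worry.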

pawan · January 12, 2018, 2:49am

@artooro If you always plan to use the same client to upload the data, then this should work. We store the xid => uid mapping in the --xidmap directory locally on the client.

Update: there is a small issue (dgraph-io/dgraph issue #2006, “Xidmap in live loader is not completely pushed to badger”) because of which the above solution won’t work for now.

If you are using the Kubernetes HA setup, try http://127.0.0.1:6080/state.

llonchj · January 17, 2018, 2:12am

@artooro I think the xidmap folder keeps a mapping of the xid to the uid, so you can use IRIs.

dgraph live -xidmap my-xidmap folder …
“john” .

dgraph live -xidmap my-xidmap folder …
<:1> “mark” .
<:1> .

system · February 16, 2018, 2:12am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
