Dgraph live - way to prevent duplication of data?


(Arthur Wiebe) #1

I’d like to write an RDF file that, when loaded into Dgraph with dgraph live -r myfile.rdf.gz, never inserts duplicates of any triple, no matter how many times I run it.

From what I’ve read and tested so far, there doesn’t seem to be a way to do this. So as a last-ditch effort I’m posting here to see what you guys think, or whether there are alternative suggestions.

The use case: our data is collected asynchronously and sent to the graph via events. I’d like each event to generate an RDF file that gets fed to dgraph live, but from what I’m seeing I’ll actually have to write code that runs a transaction and a manual query for each predicate.

On a side note, I’m also not sure what to query to find out which node is the “lead” node. Querying http://127.0.0.1:8080/state doesn’t return anything. It would be nice to have a list of HTTP endpoints in the docs, or maybe I missed it.


(Pawan Rawal) #2

Hey @artooro

That’s right. There is currently no way for Dgraph live to do this. You’d have to add an index on the predicate that stores your xid, query it, and use the uid it returns.
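
A minimal sketch of that query-then-insert flow, in Python. It only builds the query and N-Quad strings and leaves the network call to a caller-supplied function, so xid_query, build_nquads, the lookup_uid callback, and the predicate name xid are illustrative assumptions, not part of any Dgraph client. It also assumes a schema entry along the lines of `xid: string @index(exact) .` so that eq(xid, ...) can be used.

```python
import json
from typing import Callable, Optional


def xid_query(xid: str) -> str:
    """Build a query that looks up a node by its external id (xid)."""
    # json.dumps yields a double-quoted, escaped string literal.
    return "{ q(func: eq(xid, %s)) { uid } }" % json.dumps(xid)


def build_nquads(
    xid: str,
    predicate: str,
    value: str,
    lookup_uid: Callable[[str], Optional[str]],
) -> str:
    """Return N-Quads that reuse an existing node when the xid is known.

    lookup_uid is a hypothetical callback: it receives the query text and
    returns the matching uid (for example "0x2a"), or None if no node exists.
    """
    uid = lookup_uid(xid_query(xid))
    lines = []
    if uid is None:
        # Unknown xid: create a new node and record its xid on it.
        subject = "_:new"
        lines.append(f"{subject} <xid> {json.dumps(xid)} .")
    else:
        # Known xid: attach the triple to the existing node.
        subject = f"<{uid}>"
    lines.append(f"{subject} <{predicate}> {json.dumps(value)} .")
    return "\n".join(lines)
```

As the original post anticipates, the lookup and the write would need to share a transaction to stay safe when several events arrive at once.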

The /state endpoint is served by Zero. So, assuming Zero is serving HTTP on port 6080 (that is, you started it with offset -2000), you can go to http://127.0.0.1:6080/state.
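
For scripted checks, the same request can be sketched in Python with only the standard library. The helper names zero_state_url and fetch_zero_state are made up for this example, and the sketch assumes the response body is JSON; the host and port defaults are the ones mentioned above and may differ in your setup.

```python
import json
import urllib.request


def zero_state_url(host: str = "127.0.0.1", http_port: int = 6080) -> str:
    """Build the URL of Zero's /state endpoint (HTTP port, not gRPC)."""
    return f"http://{host}:{http_port}/state"


def fetch_zero_state(url: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the cluster state, assuming a JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling fetch_zero_state(zero_state_url()) against a running Zero would return the decoded state; the sketch makes no assumption about which fields it contains.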


(Jordi Llonch) #3

Arthur,

Consider using an xidmap directory when invoking dgraph live: dgraph live --xidmap .....


(Arthur Wiebe) #4

Gotcha, I’ll build my own query+insert tool then.

So I have Zero running on port 5080 (Kubernetes HA setup), and if I run curl http://127.0.0.1:5080/state it returns ?. I’m not sure what the question mark means.


(Arthur Wiebe) #5

@llonchj that’s an interesting idea. So I could store a map on the local file system, look up uids, and maybe do a search/replace…
Or, because of concurrency, it might be better to just write my own script.


(Pawan Rawal) #6

@artooro If you always plan to use the same client to upload the data, then this should work. We store the xid => uid mapping in the --xidmap directory locally on the client.
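
To illustrate the idea only, here is a minimal Python sketch of a client-side xid => uid map persisted to disk. LocalXidMap and its JSON file are hypothetical and are not the format dgraph live uses for its --xidmap directory; the sketch also shows why the mapping only helps when the same client (and the same local files) is reused.

```python
import json
import os
from typing import Dict, Optional


class LocalXidMap:
    """Toy xid => uid map stored as a JSON file on the local client."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.mapping: Dict[str, str] = {}
        # Reload earlier assignments so repeated runs reuse the same uids.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.mapping = json.load(fh)

    def get(self, xid: str) -> Optional[str]:
        """Return the uid recorded for xid, or None if it is unknown."""
        return self.mapping.get(xid)

    def put(self, xid: str, uid: str) -> None:
        """Record a mapping and persist it via write-then-rename."""
        self.mapping[xid] = uid
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump(self.mapping, fh)
        os.replace(tmp_path, self.path)
```

A second client with its own empty map would not see these entries, which is the limitation described above.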


Update - There is a small issue (https://github.com/dgraph-io/dgraph/issues/2006) because of which the above solution won’t work for now.

If you are using the Kubernetes HA setup, try http://127.0.0.1:6080/state.


(Jordi Llonch) #7

@artooro I think the xidmap folder keeps a mapping between xids and uids, so you can use IRIs.

dgraph live -xidmap my-xidmap folder …
“john” .

dgraph live -xidmap my-xidmap folder …
<:1> “mark” .
<
:1> .

