Support for user provided identifiers

amitgupta1202 · March 8, 2018, 4:11am

Correct me if I am wrong: as I understand it, user-generated identifiers are no longer supported, and the recommendation is instead to store them in an xid attribute. This is a great inconvenience for my use case, and also inefficient. I am evaluating Dgraph for a fraud detection system. It is a highly connected graph, and querying for the uid before creating edges seems inefficient. Note that we have to write over 3 billion edges; reading from the database is the last thing on my mind.

For example: a given email was used for an order, a given phone was used for an order. In my use case email, phone and order IDs are all identifiers. If 2 orders used the same email, I would expect the email to point to 2 different orders, but there would be only one email node. I was expecting to do only writes. Can you please advise?

Note that we have over 300 million transactions, and each order has 8 attributes (email, phone, address, etc.). That can lead to 3-4 billion writes if we don’t generate IDs. If we do, we are looking at twice that amount plus the same number of reads. It will also lead to concurrency issues, given that Dgraph will be generating the unique IDs for me.

Can you please advise?

janardhan · March 8, 2018, 3:17pm

If you want a single email node for all users, then unfortunately you need to query Dgraph to get the uid. You can probably cache the email-to-uid mapping on the client side (if it can’t fit in memory, use some eviction strategy and query Dgraph only when the email is not found in the cache).
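A minimal sketch of such a client-side cache with least-recently-used eviction, in Python. The `UidCache` class and its `lookup` callable are hypothetical names used for illustration; `lookup` stands in for whatever Dgraph query resolves an email to its uid and is not part of any Dgraph client API.

```python
from collections import OrderedDict
from typing import Callable, Optional


class UidCache:
    """Bounded LRU cache mapping an external id (e.g. an email) to a uid.

    `lookup` is a hypothetical callable that queries the database and
    returns the uid for an external id, or None if no node exists yet.
    """

    def __init__(self, lookup: Callable[[str], Optional[str]],
                 capacity: int = 100_000) -> None:
        self._lookup = lookup
        self._capacity = capacity
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, xid: str) -> Optional[str]:
        # Cache hit: mark the entry as most recently used and return it.
        if xid in self._entries:
            self._entries.move_to_end(xid)
            return self._entries[xid]

        # Cache miss: fall back to the database.
        uid = self._lookup(xid)
        if uid is not None:
            self._entries[xid] = uid
            # Evict the least recently used entry once over capacity.
            if len(self._entries) > self._capacity:
                self._entries.popitem(last=False)
        return uid
```

A cache like this only saves reads within a single process; several independent writers would each hold their own copy.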

But given that disk space is cheap, you can just store the email as a scalar value and index it. You can then easily retrieve all the orders with that email via the index.

amitgupta1202 · March 8, 2018, 3:43pm

Yes, I understand that. I don’t think the issue here is storage; it’s
efficiency. For each new insert of an email, I will have to do a read to find
the relevant uid. That creates 2 problems. First, it increases the
number of operations on the database, which is not ideal for the amount of
data we have. Second, it creates concurrency issues, because the write
depends on the read (the pair is not atomic). For example, if the same
email is inserted twice at the same time, we will end up with 2 email nodes
with different uids. Maybe a transaction can solve the issue.

I am not sure we will get the kind of performance we expected.
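To make the race concrete, here is a small Python sketch of an atomic "get or create". `InMemoryNodeStore` is a hypothetical in-memory stand-in, not Dgraph’s API. It only illustrates that the lookup and the insert must happen as one step; with several separate writer processes, that guarantee would have to come from the database (for example a transaction), not from a process-local lock.

```python
import threading
from typing import Dict


class InMemoryNodeStore:
    """Hypothetical in-memory stand-in for a database; not Dgraph's API."""

    def __init__(self) -> None:
        self._uids: Dict[str, str] = {}
        self._next = 1
        self._lock = threading.Lock()

    def get_or_create(self, xid: str) -> str:
        # Holding the lock makes "look up, then insert if missing" a single
        # atomic step, so two concurrent callers cannot both create a node
        # for the same external id.
        with self._lock:
            uid = self._uids.get(xid)
            if uid is None:
                uid = hex(self._next)
                self._next += 1
                self._uids[xid] = uid
            return uid
```

Without the lock, two threads could both see "missing" and each allocate a uid, which is the duplicate-email case described above.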

amitgupta1202 · March 8, 2018, 4:57pm

Sorry, I misunderstood your solution. It’s not straightforward in a
concurrent environment, but it is definitely possible. I will give it a try.

amitgupta1202 · March 9, 2018, 8:04am

Hi Janardhan,

The solution won’t work. We have multiple Kafka consumers writing to the database in a distributed environment, so a local cache lookup is not possible.

Regards,
Amit Gupta

wxwc · March 19, 2018, 7:50pm

Hey,

Really guys, you can’t identify existing nodes in the database? And if I want to refer to a node, I have to provide a mapping myself? I’m fighting with the same problem as amitgupta1202, and it’s hard to believe that I can’t match the same node in two different transactions. Any comments?

pawan · March 19, 2018, 9:58pm

Yup, the only way right now is to give a uid. Now, either you query for the uid or keep a mapping on the client side.

We understand it’s an issue for a lot of our users and are investigating what can be done about it.

wxwc · March 19, 2018, 10:01pm

Can I use my own UIDs, or only those returned by Dgraph?

pawan · March 19, 2018, 10:02pm

You can only use those returned by Dgraph.

wxwc · March 19, 2018, 10:06pm

Wow, so when I start more than one client that tries to write nodes to the database, I have a concurrency problem. I have to admit that version 1.0 is quite useless; it can’t be scaled.

system · April 18, 2018, 10:06pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
