Natural id / business key

wiradikusuma · August 4, 2021, 5:57pm

How efficient is it to retrieve Dgraph objects in DQL by natural id, e.g. @filter(eq(email, $email)), instead of func: uid($someId)?

The reason I ask is that, instead of exposing Dgraph's native uid (e.g. 0x123) to the public, I would rather use, say, a random string, so that people can't guess how much data I have (for example).

Also, is there a native construct for that? In an RDBMS, for example, the key column can have any name as long as it is declared PRIMARY KEY.

iluminae · August 4, 2021, 6:46pm

It's very common to use externally tracked ids in Dgraph. See here and here for a bit of information, but basically every update then becomes an upsert of the form:

upsert {
  query{
    me as var(func: eq(email,"me@them.org"))
  }
  mutation {
    set {
      uid(me) <myfield> "myvalue" .
      uid(me) <email> "me@them.org" .
    }
  }
}

… which creates a new node with that email and myfield if none exists, and otherwise sets myfield=myvalue on the existing node. The query and the mutation run in a single transaction. You may also want to mark the predicate used as an external id as @upsert in the schema, so that concurrent upserts on the same value conflict instead of creating duplicates. See here for more on @upsert.
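A minimal schema sketch, assuming the external id lives in a string predicate named email as in the example above. Dgraph only accepts @upsert on an indexed predicate, and the index is also what lets eq() be used as a root function for lookups; the choice of the exact tokenizer here is an assumption (hash would also serve equality lookups).

```dql
# Index the external id so eq() can look it up, and mark it @upsert
# so that concurrent transactions writing the same value conflict.
email: string @index(exact) @upsert .
myfield: string .
```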

Note that the external id (email in this case) is written as part of the mutation; it must be set whenever a new node is created. Conditional mutations can gate this write if you want to avoid writing the same value over and over, but that is purely an optimization.
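A sketch of that optimization, reusing the predicate names and email value from the example above: each mutation is gated with @if on the size of the me variable, so the email is only written when a new node has to be created. The blank node name _:new is arbitrary and only illustrative.

```dql
upsert {
  query {
    me as var(func: eq(email, "me@them.org"))
  }

  # Node already exists: update myfield only, skip rewriting the email.
  mutation @if(eq(len(me), 1)) {
    set {
      uid(me) <myfield> "myvalue" .
    }
  }

  # No match: create the node with both the external id and myfield.
  mutation @if(eq(len(me), 0)) {
    set {
      _:new <email> "me@them.org" .
      _:new <myfield> "myvalue" .
    }
  }
}
```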

Downside: slower, but by how much? Maybe you won't notice; it depends on many things.
Upside: using integers as ids is awkward and this is much better.
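For reference, the two lookups from the original question can be written as separate blocks in one request; 0x123 and the email value are placeholders. Roughly speaking, uid() starts directly from the given node, while eq() as a root function first resolves the value through the index on email (and fails if that predicate is not indexed), which is where the extra cost comes from.

```dql
{
  # Direct lookup by Dgraph's internal uid.
  byUid(func: uid(0x123)) {
    uid
    email
  }

  # Lookup by natural id via the index on email.
  byEmail(func: eq(email, "me@them.org")) {
    uid
    email
  }
}
```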

MichelDiz · August 4, 2021, 9:27pm

I wouldn't worry about this; guessing it isn't a big deal. The number of nodes grows whether you store many small nodes or a few nodes with lots of data, so it makes no sense to try to guess from it. If you have a billion nodes for comments from, let's say, 300k users, would the guesser think you have a billion users?

I would care about exposing UIDs if people have unrestricted access to the API, since they could use the collected UIDs to explore the data in your cluster. Basically, although UIDs are leased sequentially, they are not used sequentially, so an attacker can't exploit them the way they would exploit ordinary sequential IDs.

PS: You can also lease a billion UIDs and never use them, which would confuse the guesser. Flooding them with information is the best way to hide the real numbers.

Related topics

Topic	Category	Tags	Replies	Views	Activity
Confused with the external id	Dgraph	kind:question	2	1041	March 25, 2021
Using external identifiers with DQL fails	Dgraph Cloud	kind:question	8	569	February 12, 2021
Dql upsert @id instead of ID!	Dgraph	kind:question, dgraph, untagged, dql, lambda	4	873	March 18, 2021
Can I specify a custom field as the unique field like uid，but it store UUID value	Dgraph	kind:question	6	581	May 15, 2021
Using an upsert block to increment an ID	Users		8	742	March 13, 2020