Hi! To avoid querying for uids before mutating nodes, I'm interested in using a 64-bit hash to assign uids based on a unique external identifier, and I have a few questions about whether this is viable. Is it possible to assign all uids in the uint64 range? How could it best be done, and would it cause any problems with how the database functions? Are there any reserved uid ranges that should be avoided?
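For readers following along, the idea in the question can be sketched roughly as below. This is only an illustration: the choice of BLAKE2b, the helper names `xid_to_uid` and `uid_literal`, and the remapping of a zero hash to 1 are all assumptions, not anything Dgraph prescribes, and the resulting uids would still have to fall inside a range leased from Zero.

```python
import hashlib

UINT64_MAX = 2**64 - 1


def xid_to_uid(xid: str) -> int:
    """Derive a deterministic 64-bit uid candidate from an external id."""
    # digest_size=8 gives exactly 8 bytes, i.e. a value in [0, 2**64 - 1].
    digest = hashlib.blake2b(xid.encode("utf-8"), digest_size=8).digest()
    uid = int.from_bytes(digest, "big")
    # 0x0 should be avoided as a uid, so remap it (hypothetical choice; it
    # can collide with whatever xid genuinely hashes to 1).
    return uid or 1


def uid_literal(uid: int) -> str:
    """Format a uid as a hex literal such as 0xff."""
    return hex(uid)
```

The same xid always maps to the same uid, so no lookup is needed before a mutation, but two different xids can map to the same uid.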
Only via the endpoint /assign?what=uids&num=100, see https://dgraph.io/docs/deploy/dgraph-zero/#endpoints
Nope. I think only the uid 0x0 should be avoided.
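A minimal sketch of calling that endpoint from a script, in case it helps. The path and query parameters come from the reply above; the address `localhost:6080` (Zero's default HTTP port) and the helper names `assign_url` and `lease_uids` are assumptions for illustration, and the shape of the JSON response is not shown here because it should be checked against the linked docs.

```python
import json
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen


def assign_url(zero_http_addr: str, num: int) -> str:
    """Build the Zero /assign URL that leases `num` uids."""
    if num <= 0:
        raise ValueError("num must be positive")
    query = urlencode({"what": "uids", "num": num})
    return urlunsplit(("http", zero_http_addr, "/assign", query, ""))


def lease_uids(zero_http_addr: str, num: int) -> dict:
    """Call Zero and return its JSON reply as a dict (needs a running Zero)."""
    with urlopen(assign_url(zero_http_addr, num)) as resp:
        return json.load(resp)
```

For example, `assign_url("localhost:6080", 100)` produces the same request as the one quoted above.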
So I started testing mass allocation of uids and it seemed to mostly work, though I did run into a problem with the maxLeaseId. Although the /assign endpoint does let you allocate 2^64 - 1 uids, this seemingly triggers an overflow in the maxLeaseId, which apparently gets set to (2^64 - 1) + 10000 and so wraps around. As a result, the highest realistically allocatable uid seems to be (2^64 - 1) - 10000, which would then leave the maxLeaseId at 2^64 - 1. However, this still leaves the remaining 10000 uids potentially reusable by the controller. Any suggestions as to how to get around this?
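To make the arithmetic concrete, here is a small model of the behaviour described in this post. It is not Dgraph's code: the 10000 margin is simply the value observed above, the names `wrapped_lease` and `highest_safe_uid` are made up, and the mask emulates unsigned 64-bit addition, which wraps modulo 2^64.

```python
UINT64_MAX = 2**64 - 1
LEASE_MARGIN = 10_000  # assumption: the margin observed in the post above


def wrapped_lease(requested_max_uid: int, margin: int = LEASE_MARGIN) -> int:
    """Add the margin the way a uint64 would, wrapping modulo 2**64."""
    return (requested_max_uid + margin) & UINT64_MAX


def highest_safe_uid(margin: int = LEASE_MARGIN) -> int:
    """Largest uid that can be requested without the lease wrapping."""
    return UINT64_MAX - margin
```

Under this model, requesting `UINT64_MAX` leaves the lease at 9999 after the wrap, while requesting `highest_safe_uid()` leaves it at exactly `UINT64_MAX`.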
Personally I have no idea. I never did this before. Maybe @dmai could help.
Hm… there’s an enhancement we can make here to prevent the overflow when the leased UIDs go over the uint64 range.
Are you concerned about collisions with a 64-bit hashing scheme? The automatic UID assignment by Dgraph guarantees that each node you add has a unique UID.
Collisions are a pretty big concern we have, making hashing a fairly flawed solution to our problem, perhaps to the point of not really being viable. For our use case, we will have a dataset of around 600m-1.5b nodes and we need an efficient way to perform weekly batch updates. In order to make the process relatively performant, we’re experimenting with different methods to avoid unnecessary queries to handle the xid to uid mapping for updates.
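The size of that risk can be estimated with the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / (2 · 2^64)). The sketch below assumes an ideal, uniformly distributed 64-bit hash, which a real hash function only approximates; the function name `collision_probability` is made up for this example.

```python
import math


def collision_probability(n: int, bits: int = 64) -> float:
    """Approximate chance that at least two of n hashed ids collide."""
    space = 2.0 ** bits
    # expm1 keeps precision when the exponent is close to zero.
    return -math.expm1(-n * (n - 1) / (2.0 * space))
```

For the dataset sizes mentioned above, this gives a chance of at least one collision on the order of 1% at 600 million nodes and about 6% at 1.5 billion nodes.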
If you have a separate store of the xid/uids, then you can use that to send mutations directly for the appropriate nodes in Dgraph. The /assign endpoint lets you manage your own range of UIDs to do that. Other users have also asked about managing their own UIDs which is where
Is the live loader xidmap well-suited for handling high volumes of uids, or do you think it would be a better idea to go down the route of setting up a separate store for the uid/xid map?