Cannot use the entire UID space from clients

I Want to Do

Use all possible UIDs from 0 → maxUint64

What I Did

My existing Dgraph model uses an XID predicate: an ID that is deterministic and unique in my system. Every insert into Dgraph therefore has to be an upsert that looks up the node with that XID and uses the resulting variable throughout the mutation:

upsert {
  query {
    a as var(func: eq(xid, "myxidvalue"))
  }
  mutation {
    set {
      uid(a) <field> "value" .
    }
  }
}
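
For reference, here is a small Go sketch that generates the upsert above for each write, which is the lookup-per-operation cost the post wants to avoid. It also sets `<xid>` on `uid(a)` so that a newly created node keeps its XID (when `a` is empty, the upsert treats `uid(a)` as a new node). The helper name `buildXIDUpsert` is hypothetical; it assumes the `xid` predicate has an index that supports `eq`, that the predicate name is valid as-is, and that `strconv.Quote` escaping is acceptable for the plain printable strings used here.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildXIDUpsert (hypothetical helper) renders an upsert that binds `a` to
// the node whose xid matches, then writes through uid(a). If no node
// matches, Dgraph treats uid(a) as a new node, so the xid is set too.
// Assumes: xid has an index supporting eq, field is a valid predicate
// name, and strconv.Quote escaping suits these plain strings.
func buildXIDUpsert(xid, field, value string) string {
	q := strconv.Quote(xid)
	return fmt.Sprintf(`upsert {
  query {
    a as var(func: eq(xid, %s))
  }
  mutation {
    set {
      uid(a) <xid> %s .
      uid(a) <%s> %s .
    }
  }
}`, q, q, field, strconv.Quote(value))
}

func main() {
	fmt.Println(buildXIDUpsert("myxidvalue", "field", "value"))
}
```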

However, what I want to do is make my use of Dgraph's internal UID deterministic. By hashing my internal ID and taking the first 64 bits of that hash, I get a number of the right size to use as a UID; no problem so far (other than possible hash collisions, but I want to see whether the risk is worth the reward).
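
A minimal Go sketch of the hashing step, assuming SHA-256 (the post does not name the hash) and a big-endian read of the first 8 bytes. The names `uidFromID` and `nquadSubject` are hypothetical. It remaps a zero result to 1 on the assumption that Dgraph does not accept UID 0. On collisions: for uniformly distributed 64-bit values, the chance of at least one collision reaches about 50% at roughly 5 billion IDs (about 2^32).

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// uidFromID (hypothetical) derives a deterministic UID from an external ID:
// the first 8 bytes (64 bits) of its SHA-256 digest, read big-endian.
// SHA-256 is an assumption; any well-mixed hash would serve.
func uidFromID(id string) uint64 {
	sum := sha256.Sum256([]byte(id))
	uid := binary.BigEndian.Uint64(sum[:8])
	if uid == 0 {
		// Assumption: UID 0 is not usable, so remap it to 1
		// (this adds one more way to collide with UID 1).
		uid = 1
	}
	return uid
}

// nquadSubject (hypothetical) formats a UID as an RDF node reference, e.g. <0x1f>.
func nquadSubject(uid uint64) string {
	return fmt.Sprintf("<%#x>", uid)
}

func main() {
	uid := uidFromID("myxidvalue")
	// The same input always yields the same UID, so no XID lookup is needed.
	fmt.Printf("%s <field> \"value\" .\n", nquadSubject(uid))
}
```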

To pre-assign those UIDs, I run:

# curl '127.0.0.1:6080/assign?what=uids&num=18446744073709551615'
{"start":1,"end":18446744073709551615} # it's actually a little harder than this, but let's pretend this works

Calling that endpoint again returns:

# curl '127.0.0.1:6080/assign?what=uids&num=1'
{"errors":[{"message":"Server not initialized","extensions":{"code":"Error"}}]}

OK, that's a misleading error, but that is the max for a uint64, so let's look past it.

Problem

At this point, I expect to be able to use any UID from 0 to maxUint64 on the cluster. However, the Alpha believes only a handful of UIDs have been leased:

# curl --silent 127.0.0.1:8080/state|jq -r '.maxLeaseId'
154253

And the errors my clients get agree:

error encountered performing dgraph request: rpc error: code = Unknown desc = Uid: [2197689836641987139] cannot be greater than lease: [154253]
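
The error suggests the Alpha compares each explicit UID in a mutation against the highest leased UID. A hypothetical Go model of that check (`checkAgainstLease`, an illustration rather than Dgraph's actual source) makes the consequence concrete: a hashed UID is spread uniformly over 2^64 values, so with maxLeaseId at 154253 only about 154253 / 2^64 ≈ 8.4 × 10^-15 of hashed UIDs would pass.

```go
package main

import "fmt"

// checkAgainstLease (hypothetical) models the rule implied by the client
// error: an explicit UID may not exceed the highest UID leased so far.
// This is an illustration, not Dgraph's source.
func checkAgainstLease(uid, maxLeaseID uint64) error {
	if uid > maxLeaseID {
		return fmt.Errorf("Uid: [%d] cannot be greater than lease: [%d]", uid, maxLeaseID)
	}
	return nil
}

func main() {
	const maxLeaseID = 154253 // value reported by /state above
	fmt.Println(checkAgainstLease(2197689836641987139, maxLeaseID))
	fmt.Println(checkAgainstLease(42, maxLeaseID))
}
```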

So, is there a way to use Dgraph with any available UID? Not having to look up the UID from the XID on every single operation would do wonders for my ingestion speed.

Dgraph Metadata

dgraph version

[Decoder]: Using assembly version of decoder
Page Size: 4096

Dgraph version : v20.11.0
Dgraph codename : tchalla
Dgraph SHA-256 : 8acb886b24556691d7d74929817a4ac7d9db76bb8b77de00f44650931a16b6ac
Commit SHA-1 : c4245ad55
Commit timestamp : 2020-12-16 15:55:40 +0530
Branch : HEAD
Go version : go1.15.5
jemalloc enabled : true

Would it make sense to expose a flag such as --dont-lease-uids, in the same “let me do dangerous things” vein as --ludicrous? It could reject any mutation containing a blank node but otherwise work the same.
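
A Go sketch of the validation such a mode might perform. The flag is only a proposal here, and `rejectBlankNodes` is a hypothetical helper, not Dgraph code. It splits each N-Quad on whitespace, so it only inspects the subject and the first object token and does not fully parse literals that contain spaces.

```go
package main

import (
	"fmt"
	"strings"
)

// rejectBlankNodes (hypothetical) refuses N-Quads whose subject or object is
// a blank node (_:name), as a --dont-lease-uids mode might, since there
// would be no lease to assign a fresh UID from. Simplification: lines are
// split on whitespace, so only the subject and the first object token are
// checked; quoted literals start with `"` and are never mistaken for _:.
func rejectBlankNodes(nquads string) error {
	for i, line := range strings.Split(nquads, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if strings.HasPrefix(fields[0], "_:") {
			return fmt.Errorf("line %d: blank node subject %s not allowed", i+1, fields[0])
		}
		if len(fields) >= 3 && strings.HasPrefix(fields[2], "_:") {
			return fmt.Errorf("line %d: blank node object %s not allowed", i+1, fields[2])
		}
	}
	return nil
}

func main() {
	fmt.Println(rejectBlankNodes(`<0x1f> <field> "value" .`))
	fmt.Println(rejectBlankNodes(`_:new <field> "value" .`))
}
```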

It seems that for high-throughput ingestion, this could be an orders-of-magnitude improvement over the upsert-on-every-query flow.

Is there anything in Dgraph that needs the leasing system if UIDs were always controlled externally?

bump…?

This is an interesting one, and I have no idea how to do that. @mrjn may be able to explain why the Alpha only sees a few UIDs (I got the same results).

Isn’t Zero responsible for managing IDs, or has that changed? I thought the Alpha just receives a block of IDs from Zero that it is allowed to use, and when it gets near the end of that block it asks Zero for more.
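
To make the lease model described above concrete, here is a simplified Go sketch; the types `zeroLeaser` and `alphaAllocator` are hypothetical illustrations, not Dgraph's implementation. Zero hands out contiguous blocks and tracks the highest UID leased, and the Alpha assigns blank-node UIDs from its current block. As a simplification, this Alpha asks for a new block only once the current one is exhausted rather than when it gets near the end.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// zeroLeaser (hypothetical) hands out contiguous UID blocks and remembers
// the highest UID ever leased (what /state reports as maxLeaseId).
type zeroLeaser struct {
	maxLeased uint64
}

// lease returns the block [start, end] of n fresh UIDs, refusing requests
// that would run past math.MaxUint64 instead of wrapping around.
func (z *zeroLeaser) lease(n uint64) (start, end uint64, err error) {
	if n == 0 || n > math.MaxUint64-z.maxLeased {
		return 0, 0, errors.New("cannot lease that many UIDs")
	}
	start, end = z.maxLeased+1, z.maxLeased+n
	z.maxLeased = end
	return start, end, nil
}

// alphaAllocator (hypothetical) assigns UIDs to blank nodes from its current
// block and asks Zero for another block of blockSize once it runs out.
type alphaAllocator struct {
	zero      *zeroLeaser
	blockSize uint64
	next, end uint64 // next == 0 means "no block yet"
}

func (a *alphaAllocator) assign() (uint64, error) {
	if a.next == 0 || a.next > a.end {
		start, end, err := a.zero.lease(a.blockSize)
		if err != nil {
			return 0, err
		}
		a.next, a.end = start, end
	}
	uid := a.next
	a.next++
	return uid, nil
}

func main() {
	z := &zeroLeaser{}
	a := &alphaAllocator{zero: z, blockSize: 3}
	for i := 0; i < 4; i++ {
		uid, _ := a.assign()
		fmt.Println("blank node ->", uid)
	}
	fmt.Println("maxLeased:", z.maxLeased) // two blocks of 3 were leased
}
```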

If this is still the case, it logically makes sense that the Alpha lets you reference any UID up to the maximum leased so far, or use any of the new UIDs Zero has allotted to it. I would think that letting an Alpha use any UID it wanted could cause problems later on (for example, Zero could later lease out a UID a client had already written to, so a new blank node would silently land on an existing node), though that may be acceptable to the user at this point.

This might make sense for an initial data ingestion (not using bulk mode), after which the mode would be disabled. While enabled, it would remove the blank-node feature and require every UID to be assigned manually, but that may be acceptable for an initial load.

The Alpha forwards the assign gRPC call to Zero. So yes, Zero controls UID assignment, but you can make that call on the Alpha.

In my system, I can give 100% of the nodes a deterministic ID. So if every insert used a specific UID, what bad things would happen? Really, every UID exists all the time anyway: nothing is stored physically by UID; it's all by predicate.

From what I can tell, the whole idea of leasing a block of UIDs to the Alpha is that when a new node is inserted without a specified UID, the Alpha can quickly assign one and move on. If you can guarantee that no node is ever inserted without a specified UID, maybe it would be OK to allow any UID on insert?

For the record, I short-circuited this function in Dgraph, and then all UIDs were available for inserts. This changes my application from needing every insert to be an atomic upsert to making every write a plain insert to a specific UID; eventual consistency is fine for us, so we can use ludicrous mode.

So I think this use case is interesting and could be a huge performance win for our application. At the very least, the bugs listed above should be addressed: allocating all uint64 UIDs returns "Server not initialized", and allocating a large block (or all UIDs) leaves the Alpha unaware of how many UIDs have been leased.

Can you try with something less than maxUint64? It seems like you are running into integer overflows.

Sure, using a number less than the max uint64 does not produce the "Server not initialized" error. But the bug I am referring to is getting that error at all, when it should be something like "cannot lease any more UIDs".

It seems the only way to assign the entire uint64 range is to request small batches of UIDs repeatedly until I get close to maxUint64. But if I ever trigger the "Server not initialized" message, I can't assign any more and have to reboot the cluster. Also, if you use /assign and ask for too many, the uint seems to wrap around and the assigned range shrinks:

# having already allocated a ton of uids in several smaller calls
% curl --silent "127.0.0.1:6080/assign?what=uids&num=99999999999989990"
{"startId":"18346744073709561619","endId":"18446744073709551608","readOnly":"0"}
% curl --silent "127.0.0.1:6080/assign?what=uids&num=100"
{"startId":"18446744073709551609","endId":"92","readOnly":"0"}
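
The numbers above are consistent with the end of the range being computed as `start + num - 1` in uint64, which wraps modulo 2^64: 18446744073709551609 + 100 - 1 = 18446744073709551708, and 18446744073709551708 - 2^64 = 92. Below is a Go sketch of that arithmetic plus an overflow-checked variant using `math/bits.Add64`. Both function names are hypothetical, and the formula is inferred from the responses rather than taken from Zero's source.

```go
package main

import (
	"fmt"
	"math/bits"
)

// wrappedEnd (hypothetical) computes end = start + num - 1 in plain uint64
// arithmetic, which silently wraps modulo 2^64 like the responses above.
func wrappedEnd(start, num uint64) uint64 {
	return start + num - 1
}

// checkedEnd (hypothetical) computes the same end but reports ok=false when
// the range would run past the maximum uint64 instead of wrapping.
func checkedEnd(start, num uint64) (end uint64, ok bool) {
	if num == 0 {
		return 0, false // an empty range has no end
	}
	sum, carry := bits.Add64(start, num-1, 0)
	return sum, carry == 0
}

func main() {
	fmt.Println(wrappedEnd(18346744073709561619, 99999999999989990)) // 18446744073709551608
	fmt.Println(wrappedEnd(18446744073709551609, 100))               // 92
	fmt.Println(checkedEnd(18446744073709551609, 100))               // 92 false
}
```

With the checked form, an oversized request could be refused with an explicit error instead of leasing a range that wraps back to low UIDs.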