Querying an arbitrary UID always returns a result

This is about a behaviour in Dgraph that I find somewhat unexpected.

When I run a query based on a UID, I always get a JSON result. For example, if no node with UID 0x12345 exists and I run this query:

{
  q(func: uid(0x12345)) {
    uid
  }
}

it returns something like this:

{
  "data": {
    "q": [
      {
        "uid": "0x12345"
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 81353,
      "processing_ns": 350765,
      "encoding_ns": 28336,
      "assign_timestamp_ns": 492830,
      "total_ns": 1017589
    }
  }
}

This means that my query q always returns a result list with length greater than 0. Therefore, I cannot use this simple query to check whether a node exists at this UID.

A workaround would be:

{
  q(func: has(dgraph.type)) @filter(uid(0x12345)) {
    uid
  }
}

This returns an empty list for q, which makes it easy to check whether anything exists at the specified UID.
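On the client side, the filtered query turns into a yes/no check simply by testing whether the q list is empty. A minimal Python sketch, assuming the response JSON has the shape shown above; node_exists is a hypothetical helper name, not part of any Dgraph client library:

```python
import json

def node_exists(response_text: str, block: str = "q") -> bool:
    """Return True if the named query block in a Dgraph JSON response is non-empty.

    Hypothetical helper: assumes a response shaped like {"data": {"q": [...]}}.
    """
    data = json.loads(response_text).get("data", {})
    # A missing block and an empty list both mean nothing matched.
    return len(data.get(block, [])) > 0

# Response to the filtered query when nothing exists at the uid.
print(node_exists('{"data": {"q": []}}'))                     # False
# Response when a node with a dgraph.type exists at the uid.
print(node_exists('{"data": {"q": [{"uid": "0x12345"}]}}'))  # True
```

Note that the same helper applied to the response of the plain uid() query would always return True, which is exactly the problem described above.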

Is this behaviour intentional?

Dgraph metadata

dgraph version

Dgraph version : v21.03.0
Dgraph codename : rocket
Dgraph SHA-256 : b4e4c77011e2938e9da197395dbce91d0c6ebb83d383b190f5b70201836a773f
Commit SHA-1 : a77bbe8ae
Commit timestamp : 2021-04-07 21:36:38 +0530
Branch : HEAD
Go version : go1.16.2
jemalloc enabled : true



This is probably my favorite topic: once you really understand it, it goes a long way toward understanding how Dgraph actually works.

Fun fact: in Dgraph, a uid never exists and always exists at the same time.

As you discovered above, it always exists, but in the database it never exists. To see this, assuming you have no data on uid 0x2a (or pick any other unused uid), run this mutation:

{
  set {
    <0x2a> <dgraph.type> "example" .
    <0x2a> <name> "Foo" .
    <0x2a> <description> "The answer is found at uid 42" .
  }
}

Then turn right around and run this mutation:

{
  delete {
    <0x2a> <dgraph.type> * .
    <0x2a> <name> * .
    <0x2a> <description> * .
  }
}

You deleted the predicates but not the uid itself, so you might reasonably suspect that you need to drop all data, or do something else, to delete the subject 0x2a. But if you were to export your data, you would find that 0x2a is nowhere to be found.
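The experiment above can be mimicked with a toy model in which the database is nothing but a set of (subject, predicate, object) triples. This is a simplified sketch to illustrate the idea, not Dgraph's actual storage layout; set_triple, delete_predicate and exported_subjects are hypothetical names:

```python
# Toy model: the store is only a set of triples; a uid has no record of its own.
triples = set()

def set_triple(s, p, o):
    triples.add((s, p, o))

def delete_predicate(s, p):
    # Mirrors `<s> <p> * .`: drop every triple with this subject and predicate.
    for t in [t for t in triples if t[0] == s and t[1] == p]:
        triples.discard(t)

def exported_subjects():
    # An export only contains triples, so only subjects of remaining triples appear.
    return {s for (s, _, _) in triples}

set_triple(0x2a, "dgraph.type", "example")
set_triple(0x2a, "name", "Foo")
set_triple(0x2a, "description", "The answer is found at uid 42")
print(0x2a in exported_subjects())  # True

for pred in ("dgraph.type", "name", "description"):
    delete_predicate(0x2a, pred)
print(0x2a in exported_subjects())  # False
```

Once the last triple with 0x2a as its subject is gone, there is nothing left that mentions the uid, so there is nothing left to delete.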

So if you didn't delete it, where did it go? It never existed by itself.

Dgraph stores data as triples. The only uid ever stored in the database by itself is maxuid, and it is used only as a reference point: new uids are generated starting after that stored uid, and maxuid is then updated.
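The maxuid bookkeeping described above can be sketched as a tiny allocator. This is a toy illustration under the stated assumption that uids are handed out sequentially after the stored maximum; UidAllocator and lease are hypothetical names, not Dgraph internals:

```python
class UidAllocator:
    """Toy allocator: the only uid kept on its own is max_uid."""

    def __init__(self, max_uid: int = 0):
        self.max_uid = max_uid

    def lease(self, n: int) -> range:
        # New uids start right after the stored max_uid...
        start = self.max_uid + 1
        # ...and max_uid is then advanced past the leased block.
        self.max_uid += n
        return range(start, self.max_uid + 1)

alloc = UidAllocator()
print(list(alloc.lease(3)))  # [1, 2, 3]
print(list(alloc.lease(2)))  # [4, 5]
print(alloc.max_uid)         # 5
```

Nothing in this sketch records which leased uids ever received triples; that information lives only in the triples themselves.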

With this concept in mind, how does a query by uid, like the one in your example, actually work? First, it gets the universe from your specific uids. This "get universe" step is deceiving: if uids don't really exist in the database except as subjects of triples, what is it getting, given that it is not fetching any predicates? Here is the explanation: the root uid() function does not actually perform any read operation. All it does is start with a list of uids to use in the following steps of filtering and predicate/edge selection.

So if you don't do any filtering or edge selection, and only request the uid in the response, the uid will always be returned, because no actual read against the datastore was performed.
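The three query steps described above (root uid() as a starting list, filtering, predicate selection) can be sketched as a toy executor. This is a simplified model for illustration only, not Dgraph's query engine; store, run_query and the sample data are hypothetical:

```python
# Toy store: predicate -> {subject uid: value}. Nothing is stored for 0x12345.
store = {
    "dgraph.type": {0x1: "Person"},
    "name": {0x1: "Alice"},
}

def run_query(root_uids, has_filter=None, fields=("uid",)):
    # Root uid(): no read at all, just a starting list of uids.
    uids = list(root_uids)
    # @filter(has(...)) is the first step that actually consults the store.
    if has_filter is not None:
        uids = [u for u in uids if u in store.get(has_filter, {})]
    results = []
    for u in uids:
        # Requesting `uid` echoes the starting uid back without any lookup.
        row = {"uid": hex(u)} if "uid" in fields else {}
        # Predicate selection reads values; missing predicates are omitted.
        for f in fields:
            if f != "uid" and u in store.get(f, {}):
                row[f] = store[f][u]
        if row:
            results.append(row)
    return results

print(run_query([0x12345]))                            # [{'uid': '0x12345'}]
print(run_query([0x12345], has_filter="dgraph.type"))  # []
print(run_query([0x12345], fields=("name",)))          # []
print(run_query([0x1], fields=("uid", "name")))        # [{'uid': '0x1', 'name': 'Alice'}]
```

In this model the first call never touches store at all, which is why a uid-only query "finds" a node that was never written.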

While this workaround works, it is not efficient, because it first gets the universe of every node that has dgraph.type and only then filters it. To make it more efficient, switch the filter and the root function:

{
  q(func: uid(0x12345)) @filter(has(dgraph.type)) {
    uid
  }
}
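The efficiency difference can be illustrated by counting how many candidate uids each query shape has to walk through. This is a rough toy model under the assumption that has() at the root materializes every uid with the predicate; has_index and both function names are hypothetical, and real costs depend on Dgraph's posting-list internals:

```python
# Toy index: predicate -> set of uids that have it (hypothetical data).
has_index = {"dgraph.type": set(range(1, 100_001))}

def has_root_uid_filter(pred, target):
    """Workaround shape: has(pred) at the root, then @filter(uid(target))."""
    universe = sorted(has_index.get(pred, set()))  # every node with the predicate
    matches = [u for u in universe if u == target]
    return matches, len(universe)

def uid_root_has_filter(pred, target):
    """Suggested shape: uid(target) at the root, then @filter(has(pred))."""
    universe = [target]  # just the requested uid
    matches = [u for u in universe if u in has_index.get(pred, set())]
    return matches, len(universe)

print(has_root_uid_filter("dgraph.type", 42))       # ([42], 100000)
print(uid_root_has_filter("dgraph.type", 42))       # ([42], 1)
print(uid_root_has_filter("dgraph.type", 200_000))  # ([], 1)
```

Both shapes return the same matches; the uid-rooted one just starts from a universe of size one instead of the whole predicate.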

Thanks for the deep dive and explanation. It is indeed very interesting and good to know more about the inner workings.

Going back to the original query and taking a user's perspective: for which concrete use cases would this behaviour be useful or desired?

Some things are not built for actual use cases but are simply side effects of how something works. This is one of those.

The question is: if you want it to work differently, what would it actually do, and how could it do that efficiently? You found the way to filter it out, and that is the correct solution to the problem.


Thanks! And Happy birthday!
