Inconsistent response of sort as a result of indexing of a predicate

Consider this data:

    _:a <name> "alice" .
    _:a <class> "5" .
    _:b <name> "bob" .
    _:b <class> "2" .
    _:c <name> "cat" .
    _:c <age> "12" .
    _:d <name> "deg" .
    _:d <age> "13" .

and this query:

data(func: has(age)){
    t as uid
}
task(func: uid(t), orderasc: class){
    uid 
    name
}

Clearly, all the nodes which have age doesn’t have the class predicate. So all uid(t) nodes doesn’t have class predicate. What should be the expected from orderasc in this case when the sort predicate (class) is not present in any of the nodes?

Observation:

If we have <class>: string . in schema, i.e. the class predicate is not indexed, then we get this response:

"task": [
    {
      "uid": "0x96",
      "name": "cat"
    },
    {
      "uid": "0x97",
      "name": "dog"
    }
  ]

while, if we have <class>: string @index(exact) . in schema. That is, the class predicate is indexed. The the response is:

  "task": []

What should be the ideal behaviour of sorting in these cases?

You will have problems if you do not have total order. But to have total order it means you will have to exclude some entries for comparison.

Using your example, here’s what I mean. If we want partial ordering, then we don’t have to say that every possible pair will form a meaningful order.So, we will have (where ~ denotes a relation - think of it as Less() function)

a ~ a
a ~ b
a ~ c
a ~ d
...
d ~ a
d ~ b
d ~ c
d ~ d

Clearly in this set, some pairs cannot form meaningful relationships:

b ~ c
c ~ b
c ~ d
...

This is because these nodes do not have the correct fields to compare data on. But since we’re only after a partial order, you might say, that’s OK, all the nodes that have fields that cannot be compared go to the back of the list, in random order.

Here, we might see a problem:In order to fulfil a partial order, we DO have to have the condition that every item is comparable to itself (i.e. it is reflexive).Now imagine another node in the dataset:

_:e <name> "deg".
_:e <age> "13".

What is the status of these:

d ~ e
e ~ d

Here we see we have some problems with reflexivity. d and e are not the same nodes, but somehow they are equal to one another.

Figuring this shit out takes plenty of brain cells (well, kinda, topsorts actually solve the problem, but are hairy for a whole other reason). So let’s opt for something simpler - total order.

With total order, we say that every pair in a set will have a meaningful relationship. If no meaningful relationship can be formed, they are excluded from the set of consideration.

In this case the behaviour that is idea is when you orderasc:class , all nodes without a class predicate gets filtered out before hand.

TL;DR: cheap easy solution - filter before sort. Expensive full-featured solution: topsort.

I am currently digging the code to find out what exactly is being done by dgraph. I guess that Total order is considered by dgraph but on a filtered set because for this query on the same data:

{
  data(func: has(name), orderasc: class){
    name
    class
  }
}

We get a response:

 {
        "name": "bob",
        "class": "2"
      },
      {
        "name": "alice",
        "class": "5"
      },
      {
        "name": "cat"
      },
      {
        "name": "deg"
      }
    ]

But somehow, the filtered out elements are not put back in case of indexed class predicate:

and the response is:

"data": [
      {
        "name": "bob",
        "class": "2"
      },
      {
        "name": "alice",
        "class": "5"
      }
    ]

This response is not correct by total ordering. "cat" and "deg" should not be in the result set.

This response is correct by total ordering.

Yes, but I believe that Total Order is used for sorting (i.e. sorting is applied on a list of elements in which every pair has a meaningful relationship). And the filtered out elements are then appended on the result of this sort.

Does it make sense to throw elements out of the final result if they do not have the predicate which is used for sorting? I am in favor of having them in random order.

It makes mathematical sense. But in a real world scenario, sure why not? Then we can have this sort of things for a random number generator:

_:a <name> "foo".
_:a <rnd> NaN. 
_:b <name> "bar".
_:b <rnd> NaN.
_:c <name> "baz".
_:c <rnd> NaN.

or this type of query for random sorts:

q(func: has(name), orderasc: nonexistantPredicate, first:1) {
...
}

How exciting.

EDIT: tried the latter query (added a “nonsense” predicate first). Got empty results when indexed, but a static, non-random result when index is removed. Hmm. This seems like a bug

Yep exactly. It would have been exciting if it was completely random but it is static in some order (may be in the order of their insertion).

This is a bug. We should be consistent with the result. But again, the question is whether we want an empty result or a static (in some random/non-random order) response.

I vote empty result. I am quite partial to total orderings. They make life simple and things are easier to reason about. Also I spent way too many hours of my life trying to figure out what group actions apply to float64, so there’s some PTSD from that lol.

1 Like

I think we have had this discussion in this post: Expected behaviour for sort queries - #3 by Anurag A bunch of other SQL dbs retain the elements.