Expected behaviour for sort queries

Going through some of the previous questions asked on the forum, I have come across some issues when using sort queries.

Using the following data to help in reproducing.

  <1> <name> "A" .
  <1> <uuid> "a" .
  <1> <dgraph.type> "Owner" .

  <2> <name> "B" .
  <2> <uuid> "b" .
  <2> <dgraph.type> "Owner" .

  <3> <name> "C" .
  <3> <uuid> "c" .
  <3> <dgraph.type> "Owner" .


  <10> <name> "House" .
  <10> <uuid> "1" .
  <10> <dgraph.type> "Device" .
  
  <11> <uuid> "car" .
  <11> <dgraph.type> "Device" .

  <12> <name> "A phone" .
  <12> <uuid> "1" .
  <12> <dgraph.type> "Device" .

  <10> <owner> <2>    .
  <11> <owner> <1>    .
  1. Sort with one predicate or one variable.
  • Sorting with one predicate
getSorted(func: type(Device), orderasc: name  ){
        uuid
        uid
        name
        owner{
            uuid
            name
        }
    }

In this case if name predicate is missing in some nodes then the output of the query is dependent on whether the name predicate has been indexed or not.. If name is indexed in the schema then output will omit the nodes without name predicate otherwise it will append the nodes without the predicate at the end. The reason is that for indexed predicate the sorting happens via sortWithIndex which omits the nil values. For predicates without the index, sorting happens via sortWithoutIndex which purposefully adds nodes without the predicate.

  • Sorting with one variable (this is typically done to sort edges with nested node value.) The query bubble ups the value and stores in the variable which is used for sorting in the query block.
devices as var(func: type(Device)) {
        owner {
            ownerName as name
        }
        x as min(val(ownerName))
    }

withOwner(func: uid(devices), orderasc: val(x)){
        uuid
        owner{
            uuid
            name
        }
}

In this case, query will always omit the nodes that either do not have edges to the nodes which we want to bubble up or have edges to the nodes which do not have the predicate which we are trying to bubble up. See code here.

  1. Sort with two or more predicates
    Suppose we want to sort via uuid and name. In this case the code flows through multiSort which adds nodes with missing predicates at the end irrespective of indexing.
getSorted(func: type(Device), orderasc: name, orderasc: uuid   ){
        uuid
        name
        owner{
            uuid
            name
        }
    }
  1. Sorting with 2 or more value variables or 1 value variable and 1 predicate etc.
    This is currently not supported, the query will throw at the parser.

  2. If no sorting is mentioned then lists are returned in sorted-by-uid order.

  3. When sorting strings: Z < a. i.e sorting is case-sensitive and follows ASCII order (Upper Case is less than Lower case)

To summarise:

  1. If you want nil to be included in your sorted results, do not index on that predicate.
  2. If you want nodes which point to nodes with nil values to be included, it is not supported right now, though the fix looks easy. See here.

Hope this helps!

Questions to @core-devs:

  1. Query output is dependent on whether the predicate is indexed or not. I feel it should be identical.
  2. Should we support appending nodes with nil predicates in case of sorting using variables? More generally should we append nodes with nil values (for the predicate on which we are sorting) in our sorted results?

Thanks

1 Like

I’d vote for nils to be removed when doing sorts. However, there might be a reason we kept them. Perhaps, see how SQL deals with nils when sorting, are those rows returned or not returned.

Actually I think none of the SQL based DBs seem to drop the data. Reference.

1 Like

Yes, I agree it should be identical. We supported it for sortWithoutIndex because that is the behaviour that SQL has. I’d say we should have the same behaviour for sortWithIndex as well though we’ll have to look into how easy/hard it is to do that.

I think we should do this as well. As a user, if I started with an initial set of nodes, I would expect to get back all those nodes even if some of them don’t have the predicate that I am sorting by.

2 Likes

Continuing the discussion from Identical query execution has inconsistent response:

This does not seem like the right behaviour! No other database does this that I know of. I am familiar with SQL and MySQL so from that realm, if I sort based on nullish field in a column it will put the nullish order on top. This can be controlled with using CASE in the ORDER BY clause to move the nullish values to the end of the order though.

Dgraph should have some kind of equivalent sorting of missing/nullish predicates. I understand that null predicates in Dgraph === missing predicates, which is not exactly the case in RDBs

Another Related Post:

Is this topic still being discussed? I recently stumbled upon this issue, too.

Is there any recommended workaround by Dgraph team (a.k.a. @core-devs) to resolve nodes with nil value being omitted? I’ve read the second point in summary of the first post (see below), but I’m confused how the equivalent query in DQL should be written (or whether DQL supports it in the first place).

1 Like

It’s being discussed, in multiple different threads (like this one Inconsistent response of sort as a result of indexing of a predicate)

This is the master thread.

1 Like

@vvbalaji @LGalatin I feel we should prioritize this and reach on a consensus.

I think we all agree that the result should not be dependent on whether the predicates are indexed or not.
On the second question, we can follow what the major SQL DBs do and not omit the nil values.

2 Likes

Agree, we should log these as two separate bugs in Dgraph and prioritize fixing them.

2 Likes