Offset appears not to work with multiple order statements


(Michael Beeson) #1

Using this query:

bills(func: has(billsType), orderdesc: date, offset: 5, first: 5) {
  expand(_all_)
}

Everything works as expected (I get result 5 to 10). But when I have:

bills(func: has(billsType), orderdesc: date, orderdesc: time, offset: 5, first: 5) {
   expand(_all_)
}

I get the same results whether offset is set to 0 or to 5. Have I made a mistake with the syntax for orderdesc? Do I need to group the order statements somehow for offset to take both into account?


(Michel Conrado) #2

One rule for sorting is to have any value indexed. If not so, Dgraph will ignore them. Add any dummy date to the other nodes.


(Michael Beeson) #3

Hey Michel, thanks for replying!

Both ‘date’ and ‘time’ values are indexed. Offset works fine if I use only one or the other for ordering, but not when I use both. It really seems to be the combination of using both that causes the issue.

What’s more, the ordering is fine; the bills are ordered first by date, and then by time (as expected). It is ‘offset’ that isn’t working. If I use offset: 0, first: 10 I do indeed get the first ten values. But if I change the value of offset nothing happens. That is unless I go over a certain value, which happens to correspond to the number of bills that have the same date.

This suggests to me that, though the results are ordered correctly, offset only discards the first N items of the results ordered only by the first orderdesc statement (the ‘date’ statement).


(Michel Conrado) #4

The idea of ​​pagination is simple. If you have 10 nodes and use “offset: 0, first: 5” you will have two pages. Because the pagination will divide 10 nodes by 5. If you use “offset: 5*, first: 5” you will be on the second page. And if you use “offset: 10*, first: 5” will not return anything, for there are no more pages.

Remembering that the paging system is zero based. So the first page is offset: 0

Maybe I did not understand your issue, if my answer is not satisfactory please provide examples and steps to reproduce.

*I’ve fixed some concepts in this response.


(Michael Beeson) #5

Hey there Michel!

I think you haven’t understood. First of all, what you say is inaccurate. Using “offset: 1, first: 5” will show me results 2-6 as opposed to 1-5. To display the second page I would use “offset: 5, first: 5”.

The problem is that offset doesn’t work when using multiple order statements. To demonstrate this I’ll give you a test mutation:

{
	"set": [
    {
      "date": "2019-04-30",
      "time": "1970-01-01T09:45:12",
      "num": 1
		},
    {
      "date": "2019-04-30",
      "time": "1970-01-01T12:02:39",
      "num": 2
    },
    {
      "date": "2019-04-30",
      "time": "1970-01-01T14:20:06",
      "num": 3
    },
    {
      "date": "2019-04-30",
      "time": "1970-01-01T19:29:57",
      "num": 4
    },
    {
      "date": "2019-05-01",
      "time": "1970-01-01T09:05:50",
      "num": 5
    },
    {
      "date": "2019-05-01",
      "time": "1970-01-01T11:24:51",
      "num": 6
    },
    {
      "date": "2019-05-01",
      "time": "1970-01-01T14:25:03",
      "num": 7
    },
    {
      "date": "2019-05-01",
      "time": "1970-01-01T16:18:40",
      "num": 8
    },
    {
      "date": "2019-05-01",
      "time": "1970-01-01T19:41:37",
      "num": 9
    },
    {
      "date": "2019-05-01",
      "time": "1970-01-01T21:30:16",
      "num": 10
    },
    {
      "date": "2019-05-01",
      "time": "1970-01-01T22:15:49",
      "num": 11
    }
  ]
}

Don’t forget to set the index for “date” and “time” predicates to “datetime”.

So first of all let’s just order by “num” and see that offset is working fine:

query {
  fun(func: has(date), orderdesc: num, offset: 0, first: 5) {
    num
  }
}

Provides the result:

"data": {
    "fun": [
      {
        "num": 11
      },
      {
        "num": 10
      },
      {
        "num": 9
      },
      {
        "num": 8
      },
      {
        "num": 7
      }
    ]
  }

Now let’s use “offset: 2, first: 5” which I in theory should give us, in descending order, nums 9-5, which indeed it does:

query {
  fun(func: has(date), orderdesc: num, offset: 2, first: 5) {
    num
  }
}
"data": {
    "fun": [
      {
        "num": 9
      },
      {
        "num": 8
      },
      {
        "num": 7
      },
      {
        "num": 6
      },
      {
        "num": 5
      }
    ]
  }

So far everything is looking great.

So now let’s start ordering by date and time. I have them in separate predicates for reasons of searching. Anyway, if I just use “orderdesc: date” we get chaos!

query {
  fun(func: has(date), orderdesc: date) {
    num
  }
}
"data": {
    "fun": [
      {
        "num": 10
      },
      {
        "num": 9
      },
      {
        "num": 11
      },
      {
        "num": 5
      },
      {
        "num": 6
      },
      {
        "num": 7
      },
      {
        "num": 8
      },
      {
        "num": 2
      },
      {
        "num": 3
      },
      {
        "num": 4
      },
      {
        "num": 1
      }
    ]
  }

But this of course makes sense - a lot of the nodes have the same date. We need to sort by time as well:

query {
  fun(func: has(date), orderdesc: date, orderdesc: time) {
    num
  }
}
"data": {
    "fun": [
      {
        "num": 11
      },
      {
        "num": 10
      },
      {
        "num": 9
      },
      {
        "num": 8
      },
      {
        "num": 7
      },
      {
        "num": 6
      },
      {
        "num": 5
      },
      {
        "num": 4
      },
      {
        "num": 3
      },
      {
        "num": 2
      },
      {
        "num": 1
      }
    ]
  }

Now that’s what we expect. But now when I use offset, curiously enough it’s fine when I have “offset: 0, first: 5” because we get the expected result:

query {
  fun(func: has(date), orderdesc: date, orderdesc: time, offset: 0, first: 5) {
    num
  }
}
"data": {
    "fun": [
      {
        "num": 11
      },
      {
        "num": 10
      },
      {
        "num": 9
      },
      {
        "num": 8
      },
      {
        "num": 7
      }
    ]
  }

But look what happens when I change offset:

query {
  fun(func: has(date), orderdesc: date, orderdesc: time, offset: 2, first: 5) {
    num
  }
}
"data": {
    "fun": [
      {
        "num": 11
      },
      {
        "num": 8
      },
      {
        "num": 7
      },
      {
        "num": 6
      },
      {
        "num": 5
      }
    ]
  }

Or if we try another value for offset:

query {
  fun(func: has(date), orderdesc: date, orderdesc: time, offset: 6, first: 5) {
    num
  }
}
"data": {
    "fun": [
      {
        "num": 8
      },
      {
        "num": 4
      },
      {
        "num": 3
      },
      {
        "num": 2
      },
      {
        "num": 1
      }
    ]
  }

It appears to be random! But if you play around with it, things become clear.

Remember earlier when I ordered only by date? Well the “offset” value that I’m using is discarding the first N results of that result, but then everything afterwards, including the value of “first” and the “orderdesc: time” is working as expected.

When I used “offset: 2, first:5” with “orderdesc: date, orderdesc: time” we jump straight from 11 to 8 because 9 and 10 have been discarded by offset. How do I know 9 and 10 are discarded? Well when I do “orderdesc:date” alone those are the first two values returned.

Now because of the nature of dgraph the specifics you get if you run this may be different, but the behaviour is consistent. “Offset” appears not to take into account order calls subsequent to the first.

Either that or I’m doing something completely wrong. But then why do the two orderdesc statements work correctly in every other aspect apart from “offset”?

Sorry if that explanation was very longwinded! I just though an exhaustive example would make the problem as clear as possible.


(Michel Conrado) #6

You’re right. I confused myself. In fact we have to do the division and use the multiples manually.
If you divide a chain of nodes by 5 (first: 5 of 25 nodes), you have to use 0, 5, 10, 15, 20 to page - so you have 5 pages. If you use a chain of nodes divided by first: 2 the pages will be 0, 2, 4, 6, 8, 10 so on.

{
  q(func: has(position),   orderasc: pos,  first: 2,   offset: 2){
    uid
    position
    pos
  }
}
{
  "set": [
      {
        "position": "First",
        "pos": "1"
      },
      {
        "position": "second",
        "pos": "2"
      },
      {
        "position": "third",
        "pos": "3"
      },
      {
        "position": "fourth",
        "pos": "4"
      },
      {
        "position": "fifth",
        "pos": "5"
      },
      {
        "position": "sixth",
        "pos": "6"
      },
      {
        "position": "seventh",
        "pos": "7"
      },
      {
        "position": "eighth",
        "pos": "8"
      },
      {
        "position": "ninth",
        "pos": "9"
      },
      {
        "position": "tenth",
        "pos": "10"
      }
    ]
}

Now back to your case, I’ll make some tests. Need to check if the issues would be related to datetime only. I guess is the case. Cuz I see the “num”: 9, “num”: 10 and “num”: 11 with perfect date and time alignment. It feels like a bug.

In fact detailed explanations are the best :stuck_out_tongue:


#7

Hi Michel,

I don’t thing that it is related only with datetime. I have similar problem with string predicates.


(Michael Beeson) #8

Selmeci is right. I’ve just run the same test using strings rather than datetime. You can try yourself, using this data:

{
	"set": [
    {
      "string1": "A",
      "string2": "A",
      "num": 1
		},
    {
      "string1": "A",
      "string2": "B",
      "num": 2
    },
    {
      "string1": "A",
      "string2": "C",
      "num": 3
    },
    {
      "string1": "A",
      "string2": "D",
      "num": 4
    },
    {
      "string1": "B",
      "string2": "A",
      "num": 5
    },
    {
      "string1": "B",
      "string2": "B",
      "num": 6
    },
    {
      "string1": "B",
      "string2": "C",
      "num": 7
    },
    {
      "string1": "B",
      "string2": "D",
      "num": 8
    },
    {
      "string1": "B",
      "string2": "E",
      "num": 9
    },
    {
      "string1": "B",
      "string2": "F",
      "num": 10
    },
    {
      "string1": "B",
      "string2": "G",
      "num": 11
    }
  ]
}

Now these two queries will both provide the results in the same order:

query {
  fun(func: has(num), orderdesc: num) {
    expand(_all_)
  }
}
query {
  fun(func: has(num), orderdesc: string1, orderdesc: string2) {
    expand(_all_)
  }
}

But if I use offset on the second query, I get the same incorrect results as I laid out in my earlier post. And once again, the errors correspond to “offset” discarding the first N items of only the first order command, and not the combined order commands.

I’m just surprised this hasn’t come up earlier. I can’t be the first person to try to paginate data with multiple order statements?


(Michel Conrado) #9

I guess if the strings have the right order will be okay tho. Your example confuses Dgraph’s sorting.

Page 1

{
  fun(func: has(num), orderasc: string1, orderasc: string2, 
  first: 5, offset: 0) {
    expand(_all_)
  }
}

Result

{
  "data": {
    "fun": [
      {
        "num": 1,
        "string1": "L",
        "string2": "A",
        "uid": "0x7"
      },
      {
        "num": 2,
        "string1": "M",
        "string2": "B",
        "uid": "0x8"
      },
      {
        "num": 3,
        "string1": "N",
        "string2": "C",
        "uid": "0x1"
      },
      {
        "num": 4,
        "string1": "O",
        "string2": "D",
        "uid": "0x2"
      },
      {
        "num": 5,
        "string1": "P",
        "string2": "E",
        "uid": "0x3"
      }
    ]
  }
}

Page 2

{
  fun(func: has(num), orderasc: string1, orderasc: string2, 
  first: 5, offset: 5) {
    expand(_all_)
  }
}

Result

{
  "data": {
    "fun": [
      {
        "num": 6,
        "string1": "Q",
        "string2": "F"
      },
      {
        "num": 7,
        "string1": "R",
        "string2": "G"
      },
      {
        "num": 8,
        "string1": "S",
        "string2": "H"
      },
      {
        "num": 9,
        "string1": "T",
        "string2": "I"
      },
      {
        "num": 10,
        "string1": "U",
        "string2": "J"
      }
    ]
  }
}

(Michael Beeson) #10

The data you’ve used makes the second order command redundant. The second order command only comes into play when the first order command returns an equal result. You would need several entries that have equal values for “string1” for the second order command to be used.

If you use the data example I provided, you can see this working as intended. Everything working as intended, apart from “offset”.


(Michel Conrado) #11

I made like this to show how it behaves. I did a continuous ordering via two different predicates. That way I’m pretty sure Dgraph is applying the two parameters correctly.

By my tests everything is normal. Dgraph first resolves the order of “string1” and then “string2”.

BTW, you have to put in mind that this sorting are basically string (alphabet) ordering. The numbers (added to represent position) would not always correspond to the alphabet ordering when you have 2 or more referentials. And It often has repeated values in some predicates and some others do not. This confuses the ordering. It will work by “weight” and will depends on the size of the “page”.

The biggest issue here is to harmonize multiple (concurrent) ordering sources. So the result is computed in a way that takes in consideration both/all sources of order.

Take this example:

In the example below I added “num” as sorting main weight. That way in the pagination of larger numbers (multiples) you will have “num” as the pivot of the ordering. If you decrease the size of the pagination (from 5 to 3 or 2 in “first” param) you will have a bigger weight dispute between the three parameters. So it gets odd (But it is expected, since the predicates have different or repeated characters).

using these parameters below you will have the correct order. Precisely because “num” was added as an important weight for sorting .
page 1 = “offset: 0, first: 5”
page 2 = “offset: 5, first: 5”
page 3 = “offset: 10, first: 5”

{
  fun(func: has(num), 
  orderdesc: num, 
  orderdesc: string1, 
  orderdesc: string2,
  offset: 5, first: 5
) {
    expand(_all_)
  }
}

If you still think we have a Bug or something. Please feel free to open a issue so a Core Dev can investigate deeply.


(Michael Beeson) #12

Ok. I’m 95% sure this is a bug, and I guess I should open an issue with the Core Dev, but I still consider myself a dgraph noob and I want to convince you first, Michel, because you’ve been so helpful all along.

Maybe this makes more sense with real data that makes sense, so how about this? We have users where their first name and last name are stored separately. Imagine we have two users, one called “Michel Beeson” and one called “Michel Conrad”. Now in dgraph, if we were to order only by first name, there’s no guarantee that Michel Beeson should appear before Michel Conrad. We would have to say “first, order by first name, and if the first name is the same, then order by last name”. So here’s a mutation that creates a bunch of users, who are either called “Michel” or “Susan”, but they have different surnames.

I will also add a predicate called “naturalOrder” which corresponds to the order that you would expect (ascending).

{
	"set": [
    {
      "firstName": "Michel",
      "lastName": "Antwerp",
      "naturalOrder": 1
		},
     {
      "firstName": "Michel",
      "lastName": "Beeson",
      "naturalOrder": 2
		},
     {
      "firstName": "Michel",
      "lastName": "Conrad",
      "naturalOrder": 3
		},
     {
      "firstName": "Michel",
      "lastName": "Fittspatrick",
      "naturalOrder": 4
		},
     {
      "firstName": "Michel",
      "lastName": "Wallace",
      "naturalOrder": 5
		},
     {
      "firstName": "Michel",
      "lastName": "Zabala",
      "naturalOrder": 6
		},
    {
      "firstName": "Susan",
      "lastName": "Button",
      "naturalOrder": 7
		},
    {
      "firstName": "Susan",
      "lastName": "Dillan",
      "naturalOrder": 8
		},
    {
      "firstName": "Susan",
      "lastName": "Grande",
      "naturalOrder": 9
		},
    {
      "firstName": "Susan",
      "lastName": "Jackson",
      "naturalOrder": 10
		},
    {
      "firstName": "Susan",
      "lastName": "Lawrence",
      "naturalOrder": 11
		},
    {
      "firstName": "Susan",
      "lastName": "Palmer",
      "naturalOrder": 12
		},
    {
      "firstName": "Susan",
      "lastName": "Ralley",
      "naturalOrder": 13
		},
    {
      "firstName": "Susan",
      "lastName": "Trenton",
      "naturalOrder": 14
		}
    ]
}

Now I’m going to do some simple queries without pagination, just to be extra annoying and exhaustive. If I just do:

query {
  users(func: has(firstName)) {
    expand(_all_)
  }
}

I get:

"data": {
    "users": [
      {
        "lastName": "Beeson",
        "naturalOrder": 2,
        "firstName": "Michel"
      },
      {
        "lastName": "Wallace",
        "naturalOrder": 5,
        "firstName": "Michel"
      },
      {
        "lastName": "Button",
        "naturalOrder": 7,
        "firstName": "Susan"
      },
      {
        "lastName": "Dillan",
        "naturalOrder": 8,
        "firstName": "Susan"
      },
      {
        "lastName": "Conrad",
        "naturalOrder": 3,
        "firstName": "Michel"
      },
      {
        "lastName": "Jackson",
        "naturalOrder": 10,
        "firstName": "Susan"
      },
      {
        "lastName": "Trenton",
        "naturalOrder": 14,
        "firstName": "Susan"
      },
      {
        "lastName": "Zabala",
        "naturalOrder": 6,
        "firstName": "Michel"
      },
      {
        "lastName": "Grande",
        "naturalOrder": 9,
        "firstName": "Susan"
      },
      {
        "lastName": "Lawrence",
        "naturalOrder": 11,
        "firstName": "Susan"
      },
      {
        "lastName": "Antwerp",
        "naturalOrder": 1,
        "firstName": "Michel"
      },
      {
        "lastName": "Fittspatrick",
        "naturalOrder": 4,
        "firstName": "Michel"
      },
      {
        "lastName": "Palmer",
        "naturalOrder": 12,
        "firstName": "Susan"
      },
      {
        "lastName": "Ralley",
        "naturalOrder": 13,
        "firstName": "Susan"
      }
    ]
  }

Totally random and as it should be. So lets sort by firstName descending:

query {
  users(func: has(firstName), orderdesc: firstName) {
    expand(_all_)
  }
}

The results are:

"data": {
    "users": [
      {
        "lastName": "Grande",
        "naturalOrder": 9,
        "firstName": "Susan",
        "uid": "0x2724"
      },
      {
        "lastName": "Palmer",
        "naturalOrder": 12,
        "firstName": "Susan",
        "uid": "0x2728"
      },
      {
        "lastName": "Button",
        "naturalOrder": 7,
        "firstName": "Susan",
        "uid": "0x271e"
      },
      {
        "lastName": "Dillan",
        "naturalOrder": 8,
        "firstName": "Susan",
        "uid": "0x271f"
      },
      {
        "lastName": "Lawrence",
        "naturalOrder": 11,
        "firstName": "Susan",
        "uid": "0x2725"
      },
      {
        "lastName": "Jackson",
        "naturalOrder": 10,
        "firstName": "Susan",
        "uid": "0x2721"
      },
      {
        "lastName": "Trenton",
        "naturalOrder": 14,
        "firstName": "Susan",
        "uid": "0x2722"
      },
      {
        "lastName": "Ralley",
        "naturalOrder": 13,
        "firstName": "Susan",
        "uid": "0x2729"
      },
      {
        "lastName": "Zabala",
        "naturalOrder": 6,
        "firstName": "Michel",
        "uid": "0x2723"
      },
      {
        "lastName": "Conrad",
        "naturalOrder": 3,
        "firstName": "Michel",
        "uid": "0x2720"
      },
      {
        "lastName": "Antwerp",
        "naturalOrder": 1,
        "firstName": "Michel",
        "uid": "0x2726"
      },
      {
        "lastName": "Fittspatrick",
        "naturalOrder": 4,
        "firstName": "Michel",
        "uid": "0x2727"
      },
      {
        "lastName": "Wallace",
        "naturalOrder": 5,
        "firstName": "Michel",
        "uid": "0x271d"
      },
      {
        "lastName": "Beeson",
        "naturalOrder": 2,
        "firstName": "Michel",
        "uid": "0x271c"
      }
    ]
  }

Well we get the “Susans” before the “Michels”, but when it comes to their last names, the order is random. That’s not what we want. When the first names are the same, we want to sort by last name, so we use the following query:

query {
  users(func: has(firstName), orderdesc: firstName, orderdesc: lastName) {
    expand(_all_)
  }
}

And we get the result:

"data": {
    "users": [
      {
        "naturalOrder": 14,
        "firstName": "Susan",
        "lastName": "Trenton"
      },
      {
        "naturalOrder": 13,
        "firstName": "Susan",
        "lastName": "Ralley"
      },
      {
        "naturalOrder": 12,
        "firstName": "Susan",
        "lastName": "Palmer"
      },
      {
        "naturalOrder": 11,
        "firstName": "Susan",
        "lastName": "Lawrence"
      },
      {
        "naturalOrder": 10,
        "firstName": "Susan",
        "lastName": "Jackson"
      },
      {
        "naturalOrder": 9,
        "firstName": "Susan",
        "lastName": "Grande"
      },
      {
        "naturalOrder": 8,
        "firstName": "Susan",
        "lastName": "Dillan"
      },
      {
        "naturalOrder": 7,
        "firstName": "Susan",
        "lastName": "Button"
      },
      {
        "naturalOrder": 6,
        "firstName": "Michel",
        "lastName": "Zabala"
      },
      {
        "naturalOrder": 5,
        "firstName": "Michel",
        "lastName": "Wallace"
      },
      {
        "naturalOrder": 4,
        "firstName": "Michel",
        "lastName": "Fittspatrick"
      },
      {
        "naturalOrder": 3,
        "firstName": "Michel",
        "lastName": "Conrad"
      },
      {
        "naturalOrder": 2,
        "firstName": "Michel",
        "lastName": "Beeson"
      },
      {
        "naturalOrder": 1,
        "firstName": "Michel",
        "lastName": "Antwerp"
      }
    ]
  }

Aha! Perfect ordering! This is exactly what I want. I want to emphasize that the ordering is not confused. Dgraph has it exactly right. The bug that I’m talking about is only an issue when it comes to pagination. So let’s try some pagination now. I won’t bother with “offset: 0” because then there is no problem. Let’s try this:

query {
  users(func: has(firstName), orderdesc: firstName, orderdesc: lastName, offset: 3, first: 8) {
    expand(_all_)
  }
}

The results are:

 "data": {
    "users": [
      {
        "lastName": "Trenton",
        "naturalOrder": 14,
        "firstName": "Susan"
      },
      {
        "lastName": "Ralley",
        "naturalOrder": 13,
        "firstName": "Susan"
      },
      {
        "lastName": "Lawrence",
        "naturalOrder": 11,
        "firstName": "Susan"
      },
      {
        "lastName": "Jackson",
        "naturalOrder": 10,
        "firstName": "Susan"
      },
      {
        "lastName": "Dillan",
        "naturalOrder": 8,
        "firstName": "Susan"
      },
      {
        "lastName": "Zabala",
        "naturalOrder": 6,
        "firstName": "Michel"
      },
      {
        "lastName": "Wallace",
        "naturalOrder": 5,
        "firstName": "Michel"
      },
      {
        "lastName": "Fittspatrick",
        "naturalOrder": 4,
        "firstName": "Michel"
      }
    ]
  }

What happened?! Exactly what I pointed out in earlier posts. Offset is discarding the first 3 results of “orderdesc: firstName” and ignoring the second order command. This is not an ordering issue; when I perform the query without pagination the order is correct. The problem is with offset.

Please try this with the dataset I have provided. You’ll see that the issue is real. Tomorrow I’ll happily open this as an issue with the Core Dev, just as soon as I figure out how to do that. But Michel, you’ve been a real super hero for me, so I feel it’s only right that I get you on my side.


#13

I can confirm that the problem is related to offset. I have a similar problem on my data sample.


(Michael Beeson) #14

I’ve filed a bug report of the dgraph github page. Fingers crossed it works! I’m still willing to believe that there’s a mistake in my syntax


(Michael Beeson) #15

Any chance of some comment on this? Is it indeed a bug? If not, could you tell me how to paginate data ordered in the way I outline in my last example (firstName, lastName)?


(Michel Conrado) #16

I’ll investigate carefully what you wrote.