Pagination using "after" doesn't respect sort order

Moved from GitHub dgraph/3472

Posted by romshark:

The following issue is pretty much a clone of #2744 but with a different dataset and based on v1.0.14

  • What version of Dgraph are you using?

  • Have you tried reproducing the issue with latest release?

  • What is the hardware spec (RAM, OS)?

    • OS: Windows 10 (Linux in docker)
    • RAM: 64 GB
  • Steps to reproduce the issue (command/config used to run Dgraph).

    Post.id: string @index(exact) .
    Post.creation: dateTime .
    Post.title: string .
    Post.contents: string .
    
    • Fill the database with test data:
    {
      set {
        _:post1 <Post.id> "00000000000000000000000000000006" .
        _:post1 <Post.title> "post 1" .
        _:post1 <Post.contents> "post 1 contents" .
        _:post1 <Post.creation> "2019-05-28T10:00:00+00:00" .
    
        _:post2 <Post.id> "00000000000000000000000000000005" .
        _:post2 <Post.title> "post 2" .
        _:post2 <Post.contents> "post 2 contents" .
        _:post2 <Post.creation> "2019-05-28T10:30:00+00:00" .
        
        _:post3 <Post.id> "00000000000000000000000000000004" .
        _:post3 <Post.title> "post 3" .
        _:post3 <Post.contents> "post 3 contents" .
        _:post3 <Post.creation> "2019-05-28T11:00:00+00:00" .
        
        _:post4 <Post.id> "00000000000000000000000000000003" .
        _:post4 <Post.title> "post 4" .
        _:post4 <Post.contents> "post 4 contents" .
        _:post4 <Post.creation> "2019-05-28T11:30:00+00:00" .
        
        _:post5 <Post.id> "00000000000000000000000000000002" .
        _:post5 <Post.title> "post 5" .
        _:post5 <Post.contents> "post 5 contents" .
        _:post5 <Post.creation> "2019-05-28T12:00:00+00:00" .
        
        _:post6 <Post.id> "00000000000000000000000000000001" .
        _:post6 <Post.title> "post 6" .
        _:post6 <Post.contents> "post 6 contents" .
        _:post6 <Post.creation> "2019-05-28T12:30:00+00:00" .
      }
    }
    
    • Read order of objects:
    {
      all(
        func: has(Post.id),
        orderasc: Post.id
      ) {
        uid
        Post.id
        Post.title
        Post.contents
        Post.creation
      }
    }
    

    (my results: 0x7, 0xc, 0xb, 0xa, 0x9, 0x8)

    • Try to read a page of 3 posts after the third:
    {
      all(
        func: has(Post.id),
        orderasc: Post.id,
        first: 3,
        after: 0xb
      ) {
        uid
        Post.id
        Post.title
        Post.contents
        Post.creation
      }
    }
    
  • Expected behaviour and actual result.

    • expected: 0xa, 0x9, 0x8
    • actual: 0xc

MichelDiz commented :

It turns out that “after” doesn’t support ordering (it only works on UIDs), so this will be turned into a feature request.
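
To make the mismatch concrete, here is a minimal Python sketch that models the six posts in memory. It is not Dgraph code. The UID-to-Post.id mapping is inferred from the result order reported above, and the two function names are hypothetical. It assumes that "after" keeps only UIDs greater than the cursor before sorting, which would explain why only 0xc comes back.

```python
# In-memory model of the report's data: (uid, Post.id) pairs.
# The mapping is inferred from the reported order 0x7, 0xc, 0xb, 0xa, 0x9, 0x8.
POSTS = [
    (0x7, "00000000000000000000000000000001"),
    (0xC, "00000000000000000000000000000002"),
    (0xB, "00000000000000000000000000000003"),
    (0xA, "00000000000000000000000000000004"),
    (0x9, "00000000000000000000000000000005"),
    (0x8, "00000000000000000000000000000006"),
]


def page_after_uid(rows, after_uid, first):
    """Assumed current behaviour: filter by UID > cursor, then sort, then limit."""
    kept = [row for row in rows if row[0] > after_uid]
    kept.sort(key=lambda row: row[1])  # orderasc: Post.id
    return [hex(uid) for uid, _ in kept[:first]]


def page_after_position(rows, after_uid, first):
    """Expected behaviour: sort first, then continue after the cursor's position."""
    ordered = sorted(rows, key=lambda row: row[1])
    idx = next(i for i, row in enumerate(ordered) if row[0] == after_uid)
    return [hex(uid) for uid, _ in ordered[idx + 1 : idx + 1 + first]]


print(page_after_uid(POSTS, 0xB, 3))       # UID-based cursor
print(page_after_position(POSTS, 0xB, 3))  # sort-order-based cursor
```

The first call reproduces the "actual" result from the report, the second the "expected" one.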

manishrjain commented :

You should be able to use offset along with first. Does that not work?

MichelDiz commented :

offset works fine indeed. We get the right order even when paging one by one (first: 1 with offset: 0, 1, 2, 3, 4, 5).

romshark commented :

@manishrjain offset doesn’t scale (see issue #3473, “Offset-based pagination is slow”, in dgraph-io/dgraph); it’s very slow on relatively large datasets. I’ve tested it on ~83k nodes, which is far from a million, but even that took almost forever (2.5 - 5 seconds). I had to suspend the pagination feature in my tech-demo because of that.

In SQL I’d usually use an indexed cursor-based approach (after: <id>; limit: 100), where the cursor must be unique. Offset usually results in a full table scan, which is obviously slow, and this seems to be what Dgraph is doing. With cursor-based pagination the database can quickly find the row/node to start reading from and then read the next 100 rows, since the cursor (id) is ordered.
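
The difference between the two approaches can be sketched in plain Python over a sorted list of ids. This is only an illustration of the general idea, not how Dgraph or any SQL engine is implemented; the function names are hypothetical, and the sorted list stands in for an ordered index.

```python
from bisect import bisect_right


def page_by_offset(sorted_ids, offset, limit):
    """Offset paging: walk past `offset` entries before collecting `limit` entries."""
    page = []
    for position, post_id in enumerate(sorted_ids):
        if position < offset:
            continue  # every skipped entry still costs a step
        if len(page) == limit:
            break
        page.append(post_id)
    return page


def page_by_cursor(sorted_ids, after_id, limit):
    """Cursor paging: binary-search the cursor, then read `limit` entries from there."""
    start = bisect_right(sorted_ids, after_id)
    return sorted_ids[start : start + limit]


# Six zero-padded ids, shaped like the Post.id values in the report.
ids = [f"{n:032d}" for n in range(1, 7)]

# Both calls return the same page, but the cursor version does not
# have to step over the entries that precede the cursor.
print(page_by_offset(ids, 3, 3))
print(page_by_cursor(ids, ids[2], 3))
```

The cursor version relies on the ids being unique and sorted, which is the same requirement stated above for the SQL approach.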

I think it could be possible to simulate such an index with an ordered/indexed edge but that’s probably far from ideal.

sleto-it commented :

Hi, I am implementing some data validation tests for the Labels of our GitHub tickets. This is part of a wider metrics project

The check that performs the validation of GitHub tickets by “kind” currently fails as this ticket has both the “kind\bug” and “kind\feature” labels

Would it be possible to remove one of the two labels, so that the ticket is categorized either as “bug” or “feature”? If this ticket indeed includes both a bug and a feature, let’s create an additional spin-off ticket, so we keep one as a feature and one as a bug.

Many thanks,