Offset-based pagination is slow

diggy · May 28, 2019, 3:04pm

Moved from GitHub dgraph/3473

Posted by romshark:

Since after-based pagination doesn’t work (see #3472, #2744, and the forum), I had to fall back to offset-based pagination. I expected that to be slow, since offsets usually can’t make use of indexes, but I’m not sure it’s generally expected to be this slow.

What version of Dgraph are you using?
- v1.0.14
Have you tried reproducing the issue with the latest release?
- I did.
What is the hardware spec (RAM, OS)?
- OS: Windows 10 (Linux in Docker)
- RAM: 64 GB

Steps to reproduce the issue (command/config used to run Dgraph).

Set up a testing environment with docker-compose using the latest Dgraph images.
Set up the schema:

Post.id: string @index(hash) .
Post.creation: dateTime .
Post.title: string .
Post.contents: string .

Fill the database with lots of data using this template:

{
  set {
    _:post1 <Post.id> "00000000000000000000000000000006" .
    _:post1 <Post.title> "post 1" .
    _:post1 <Post.contents> "post 1 contents" .
    _:post1 <Post.creation> "2019-05-28T10:00:00+00:00" .
  }
}

(I used a dataset of 83,719 nodes.)

Read the last 10 items of, say, 100k:

{
  all(
    func: has(Post.id),
    orderasc: Post.id,
    first: 10,
    offset: 99990
  ) {
    uid
    Post.id
    Post.title
    Post.contents
    Post.creation
  }
}

Expected behaviour and actual result.
- expected: as I already said, I expected offset to be slow, but since it was the only option left for pagination, I expected the query to at least be optimized using the hash index; otherwise, it’s practically impossible to make pagination fast.
- actual: it takes roughly 2.5 to 5 seconds!

diggy · June 10, 2019, 5:36pm

danielmai commented:

I don’t think offset per se is what’s slow here. Pagination (first, offset, after) is fairly cheap. According to the trace of the query you shared, run against ~100k Posts as in your example, most of the time is spent sorting. Here’s a trace from Jaeger showing that sorting took 1.8 seconds.

Removing the sort criterion from the query (orderasc: Post.id) speeds it up significantly, from over 2s down to 300ms, most of which is spent in has(), since it doesn’t use an index and iterates over the database. There might be some optimizations we can do here by combining sorting and pagination.
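To illustrate what combining sorting and pagination could mean: a page at `offset`/`first` only needs the `offset+first` smallest sort keys, not a fully sorted result. The Go sketch below keeps those keys in a bounded max-heap (a standard partial-sort / top-k technique) and sorts only them at the end. This is an assumption about one possible optimization, not how Dgraph actually implements `orderasc`, and `pageSorted` is a hypothetical name.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// keyHeap is a max-heap of sort keys: the root is the largest key kept so far.
type keyHeap []string

func (h keyHeap) Len() int            { return len(h) }
func (h keyHeap) Less(i, j int) bool  { return h[i] > h[j] } // ">" makes it a max-heap
func (h keyHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *keyHeap) Push(x interface{}) { *h = append(*h, x.(string)) }
func (h *keyHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// pageSorted returns the items at positions [offset, offset+first) of the
// ascending order of keys without sorting all of keys: it only retains the
// offset+first smallest keys, then sorts that smaller set.
func pageSorted(keys []string, offset, first int) []string {
	if offset < 0 || first <= 0 {
		return nil
	}
	k := offset + first
	h := &keyHeap{}
	for _, key := range keys {
		if h.Len() < k {
			heap.Push(h, key)
		} else if key < (*h)[0] {
			// Evict the largest retained key in favor of a smaller one.
			(*h)[0] = key
			heap.Fix(h, 0)
		}
	}
	kept := []string(*h)
	sort.Strings(kept) // sorts at most offset+first keys, not len(keys)
	if offset >= len(kept) {
		return nil
	}
	end := k
	if end > len(kept) {
		end = len(kept)
	}
	return kept[offset:end]
}

func main() {
	keys := []string{"e", "a", "d", "b", "c"}
	fmt.Println(pageSorted(keys, 1, 2)) // [b c]
}
```

The cost is roughly O(n log k) with k = offset+first, so the saving shrinks for deep pages: near the end of the list, k approaches the total count and this degenerates into a full sort.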

diggy · June 10, 2019, 11:37pm

romshark commented:

@danielmai I understand, but how do we do pagination over a sorted dataset then?

What if I wanted to serve a paginated list of 100k+ posts sorted by Post.creation and Post.id (since Post.creation isn’t unique)? AFAIK, there’s no way to make your own index using a sorted edge like postListByCreationTime: uid @index(hash) @sort(Post.creation, Post.id), which would allow for fast offset-based pagination.

diggy · January 16, 2020, 2:42pm

dmitryyankowski commented:

Any updates on this, @danielmai @campoy?

diggy · April 1, 2020, 7:34pm

dmitryyankowski commented:

Bump

diggy · April 22, 2020, 7:42pm

igormiletic commented:

Any work planned on this soon? We are experiencing this as well, and we’ve noticed that as the offset gets bigger, the query becomes slower and slower.

We have about 30,000,000 nodes, and we have a case where we need to export some data to AWS S3 (to make it available for Athena queries). It is almost impossible to extract all nodes using pagination, because the query becomes slower as the offset increases.
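A rough cost model shows why a full export by offset degrades so badly. Assuming each query walks past all `offset` skipped items before returning `first` results (a simplification of the behavior reported in this thread), exporting n items in pages of size f touches about f·P(P+1)/2 items for P = n/f pages, versus n for a cursor that resumes where the previous page ended. The Go sketch below just evaluates both; `scannedWithOffset` and `scannedWithCursor` are hypothetical helper names, and the page size of 1,000 is an arbitrary example.

```go
package main

import "fmt"

// scannedWithOffset models a full export where the page at offset o is fetched
// by walking past all o skipped items and then reading up to pageSize more.
func scannedWithOffset(n, pageSize int64) int64 {
	var total int64
	for offset := int64(0); offset < n; offset += pageSize {
		end := offset + pageSize
		if end > n {
			end = n
		}
		total += end // skipped items plus returned items for this page
	}
	return total
}

// scannedWithCursor models a cursor/keyset export: each page starts right
// after the previous one, so every item is walked once regardless of pageSize.
func scannedWithCursor(n, pageSize int64) int64 {
	return n
}

func main() {
	n, size := int64(30_000_000), int64(1_000)
	fmt.Println(scannedWithOffset(n, size), scannedWithCursor(n, size)) // 450015000000 30000000
}
```

Under this model, exporting 30,000,000 nodes in pages of 1,000 touches about 4.5 × 10^11 items with offsets, roughly 15,000 times more than with a cursor.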

diggy · July 3, 2020, 8:04am

EnricoMi commented:

I have raised a similar feature request #5807 and bug report #5808 regarding offset scaling, but without sorting. Please see there for some numbers.

wildan2711 · September 4, 2020, 1:24pm

Bump. I think this is really important for displaying list data.
