Pagination with `offset` should scale like `after`

diggy · July 3, 2020, 7:38am

Moved from GitHub dgraph/5807

Posted by EnricoMi:

Experience Report

What you wanted to do

I want to retrieve my result set via pagination because 1) I expect large results and 2) I want fixed memory requirements while processing the stream of results.

What you actually did

I have a predicate with 40 million triples. I want to retrieve all uids that have this predicate, together with its value:

{
  result (func: has(<http://www.w3.org/2000/01/rdf-schema#label>), first: 1000, offset: 0) {
    uid
    <http://www.w3.org/2000/01/rdf-schema#label>
  }
}

Why that wasn’t great, with examples

The time each query takes scales roughly linearly with the offset (a 10x larger offset takes about 10x longer):

| `offset` | `total_ns` (in seconds) |
|---|---|
| 1,000 | 0.001s |
| 10,000 | 0.028s |
| 100,000 | 0.313s |
| 1,000,000 | 3.057s |
| 10,000,000 | 35.067s |

So pages at the end of the result set take much longer than at the beginning of it.

Looking at how `after` scales, we see constant query time:

| `offset` of uid in `after` | `total_ns` (in seconds) |
|---|---|
| 1,000 | 0.016s |
| 10,000 | 0.019s |
| 100,000 | 0.008s |
| 1,000,000 | 0.007s |
| 10,000,000 | 0.019s |
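
For readers who want the constant-time behaviour today, here is a minimal Python sketch of paging with `after` instead of `offset`. It assumes results come back in the default uid order, so the last uid of one page can serve as the cursor for the next. `build_page_query`, `iterate_pages`, and the `run_query` callable are hypothetical names introduced for illustration; `run_query` stands in for whatever client call sends a DQL string to Dgraph and returns the rows under `result`.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Predicate from the query in this post.
LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def build_page_query(after_uid: Optional[str], page_size: int = 1000) -> str:
    """Build the DQL query for one page.

    The first page carries no `after` argument; later pages pass the last
    uid seen so far as the cursor.
    """
    after_clause = f", after: {after_uid}" if after_uid is not None else ""
    return (
        "{\n"
        f"  result (func: has({LABEL}), first: {page_size}{after_clause}) {{\n"
        "    uid\n"
        f"    {LABEL}\n"
        "  }\n"
        "}"
    )


def iterate_pages(
    run_query: Callable[[str], List[Dict]],  # hypothetical: DQL string -> rows
    page_size: int = 1000,
) -> Iterator[List[Dict]]:
    """Yield pages of rows, holding only one page in memory at a time."""
    after_uid: Optional[str] = None
    while True:
        rows = run_query(build_page_query(after_uid, page_size))
        if not rows:
            return
        yield rows
        if len(rows) < page_size:
            return  # a short page means the result set is exhausted
        # Assumes rows are ordered by uid, so the last row is the cursor.
        after_uid = rows[-1]["uid"]
```

Each request then costs about the same regardless of how deep into the result set it is, at the price of having to walk the pages in order rather than jumping to an arbitrary offset.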

Pagination with `offset` should be as scalable as pagination with `after`. If this is not possible, the two different classes of scalability should be clearly documented at https://dgraph.io/docs/query-language/#pagination.
