Pagination with `offset` should scale like `after`

Moved from GitHub dgraph/5807

Posted by EnricoMi:

Experience Report

What you wanted to do

I want to retrieve my result set via pagination because 1) I expect large results and 2) I want fixed memory requirements while processing the stream of results.

What you actually did

I have a predicate with 40M triples. I want to retrieve all uids that have this predicate, together with its value:

{
  result (func: has(<http://www.w3.org/2000/01/rdf-schema#label>), first: 1000, offset: 0) {
    uid
    <http://www.w3.org/2000/01/rdf-schema#label>
  }
}
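For comparison, the same page can be requested with `after` instead of `offset`, passing the last uid of the previous page. The Go sketch below only renders the two query strings; the helper names `pageQuery`, `offsetQuery` and `afterQuery` are hypothetical, and the `after: <uid>` form assumes DQL's `after` pagination parameter with the default ordering by uid.

```go
package main

import "fmt"

// label is the predicate from the query above.
const label = "<http://www.w3.org/2000/01/rdf-schema#label>"

// pageQuery (hypothetical helper) renders one page request;
// cursor is either "offset: N" or "after: 0x...".
func pageQuery(first int, cursor string) string {
	return fmt.Sprintf("{\n  result (func: has(%s), first: %d, %s) {\n    uid\n    %s\n  }\n}",
		label, first, cursor, label)
}

// offsetQuery asks the server to skip `offset` results before returning `first`.
func offsetQuery(first, offset int) string {
	return pageQuery(first, fmt.Sprintf("offset: %d", offset))
}

// afterQuery asks for `first` results whose uid comes after lastUID,
// i.e. the last uid seen on the previous page (assumes uid ordering).
func afterQuery(first int, lastUID string) string {
	return pageQuery(first, "after: "+lastUID)
}

func main() {
	fmt.Println(offsetQuery(1000, 0))
	fmt.Println(afterQuery(1000, "0x2a"))
}
```

The only difference between the two requests is the cursor: `offset` names a position in the result, `after` names a uid.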

Why that wasn’t great, with examples

The time each query takes grows roughly linearly with the offset:

| offset | total latency (`total_ns`, shown in s) |
|---:|---:|
| 1,000 | 0.001 s |
| 10,000 | 0.028 s |
| 100,000 | 0.313 s |
| 1,000,000 | 3.057 s |
| 10,000,000 | 35.067 s |

So pages near the end of the result set take much longer to retrieve than pages near the beginning.
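As a rough check, dividing each latency in the table by its offset gives a nearly constant cost per skipped result. This assumes the time is dominated by skipping the first `offset` results, which the numbers suggest but do not prove:

$$
\frac{0.313\ \mathrm{s}}{10^{5}} \approx 3.1\ \mu\mathrm{s},\qquad
\frac{3.057\ \mathrm{s}}{10^{6}} \approx 3.1\ \mu\mathrm{s},\qquad
\frac{35.067\ \mathrm{s}}{10^{7}} \approx 3.5\ \mu\mathrm{s}
\quad\Rightarrow\quad t(\mathit{offset}) \approx c \cdot \mathit{offset},\ \ c \approx 3.5\ \mu\mathrm{s}
$$

If that trend held across all 40M triples with pages of 1,000 (an extrapolation, not a measurement), the last page alone would take about $3.5\ \mu\mathrm{s} \times 4 \times 10^{7} \approx 140\ \mathrm{s}$, and reading every page would cost

$$
\sum_{k=0}^{N-1} c \cdot 1000\,k = c \cdot 1000 \cdot \frac{N(N-1)}{2} \approx 3.5\ \mu\mathrm{s} \cdot 1000 \cdot 8.0 \times 10^{8} \approx 2.8 \times 10^{6}\ \mathrm{s},\qquad N = 40{,}000,
$$

roughly a month, because the total grows quadratically with the number of pages.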

With `after`, in contrast, the query time stays roughly constant no matter how far into the result set the page starts:

| offset of the uid passed to `after` | total latency (`total_ns`, shown in s) |
|---:|---:|
| 1,000 | 0.016 s |
| 10,000 | 0.019 s |
| 100,000 | 0.008 s |
| 1,000,000 | 0.007 s |
| 10,000,000 | 0.019 s |
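One plausible explanation for the two classes of scaling can be illustrated with a toy model. It assumes, without confirmation from Dgraph's internals, that `offset` forces the server to step over results one by one, while `after` can seek directly in a uid list that is already sorted. The `uidIter` type below is hypothetical and only counts work; it is not how Dgraph stores posting lists.

```go
package main

import (
	"fmt"
	"sort"
)

// uidIter is a toy iterator over sorted uids. steps counts the work done,
// standing in for per-result cost on the server.
type uidIter struct {
	uids  []uint64
	pos   int
	steps int
}

// skip advances one item at a time, as an offset would if the
// iterator cannot jump ahead by count.
func (it *uidIter) skip(n int) {
	for i := 0; i < n && it.pos < len(it.uids); i++ {
		it.pos++
		it.steps++
	}
}

// seekAfter jumps to the first uid greater than after with a binary
// search, which works because the uids are sorted; each comparison is a step.
func (it *uidIter) seekAfter(after uint64) {
	it.pos = sort.Search(len(it.uids), func(i int) bool {
		it.steps++
		return it.uids[i] > after
	})
}

// take returns up to n uids starting at the current position.
func (it *uidIter) take(n int) []uint64 {
	end := it.pos + n
	if end > len(it.uids) {
		end = len(it.uids)
	}
	return it.uids[it.pos:end]
}

func main() {
	uids := make([]uint64, 1_000_000)
	for i := range uids {
		uids[i] = uint64(2*i + 1) // sorted, with gaps like real uids
	}

	byOffset := &uidIter{uids: uids}
	byOffset.skip(900_000)
	fmt.Println(byOffset.take(3), byOffset.steps) // 900000 steps

	byAfter := &uidIter{uids: uids}
	byAfter.seekAfter(uids[899_999])            // last uid of the previous page
	fmt.Println(byAfter.take(3), byAfter.steps) // same page, about 20 steps
}
```

Both cursors return the same page, but the work for `offset` grows with the position while the work for `after` grows only with the logarithm of the list size, matching the shape of the two tables.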

Pagination with `offset` should be as scalable as pagination with `after`. If that is not possible, the two different scaling behaviours should be clearly documented at https://dgraph.io/docs/query-language/#pagination.
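Until then, a client that needs fixed memory can page with `after`, using the last uid of each page as the cursor for the next request. In this sketch `fetchPage` is a hypothetical stand-in for running the `after` query with a Dgraph client; here it reads from an in-memory slice sorted by uid, and the starting cursor 0 assumes real uids start at `0x1`.

```go
package main

import (
	"fmt"
	"sort"
)

// row is one result: a uid and its rdfs:label value.
type row struct {
	UID   uint64
	Label string
}

// fetchPage is a hypothetical stand-in for the `after` query: it returns up
// to `first` rows whose uid is greater than `after`.
func fetchPage(data []row, first int, after uint64) []row {
	start := sort.Search(len(data), func(i int) bool { return data[i].UID > after })
	end := start + first
	if end > len(data) {
		end = len(data)
	}
	return data[start:end]
}

// streamAll visits every row while holding at most one page at a time,
// and reports how many non-empty pages it fetched.
func streamAll(data []row, first int, visit func(row)) (pages int) {
	var after uint64 // 0 means "from the start"
	for {
		page := fetchPage(data, first, after)
		if len(page) == 0 {
			return pages
		}
		pages++
		for _, r := range page {
			visit(r)
		}
		if len(page) < first {
			return pages // a short page is the last one
		}
		after = page[len(page)-1].UID // cursor for the next page
	}
}

func main() {
	data := make([]row, 2500)
	for i := range data {
		data[i] = row{UID: uint64(i + 1), Label: fmt.Sprintf("label %d", i+1)}
	}
	count := 0
	pages := streamAll(data, 1000, func(row) { count++ })
	fmt.Println(pages, count) // 3 2500
}
```

When the result size is an exact multiple of `first`, the loop needs one extra, empty request to detect the end; that is the price of not knowing the total count up front.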
