Moved from GitHub dgraph/5807
Posted by EnricoMi:
Experience Report
What you wanted to do
I want to retrieve my result set page by page because 1) I expect large results and 2) I want fixed memory requirements while processing the stream of results.
What you actually did
I have a predicate with 40 million triples. I want to retrieve every uid that has this predicate, together with its value:
{
  result (func: has(<http://www.w3.org/2000/01/rdf-schema#label>), first: 1000, offset: 0) {
    uid
    <http://www.w3.org/2000/01/rdf-schema#label>
  }
}
Why that wasn’t great, with examples
The time each query takes scales with the `offset`:

offset | total_ns (shown in seconds) |
---|---|
1,000 | 0.001s |
10,000 | 0.028s |
100,000 | 0.313s |
1,000,000 | 3.057s |
10,000,000 | 35.067s |
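So pages near the end of the result set take much longer than pages near the beginning: from an offset of 10,000 onward, query time grows roughly tenfold for each tenfold increase in offset.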
Looking at how `after` scales, we see constant query time:

offset of uid in `after` | total_ns (shown in seconds) |
---|---|
1,000 | 0.016s |
10,000 | 0.019s |
100,000 | 0.008s |
1,000,000 | 0.007s |
10,000,000 | 0.019s |
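
For anyone who wants to page with `after` in the meantime, here is a minimal sketch of the loop in Python. It is not taken from the Dgraph docs or any client library: `run_query` is a hypothetical callable that you supply (for example a thin wrapper around your Dgraph client) which takes a query string and returns the decoded JSON as a dict keyed by the block name `result`. It also assumes results come back in the default uid order, so the last uid of one page can be used as `after` for the next.

```python
from typing import Callable, Dict, Iterator, List, Optional

LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def build_query(predicate: str, first: int, after: Optional[str]) -> str:
    """Build the query for one page.

    `after` is the last uid of the previous page, or None for the first page,
    in which case the `after` argument is left out of the query entirely.
    """
    after_clause = f", after: {after}" if after is not None else ""
    return (
        "{\n"
        f"  result (func: has({predicate}), first: {first}{after_clause}) {{\n"
        "    uid\n"
        f"    {predicate}\n"
        "  }\n"
        "}"
    )


def paginate(
    run_query: Callable[[str], Dict],  # hypothetical: query string -> decoded JSON dict
    predicate: str,
    first: int = 1000,
) -> Iterator[List[Dict]]:
    """Yield pages of at most `first` nodes, holding only one page in memory."""
    after: Optional[str] = None
    while True:
        page = run_query(build_query(predicate, first, after)).get("result", [])
        if not page:
            return
        yield page
        if len(page) < first:
            # A short page means there is nothing left to fetch.
            return
        # Assumption: pages are ordered by uid, so the last uid is the cursor.
        after = page[-1]["uid"]
```

Usage would look like `for page in paginate(my_run_query, LABEL): ...`, where `my_run_query` is whatever function executes a query against your cluster. Each iteration issues one query with a fixed `first`, so memory stays bounded by the page size.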
The `offset` pagination should be as scalable as `after`. If this is not possible, the two different classes of scalability should be clearly documented at https://dgraph.io/docs/query-language/#pagination.