Filtering and windowing, and should we "replace" task.Result?

By windowing, I refer to the params offset and count.

In the past, windowing was done in the worker. The worker would window each posting list and return it. This was very efficient.

However, with filters, it seems we can no longer do that. The main reason is that windowing has to take place after filtering. But filtering can involve multiple predicates, and has to be done in a centralized way.

Say you are at the friend predicate. You have a set of source UIDs, S, and you want the friends of these UIDs, with some filters and windowing.

For each source UID, you first get its friends. Then we union across all of these friend lists to get sorted, a sorted list of unique UIDs that are friends of some element in S.

Next, we apply the filter to sorted and get a new sorted list. Now, we want to apply windowing per element in S. What is the best way to do it? For each element in S, we need to intersect its friend UIDs with the new sorted list, and then apply the windowing. Then we need to merge these windowed sorted lists yet again before we fan out with a processGraph call for the child predicates.

(If there is no filtering and/or no windowing, some of these steps can be skipped.)

All the above work seems necessary. The question is: now that we have done all this filtering and windowing, should we store the results? The answer seems to be yes to me.

Here is an alternative: do this in post-traversal or pre-traversal. We just store sorted in the subgraph. Recall this is the list of filtered and windowed destination UIDs. In post-traversal / pre-traversal, before we output these results per element in S, we intersect with sorted to apply windowing and filtering.

This doesn’t sound so great, because (1) building the intermediate results can actually free up memory if the windowing / filtering discards a lot of items, (2) the work has to be done anyway, so why not store it and discard the original task.Result, and (3) pre-traversal and post-traversal both currently exist, so this logic would have to be duplicated in both.

What do you all think? @core-devs Thanks.

I’m not quite clear on why this sounds complex.

You’re right that windowing now needs to be done at the coordinator server, i.e. the server which runs the SubGraph processing. But that’s largely the only change I see.

The filter processing would take in sorted UIDs, and return the intersections of each filter with the sorted UIDs passed in task.Query. Once we have all the filter results, we run the necessary intersections or unions and generate a new sorted UID list. This is when windowing is applied, and this goes to the children.

You could choose to store this filtered, sorted UID list, if that helps in filtering out unneeded elements in post-processing (ToJson or ToProto). But window application shouldn’t reach the post-processing stage.

P.S. I think the right term is pagination.

Let me see if I’m missing something basic here.

The pagination has to be done per source UID. It is not sufficient to apply it to the filtered sorted array, because that is the union of all the destination UIDs, right? Instead, we need to apply the filter to the posting list of each source UID, apply the pagination, and then merge again to get a new sorted list?

As discussed on Slack:

The alternative way is that we start storing the query UIDs in task.Result.

[4:54 PM]
Not all queries would have pagination, in fact very few branches will.

[4:55 PM]
So, we can just rewrite task.Result for those branches.

[4:55 PM]
That’d still be an overall cheaper operation than using the slower PBs to FBs.

[4:55 PM]
Then we don’t need to look at task.Query to interpret task.Result.

[4:56 PM]
Most task.Results would stay as they were retrieved from network. Only some need to be changed, and for those, we just rewrite them.

[4:56 PM]
Pagination of, say, 100 results or so essentially makes that a very cheap operation anyway,

[4:56 PM]
when we could have millions of results in one task.Result.
