By windowing, I prefer to the params
In the past, windowing is done in the worker. Worker will window each posting list and return it. This is very efficient.
However, with filters, we can no longer do that it seems. The main reason is that windowing has to take place after filter. But filtering can involve multiple predicates, and has to be done in a centralized way.
Say you are at the friend predicate. You have a set of source UIDs,
S, you want the friends of these UIDs, with some filters and windowing.
For each source UID, you first get its friends. Then we union across all friends to get
sorted which is a sorted list of unique UIDs that are friends of some element in
Next, we apply filter to
sorted and get a new
sorted list. Now, we want to apply windowing per element in
S. What is the best way to do it? Per element in
S, we need to intersect its friend UIDs with the new
sorted, and then apply the windowing. And then we need to merge these sorted lists (after windowing) yet again before we fan out with a
processGraph call for child predicates.
(If there is no filtering and/or no windowing, some of these must be skipped.)
All the above work seems necessary. The question is: Now that we have done all these filtering and windowing, should we store the results? The answer seems yes to me.
Here is an alternative. Do this in post-traversal or pre-traversal. We just store
sorted in the subgraph. Recall this is the list of filtered and windowed destination UIDs. In post-traversal / pre-traversal, before we output these results per element in
S, we will intersect
sorted to apply windowing and filtering.
This doesn’t sound so great because (1) building the intermediate results can actually free up memory if the windowing / filtering discards a lot of items and (2) the work has to be done anyway, why not store it and discard the original
task.Result and (3) currently pre-traversal and post-traversal both exist and this logic has to be duplicated in both.
What do you all think? @core-devs Thanks.