Method 1
Suppose most of the time we only need to get the count of the whole UID list, i.e., afterUID=0.
In this method, we will maintain one additional int to store the size of the posting list. We will update this var as we insert and delete elements. We just have to be a bit more careful. When asked for the count of the whole UID list, we return this var. Otherwise, we have to iterate as before.
Method 2
Keep one list for deleted elements and one list for inserted elements. They do not intersect with each other. When you delete an element, you will check the list for inserted elements and remove an element if it is found. Same goes for inserting an element: you may need to remove an element from the list for deleted elements.
When asked for the size of the array with some afterUID, we will do binary search on all three lists: the two lists above and the posting list. Return A+B-C where A, B, C are respectively number of elements after UID in posting list, list for inserted elements, list for deleted elements.
The downside is that we need to do more work when removing elements from the lists of inserted and deleted elements.
Method 3
Extend Method 1. Instead of maintaining only one var for the total size, we will maintain sizes of ranges. We will do this in a multiscale way so that each insert/delete will update the var only in O(log n) time.
Method 4
Like Method 3, but do it in a more straightforward way, not in a multiscale way. This will somewhat resembles the very original method and feel as if you are maintaining some mindex
. Basically, when you insert or delete, you might need to update up to O(n) counts.
My preference
I am slightly leaning towards Method 2, then Method 1. I love to hear your opinions @core-devs