For protos, we do marshal the output. The suffix WithPre means we use a preTraverse routine that is generalized to handle any kind of output. The suffix WithPost means we use a postTraverse routine that is generalized.
For each test case, we see 6 lines. The first three lines are for actors with 10, 100, 1000 entities. The last three lines are for directors with 10, 100, 1000 entities.
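To make the naming concrete, here is a minimal, hedged sketch of the idea behind a "generalized" traversal routine: a single walk writes into an output-node interface, so the JSON and proto variants differ only in the concrete node they pass in. All type and function names below are my own stand-ins, not the actual code.

```go
package main

import "fmt"

// Stand-ins for the JSON and proto output representations.
type jsonNode struct{ fields map[string]string }
type pbNode struct{ props map[string]string }

// outputNode is the hypothetical abstraction that lets one traversal
// serve both output kinds.
type outputNode interface{ AddValue(attr, val string) }

func (n *jsonNode) AddValue(attr, val string) { n.fields[attr] = val }
func (n *pbNode) AddValue(attr, val string)   { n.props[attr] = val }

// preTraverseGeneric stands in for the generalized preTraverse: it walks the
// (here trivial) result once and emits into whichever outputNode it is given.
func preTraverseGeneric(values map[string]string, dst outputNode) {
	for attr, val := range values {
		dst.AddValue(attr, val)
	}
}

func main() {
	vals := map[string]string{"name": "someName", "age": "38"}
	j := &jsonNode{fields: map[string]string{}}
	p := &pbNode{props: map[string]string{}}
	preTraverseGeneric(vals, j) // the JSON "WithPre" path would pass a JSON node
	preTraverseGeneric(vals, p) // the proto "WithPre" path would pass a proto node
	fmt.Println(j.fields, p.props)
}
```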
In theory, postTraverse is a lot faster than preTraverse when the posting lists in a UID matrix have a lot of overlap with one another. I personally prefer postTraverse and would recommend switching to using only the generalized postTraverse. The biggest gain seems to be ToPBWithPost: for one large test case, it is about twice as fast as ToPB or ToPBWithPre, and it also uses less memory. However, for another large test case, it seems to be worse.
The tests are passing, so I think these generalized routines are probably correct.
Based on the benchmarks, ToJSONWithPre and ToPBWithPre seem to be the clear winners in almost all cases except maybe one. So, why is your recommendation different?
In theory, PostTraverse could do a lot less work than PreTraverse when there is a lot of overlap between posting lists. It seems to do worse mainly on the smaller test cases. I also think there is a lot of room for improvement in PostTraverse; for example, we can probably do without those two hash maps from UID to results.
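To make the overlap argument concrete, here is a toy, self-contained illustration (not the actual traversal code): when many posting lists share UIDs, a bottom-up pass can build each unique UID's result once and reuse it, while a top-down pass redoes the work for every occurrence.

```go
// Toy model only: "results" are strings and the UID matrix is a plain
// [][]uint64. The real SubGraph machinery is far more involved.
package main

import "fmt"

// Three parents whose posting lists overlap heavily.
var uidMatrix = [][]uint64{
	{1, 2, 3},
	{2, 3, 4},
	{1, 3, 4},
}

// buildResult stands in for the expensive per-UID subtree construction.
func buildResult(uid uint64) string { return fmt.Sprintf("result-%d", uid) }

// preStyle: top-down, every occurrence of a UID is processed again.
func preStyle() int {
	calls := 0
	for _, plist := range uidMatrix {
		for _, uid := range plist {
			_ = buildResult(uid)
			calls++
		}
	}
	return calls // 9 calls for this matrix
}

// postStyle: bottom-up, each unique UID is processed once and then shared.
func postStyle() int {
	calls := 0
	cache := map[uint64]string{}
	for _, plist := range uidMatrix {
		for _, uid := range plist {
			if _, ok := cache[uid]; !ok {
				cache[uid] = buildResult(uid)
				calls++
			}
		}
	}
	return calls // 4 calls: one per unique UID
}

func main() {
	fmt.Println("pre-style calls:", preStyle(), "post-style calls:", postStyle())
}
```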
In short, my “gut feeling” is that PostTraverse is the better solution in the long term, but it would need more work. Until then, perhaps we keep both around, or switch to using PreTraverse only.
If the main argument is that PostTraverse wins on large data sets, where there’s a lot of overlap, I’d suggest going about it systematically. Divide up the overlap from 0% to 100% in increments of 10% or so, pick a large data set size, draw a graph, and then see which approach wins.
You can similarly do this for data sets of increasing size: 10, 100, 1000 is just a start. Try 10K, 100K, 1M, 10M and see what works there.
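If it helps, something along these lines could drive that sweep. This is only a sketch: genSubGraphWithOverlap, toProtoWithPre and toProtoWithPost are hypothetical names standing in for the real fixture generator and the two conversion paths, and the assumed imports are just "testing" and "fmt".

```go
// Hedged sketch of the suggested comparison: fix a large data set size and
// vary the overlap from 0% to 100% in 10% steps, then plot ns/op and B/op
// for Pre vs Post from the benchmark output.
func BenchmarkPreVsPostOverlap(b *testing.B) {
	const size = 100000 // repeat with 10K, 1M, 10M to cover the size axis too
	for overlap := 0; overlap <= 100; overlap += 10 {
		sg := genSubGraphWithOverlap(size, overlap) // hypothetical generator
		b.Run(fmt.Sprintf("Pre/overlap=%03d", overlap), func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				_ = toProtoWithPre(sg) // hypothetical wrapper over preTraverse
			}
		})
		b.Run(fmt.Sprintf("Post/overlap=%03d", overlap), func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				_ = toProtoWithPost(sg) // hypothetical wrapper over postTraverse
			}
		})
	}
}
```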
Because based on the existing data, Pre works better. But if you have a gut feeling that Post would work better in the above scenarios, then it would make sense to plot it with new data and prove it.
Consider ToPB. The first half is for “Pre” and the second half is for “Post”. The setup is as follows. One node has 5000 neighbors, and each of these neighbors has exactly one neighbor. However, these 5000 neighbors of neighbors can have a lot of overlap. We impose the condition that the number of distinct neighbors of neighbors is N, and take N = 1, 1000, 2000, 3000, 4000, 5000. For N = 5000 there is no overlap at all; for N = 1 there is maximum overlap. Each of these neighbors of neighbors has about 50 attribute values. Here are the results:
It seems strange that all the Pre runs have about the same number of allocations per operation and bytes per operation. In fact, there isn’t much variation in the time taken either, for 5 out of the 6 cases. I think there is some mistake here.
I took a look at the code and I don’t see any obvious reason why it would be wrong; both benchmarks used the same function to generate the test cases. When I think about it, it is not so surprising that the memory usage is similar. PreTraverse does not take any overlap information into account, so it does the same amount of work as we vary the number of unique child SubGraphs while keeping the total number of nodes/SubGraphs constant. I would expect the running time to be similar as well.
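For what it’s worth, the shape of that generator, as I understand the setup above, can be summarized like this. The types and names are simplified stand-ins, not the real test helper.

```go
// Toy version of the fixture: one root with `fanout` children, each child
// pointing at exactly one grandchild drawn from `nDistinct` shared nodes,
// each grandchild carrying `attrsPerNode` attribute values. Varying nDistinct
// (1, 1000, ..., 5000) changes only how much the grandchildren overlap, not
// the total number of edges, which is why a purely top-down pass does the
// same amount of work in every case.
package main

import "fmt"

type node struct {
	uid      uint64
	attrs    map[string]string
	children []*node
}

func buildFixture(fanout, nDistinct, attrsPerNode int) *node {
	// Build the nDistinct unique grandchildren once; they get shared below.
	grandchildren := make([]*node, nDistinct)
	for i := range grandchildren {
		attrs := make(map[string]string, attrsPerNode)
		for a := 0; a < attrsPerNode; a++ {
			attrs[fmt.Sprintf("attr%d", a)] = fmt.Sprintf("val%d", a)
		}
		grandchildren[i] = &node{uid: uint64(1_000_000 + i), attrs: attrs}
	}

	root := &node{uid: 1}
	for i := 0; i < fanout; i++ {
		child := &node{uid: uint64(2 + i)}
		// Maximum overlap at nDistinct == 1, none at nDistinct == fanout.
		child.children = []*node{grandchildren[i%nDistinct]}
		root.children = append(root.children, child)
	}
	return root
}

func main() {
	root := buildFixture(5000, 1000, 50)
	fmt.Println("children:", len(root.children))
}
```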
That said, when I run the tests I see warnings like
I am not sure whether this makes the benchmarks inaccurate. We might have to disable RAFT to make it work. Sometimes I wonder if a master-slave setup with some replication for the masters (like ZooKeeper) could make our life easier while not giving up all that much in practice. Currently, I set my test runs to use only one or two CPUs, as using more sometimes causes the tests to fail…
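For reference, the CPU restriction can be applied directly via go test flags; the package path below is a guess on my part, not the actual layout.

```
go test -run=NONE -bench=ToPB -benchmem -cpu=1,2 ./query
```

Here -run=NONE skips the regular tests so only the benchmarks run, and -cpu=1,2 runs each benchmark once with GOMAXPROCS=1 and once with GOMAXPROCS=2.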