The naming is as follows: BenchmarkToAndFrom/<protocol>-<number of uids>-<number of cores>
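For anyone reading the labels, here is a minimal sketch of how names of that shape usually come out of Go sub-benchmarks. The protocol labels, the sizes, and the `subName` helper are placeholders, not the actual benchmark code. As far as I know, the trailing number is `GOMAXPROCS`, which `go test` appends itself when it is not 1 (it defaults to the number of logical CPUs).

```go
package main

import (
	"fmt"
	"testing"
)

// subName builds the sub-benchmark part of the label: "<protocol>-<number of uids>".
// Hypothetical helper, used only for this illustration.
func subName(protocol string, numUids int) string {
	return fmt.Sprintf("%s-%d", protocol, numUids)
}

// BenchmarkToAndFrom registers one sub-benchmark per (protocol, size) pair.
// The protocol labels and sizes are placeholders; the body is left empty.
func BenchmarkToAndFrom(b *testing.B) {
	for _, protocol := range []string{"Flatbuffers", "PB"} {
		for _, n := range []int{10, 1000} {
			b.Run(subName(protocol, n), func(b *testing.B) {
				for i := 0; i < b.N; i++ {
					// serialize and parse n uids here
				}
			})
		}
	}
}

func main() {
	// Prints the label without the GOMAXPROCS suffix that go test adds.
	fmt.Println("BenchmarkToAndFrom/" + subName("PB", 10))
}
```

In a real `_test.go` file this would be run with `go test -bench=ToAndFrom`.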
Flatbuffers is clearly the winner in every result after the first one, where we only had 10 uids. I don’t like the ugliness that Flatbuffers brings to our code base, but it clearly has a significant impact on our performance and memory allocations, so we should stick to it.
I can try “packed” and “fixed64” some time soon. (uint64 is in varint format, which takes longer to parse. Non-packed means we have a tag ID for each element, I believe, which is less efficient when you have a large array.)
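To make the varint vs fixed64 and packed vs non-packed point concrete, here is a small sketch using only `encoding/binary` from the standard library (it does not call the protobuf library). The helper names are made up for illustration, and the size formulas assume a field number of 15 or lower, so that the tag fits in one byte.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fixed64Len is the constant size of a protobuf fixed64 value.
const fixed64Len = 8

// varintLen returns how many bytes v occupies in base-128 varint form,
// the wire encoding protobuf uses for uint64 fields.
func varintLen(v uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, v)
}

// unpackedFixed64Size is the wire size of n non-packed fixed64 elements:
// every element carries its own 1-byte tag (assumes field number <= 15).
func unpackedFixed64Size(n int) int {
	return n * (1 + fixed64Len)
}

// packedFixed64Size is the wire size of n packed fixed64 elements:
// one 1-byte tag, one varint length prefix, then the raw payload.
func packedFixed64Size(n int) int {
	payload := n * fixed64Len
	return 1 + varintLen(uint64(payload)) + payload
}

func main() {
	for _, v := range []uint64{1, 300, 1 << 32, 1<<64 - 1} {
		fmt.Printf("uid=%d varint=%d bytes fixed64=%d bytes\n", v, varintLen(v), fixed64Len)
	}
	fmt.Println("1000 uids, non-packed:", unpackedFixed64Size(1000), "bytes")
	fmt.Println("1000 uids, packed:", packedFixed64Size(1000), "bytes")
}
```

Small uids are more compact as varints, but each one has to be decoded byte by byte, while a fixed64 is a plain 8-byte read. Whether that shows up in our numbers is what the benchmark would have to tell us.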
Update: I realize in proto3, packed is turned on by default.
Okay, so I did a couple more things. First, I tried running the GC inside the loop, but it just made all of them worse, and relative to each other the results stayed the same… so ignore that for now.
The second thing I did was to also iterate over the resulting structure (nl in code), and make that part of the timed loop. This was interesting, because this made FB much worse.
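Roughly, moving the iteration into the timed loop looks like the sketch below. The encoder and decoder are stand-ins that write raw 8-byte values, not Flatbuffers or protobuf, and every name except `nl` is hypothetical. One plausible reason FB gets worse here is that it reads fields in place on access, so much of its parsing cost only shows up once you actually walk the data.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"testing"
)

// encodeUids is a stand-in serializer (not FB or PB): 8 little-endian bytes per uid.
func encodeUids(uids []uint64) []byte {
	buf := make([]byte, 8*len(uids))
	for i, u := range uids {
		binary.LittleEndian.PutUint64(buf[8*i:], u)
	}
	return buf
}

// decodeUids reverses encodeUids.
func decodeUids(buf []byte) []uint64 {
	out := make([]uint64, len(buf)/8)
	for i := range out {
		out[i] = binary.LittleEndian.Uint64(buf[8*i:])
	}
	return out
}

// sumUids walks every element so the read cost is part of the measurement.
func sumUids(nl []uint64) uint64 {
	var s uint64
	for _, u := range nl {
		s += u
	}
	return s
}

// sink is package-level so the compiler cannot discard the iteration.
var sink uint64

func benchToAndFrom(b *testing.B) {
	uids := make([]uint64, 1000)
	for i := range uids {
		uids[i] = uint64(i)
	}
	b.ReportAllocs()
	b.ResetTimer() // exclude the setup above from the timing
	for i := 0; i < b.N; i++ {
		nl := decodeUids(encodeUids(uids))
		sink += sumUids(nl) // iteration happens inside the timed loop
	}
}

func main() {
	r := testing.Benchmark(benchToAndFrom)
	fmt.Println(r.String(), r.MemString())
}
```

Writing the sum to a package-level variable is just a guard against the compiler optimizing the read away; the real benchmark may handle that differently.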
Given these results, it’s clear that PBs are better than FB. This result makes me very happy. We have been putting up with the complicated API of Flatbuffers for a while, in the hope that it makes things faster for us - but given this benchmark, it’s clear that that’s not the case. The only place where FB helps is memory allocation per operation - in everything else PB-Fixed is a clear winner.
This is super exciting, because this would make our code base a lot simpler, and allow mutability over the structures, which @jchiu has been looking for.
I think we can get rid of FB, yes! You have been complaining about it for a while, and while I’ve been defending FB, I wasn’t happy with what their API does to our code quality. FB is just cumbersome… in fact, I recently switched one new endpoint, Backup, to PB, and it’s just so much simpler.
So yeah, unless one of us can find a flaw with the tests, we’re good to go with the switch from FB to PB!
Benchmarking is a great way to prove a point – numbers win arguments easily.
Yeah, you can do it. In fact, I think we might not even need the algo.UIDList anymore, and we can use the memory management techniques to do filtering in place etc. Let’s do it in a series of PRs, each not more than 300 lines of changes or so… even if they break the build, it’s alright. This is a good chance to clean up our code base and refactor code as well, so I want to review every PR.
Good luck! And happy to discuss anything related to this.