We’re interested in replacing our current KV store (https://github.com/syndtr/goleveldb) with Badger. The use case is persisting a local cache of key-value pairs that are stored in Kafka. Each Kafka partition is stored in a separate DB, so we have 20 Badger databases per service. Message rates vary a lot. If we start our service without the local cache and have to recover it from Kafka, we can currently load around 6500 msg/s into a single DB (out of the 20), while during normal operation the rate may be around 100 msg/s. During recovery we call neither PurgeOlderVersions() nor RunValueLogGC(0.5). Once recovery is finished, we start calling both after a specified number of transactions. However, as soon as we start calling these clean-up functions, our RAM usage shoots up from 500-1000 MB to 29 GB before the program is OOM-killed.
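For reference, the two-phase clean-up scheduling described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not our actual code: the `cleaner` interface, `cleanupScheduler`, and `fakeDB` are made-up names, the method signatures are assumed to match the ones on Badger's `*DB`, and the interval of 3 transactions and the GC cap are arbitrary. The fake store only exists so the sketch runs on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// cleaner is a hypothetical interface for the two clean-up calls; the method
// signatures are assumed to match the ones on Badger's *DB.
type cleaner interface {
	PurgeOlderVersions() error
	RunValueLogGC(discardRatio float64) error
}

// cleanupScheduler runs clean-up every `every` committed transactions,
// but never while the cache is still being recovered from Kafka.
type cleanupScheduler struct {
	db          cleaner
	every       int  // hypothetical interval, in committed transactions
	maxGCRounds int  // upper bound on RunValueLogGC calls per clean-up
	recovering  bool // true while replaying the Kafka partition
	commits     int
}

// finishRecovery switches the scheduler to normal operation.
func (s *cleanupScheduler) finishRecovery() { s.recovering = false }

// afterCommit must be called once per committed transaction.
func (s *cleanupScheduler) afterCommit() error {
	if s.recovering {
		return nil // no clean-up during recovery
	}
	s.commits++
	if s.commits%s.every != 0 {
		return nil
	}
	if err := s.db.PurgeOlderVersions(); err != nil {
		return err
	}
	// Each RunValueLogGC call cleans up at most one value log file, so call it
	// repeatedly and stop at the first error (assumed here to mean "nothing
	// left to collect") or after maxGCRounds calls.
	for i := 0; i < s.maxGCRounds; i++ {
		if err := s.db.RunValueLogGC(0.5); err != nil {
			break
		}
	}
	return nil
}

// fakeDB stands in for a real database so the sketch runs on its own.
type fakeDB struct {
	purges, gcCalls, filesToCollect int
}

var errNothingToCollect = errors.New("nothing to collect")

func (f *fakeDB) PurgeOlderVersions() error { f.purges++; return nil }

// RunValueLogGC pretends to collect one file per call until none are left.
func (f *fakeDB) RunValueLogGC(discardRatio float64) error {
	f.gcCalls++
	if f.filesToCollect == 0 {
		return errNothingToCollect
	}
	f.filesToCollect--
	return nil
}

func main() {
	db := &fakeDB{filesToCollect: 2}
	s := &cleanupScheduler{db: db, every: 3, maxGCRounds: 10, recovering: true}
	for i := 0; i < 10; i++ { // recovery phase: no clean-up calls
		_ = s.afterCommit()
	}
	s.finishRecovery()
	for i := 0; i < 7; i++ { // normal operation: clean-up after commits 3 and 6
		_ = s.afterCommit()
	}
	fmt.Println(db.purges, db.gcCalls)
}
```

Counting transactions is only one way to pick the interval; a time-based trigger would be an equally plausible alternative, and the question of which one is appropriate is part of what we are asking.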
Here are some specific open questions we have:
- Why does RAM usage shoot up so high once we start the clean-up?
- Should the clean-up functions be called after every transaction? RunValueLogGC() cleans up at most one log file per call, so I am not sure what the correct interval for calling it is.
- What is the formula to precisely calculate Badger’s RAM usage?
- How exactly do the different FileLoadingModes work? As I understand it, Badger keeps the LSM tree containing the keys in memory at all times. Do the FileLoadingModes then only affect the value log segments?
- How can we speed things up beyond the default settings? Currently goleveldb recovers at about 20 MB/s, while Badger hovers around 15 MB/s, and goleveldb does this while consuming at most 700 MB of RAM. Our keys are 22 bytes and our values are generally around 200 bytes.
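To put the MB/s figures in terms of entries, here is a small back-of-envelope conversion. It assumes 1 MB = 1,000,000 bytes and counts only the raw 22-byte key plus 200-byte value per entry, ignoring Kafka framing and any per-entry storage overhead, so real entry rates will differ somewhat. The function name `entriesPerSecond` is just for illustration.

```go
package main

import "fmt"

const (
	keyBytes   = 22  // key size stated above
	valueBytes = 200 // typical value size stated above
)

// entriesPerSecond converts a throughput in MB/s (assuming 1 MB = 1e6 bytes)
// into key-value pairs per second, counting only raw key+value bytes.
func entriesPerSecond(mbPerSec float64) float64 {
	return mbPerSec * 1e6 / float64(keyBytes+valueBytes)
}

func main() {
	fmt.Printf("goleveldb: 20 MB/s is about %.0f entries/s\n", entriesPerSecond(20))
	fmt.Printf("Badger:    15 MB/s is about %.0f entries/s\n", entriesPerSecond(15))
}
```

With 222 raw bytes per entry, the 5 MB/s gap corresponds to roughly 22,500 fewer entries per second under these assumptions.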
Is there documentation on how the different configuration options work and how they affect performance, mainly execution speed and memory usage?