Help needed tuning Badger


We’re interested in replacing our current KV store (goleveldb, a LevelDB key/value database in Go, from github.com/syndtr/goleveldb) with Badger. The use case is persisting a local cache of key-values sourced from Kafka. Each Kafka partition is stored in a separate DB, so we have 20 Badger databases per service. Message rates vary a lot: if we start the service without the local cache and have to recover it from Kafka, we are currently able to load around 6,500 msg/s into a single DB (out of the 20), while during normal operation it is around 100 msg/s.

During the recovery we call neither PurgeOlderVersions() nor RunValueLogGC(0.5). After the recovery is finished, we start calling these after a specified number of transactions. However, as soon as we start calling the clean-up functions, our RAM usage shoots up from 500–1000 MB to 29 GB before the program is OOM-killed.

Here are some specific open questions we have:

  1. Why does the RAM shoot up so high once we start doing clean-up?
  2. Should the clean-up functions be called after every transaction? RunValueLogGC() cleans up at most one log file per call, so I am not sure what the correct interval to call it is.
  3. What is the formula to precisely calculate Badger’s RAM usage?
  4. How exactly do the different FileLoadingModes work? As I’ve understood it, Badger keeps the LSM tree containing the keys in memory at all times. Do the FileLoadingModes then only affect value log segments?
  5. How can we speed things up beyond the default settings? Currently goleveldb is able to recover at about 20 MB/s while Badger hovers around 15 MB/s, and goleveldb does this while consuming at most 700 MB of RAM. Our keys are 22 bytes and our values are generally around 200 bytes.

Is there documentation on how the different configuration options work and affect performance, mainly execution speed and memory usage?

This is really strange – not something we’ve seen. If you can take a heap profile before the process dies, it would give us useful insight into why this is happening. Also, if there’s a way for us to replicate this at our end, that’d be awesome.

Not sure. Need heap profile, or reproducibility.

Of course, there’s no need to call them after every txn. You can choose when to call them: ideally during a period of low activity, or otherwise every time you write another gigabyte of data. But note that value log GC will be ineffective until Purge is called.
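As a sketch of that cadence, one simple approach is to count committed transactions and run both steps, purge first, once a threshold is crossed. The purge and GC callbacks below are stand-ins for db.PurgeOlderVersions() and db.RunValueLogGC(0.5) (assumed per the Badger v1 API discussed here), so the trigger logic is self-contained:

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoRewrite mimics badger.ErrNoRewrite: value log GC returns it
// when no log file had enough garbage to be worth rewriting.
var ErrNoRewrite = errors.New("no value log file rewritten")

// cleaner triggers a purge+GC cycle every `every` committed txns.
// purge and gc stand in for the real Badger calls.
type cleaner struct {
	count, every int
	purge        func() error
	gc           func() error
}

// Committed should be called after each successful transaction.
func (c *cleaner) Committed() error {
	c.count++
	if c.count < c.every {
		return nil
	}
	c.count = 0
	// Purge old versions first: value log GC is ineffective until
	// stale versions have been removed.
	if err := c.purge(); err != nil {
		return err
	}
	// GC rewrites at most one file per call, so loop until it
	// reports there is nothing left worth rewriting.
	for {
		if err := c.gc(); err != nil {
			if errors.Is(err, ErrNoRewrite) {
				return nil
			}
			return err
		}
	}
}

func main() {
	gcCalls := 0
	c := &cleaner{
		every: 3,
		purge: func() error { return nil },
		gc: func() error {
			gcCalls++
			return ErrNoRewrite // pretend nothing needs rewriting
		},
	}
	for i := 0; i < 6; i++ {
		if err := c.Committed(); err != nil {
			fmt.Println("cleanup error:", err)
		}
	}
	fmt.Println("gc cycles:", gcCalls) // prints "gc cycles: 2"
}
```

The threshold and loop shape are illustrative; in a real service you would wire the two callbacks to the DB handle and pick `every` from your write rate.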

It’s a function of the LSM tree size, the number of memtables, the number of compactions in progress, and the memory-mapped portion of the value log.
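To make that concrete, here is a back-of-envelope estimate. The structure follows the four terms above; the constants (64 MB max table size, 5 memtables, a 2x working-set factor per compaction) are assumptions modeled on Badger v1 defaults and should be swapped for your actual Options values, not taken as authoritative:

```go
package main

import "fmt"

// estimateRAM gives a rough, illustrative upper bound on Badger's
// steady-state memory. All inputs are assumptions to be replaced
// with your own configuration.
func estimateRAM(maxTableSize, numMemtables, lsmTreeSize,
	compactionsInFlight, mmappedValueLog int64) int64 {
	memtables := maxTableSize * numMemtables              // in-memory write buffers
	compactions := compactionsInFlight * maxTableSize * 2 // tables read + written per compaction (assumed factor)
	return memtables + lsmTreeSize + compactions + mmappedValueLog
}

func main() {
	const mb = int64(1) << 20
	est := estimateRAM(64*mb, 5, 512*mb, 3, 1024*mb)
	fmt.Printf("~%d MB\n", est/mb) // prints "~2240 MB"
}
```

Note the mmapped value log term counts mapped pages, which the OS can reclaim under pressure, so resident usage is usually lower than this sum.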

The file loading modes correspond to the LSM tree. By default it’s kept in RAM, and we recommend it that way, but you can mmap it instead, or read it from disk every time. The value log is always mmapped.
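For reference, a configuration sketch against the Badger v1 options API (the field and constant names are assumptions based on that release; check them against the version you’re running):

```go
package main

import (
	"log"

	"github.com/dgraph-io/badger"
	"github.com/dgraph-io/badger/options"
)

func main() {
	opts := badger.DefaultOptions
	opts.Dir = "/tmp/badger"
	opts.ValueDir = "/tmp/badger"
	// LSM tree tables: keep in RAM (default, recommended), mmap them,
	// or read from disk on every access to trade speed for memory.
	opts.TableLoadingMode = options.LoadToRAM // or options.MemoryMap / options.FileIO
	// The value log itself is always memory-mapped in this version.
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```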

Badger is a lot faster in terms of writes. Do bigger transactions, and run them concurrently with many goroutines. If you’re doing them in a single goroutine, then do bigger txns via callback functions.

If it’s not in the README, then we could add a paragraph about it. Please feel free to file a GitHub issue regarding the documentation.

Added a paragraph about tweaking memory usage. Hope that clarifies!

