I have a specific use case where I need to support a very large number of fixed-size keys (target: 30 billion keys), all 16 bytes in size and with an empty value (the database is used mainly to check for duplicate keys).
To help with scaling, I have created 32 shards, each shard being a separate badger DB instance, all operating from the same Go application. The main purpose of this application is to keep a one-month history of keys (using TTL = 1 month) while sustaining the highest possible throughput for the duplicate-check-plus-insert operation. Duplicates are rare, so the common path checks that a key is not present and then inserts it.
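For reference, the shard routing can be sketched as below (`shardFor` is a hypothetical helper I use for illustration, not part of the badger API; it assumes keys are uniformly distributed, e.g. hashes, so a prefix modulo is enough):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const numShards = 32

// shardFor maps a fixed 16-byte key to one of the 32 shard indexes.
// Because the keys are assumed uniformly distributed, taking the first
// 4 bytes modulo the shard count spreads load evenly across shards.
func shardFor(key [16]byte) int {
	return int(binary.BigEndian.Uint32(key[:4]) % numShards)
}

func main() {
	var key [16]byte
	copy(key[:], []byte("0123456789abcdef"))
	fmt.Println(shardFor(key)) // → 19
}
```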
Given that each key is exactly 16 bytes in size, 10 billion keys fully loaded in memory would use 160 GB.
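The raw-footprint arithmetic, extended to the 30-billion target (a back-of-the-envelope calculation only; it ignores any per-key index or bloom-filter overhead badger adds):

```go
package main

import "fmt"

func main() {
	const keySize = 16 // bytes per key, fixed

	// Raw key bytes only, no per-key metadata.
	fmt.Println(10_000_000_000 * keySize / 1_000_000_000) // GB for 10 billion keys → 160
	fmt.Println(30_000_000_000 * keySize / 1_000_000_000) // GB for 30 billion keys → 480
}
```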
While filling it up to 10 billion keys using the batch API (committing batches of 100,000 keys), I noticed the memory usage (RSS) increasing continuously, all the way to 400 GB.
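The loading loop follows this batching pattern (a sketch only; `flush` stands in for the badger `WriteBatch` commit and is a hypothetical callback, not the real API):

```go
package main

import "fmt"

const batchSize = 100_000 // keys per commit

// loadKeys buffers keys and hands full batches to flush, mirroring a
// loader that commits batches of 100,000 keys at a time.
func loadKeys(keys [][16]byte, flush func(batch [][16]byte) error) error {
	batch := make([][16]byte, 0, batchSize)
	for _, k := range keys {
		batch = append(batch, k)
		if len(batch) == batchSize {
			if err := flush(batch); err != nil {
				return err
			}
			batch = batch[:0] // reuse the backing array
		}
	}
	if len(batch) > 0 { // commit the final partial batch
		return flush(batch)
	}
	return nil
}

func main() {
	keys := make([][16]byte, 250_000)
	commits := 0
	loadKeys(keys, func(b [][16]byte) error { commits++; return nil })
	fmt.Println(commits) // → 3 (100k + 100k + 50k)
}
```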
I have a few questions:
- is a very large number of keys an appropriate use case for badger?
- is it possible to prevent badger from loading all keys into memory, and if so, how?
- is it expected to have that much memory usage (over 2x the space used by the keys themselves)?
I have tried to limit memory consumption using various suggestions from other threads but to no avail.
My current settings (note that the options builder returns a new `Options` value, so the result of `WithValueLogMaxEntries` must be assigned back rather than discarded):

```go
opts := badger.DefaultOptions(db.path)
opts = opts.WithValueLogMaxEntries(1000 * 1000 * 100)
```
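For completeness, these are the knobs other threads point at for reducing resident memory; I'm unsure which combination is intended for this scale (option names below are from badger v2 and may differ or be absent in other versions):

```go
opts := badger.DefaultOptions(db.path).
	WithValueLogMaxEntries(1000 * 1000 * 100).
	WithTableLoadingMode(options.FileIO).    // default mmaps tables; FileIO avoids keeping them resident
	WithValueLogLoadingMode(options.FileIO). // same for value-log files
	WithKeepL0InMemory(false).               // don't pin level-0 tables in RAM
	WithNumMemtables(2).                     // default is 5
	WithMaxCacheSize(1 << 30)                // cap the block cache at 1 GB
```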
I have compared badger, boltdb, pebble, and go-leveldb, and so far badger has been the best performer; however, the memory usage is a show-stopper for me.