We use Badger as an embedded DB on desktop and mobile devices. Before v2.0, GC did not work perfectly (because of random sampling), but most of the time the results were acceptable. Since Badger 2.0, however, GC mostly does not work for our use cases, so we started to dig in.
I've tried to reproduce the problem using the existing BenchmarkDbGrowth: Flatten + GC do nothing, and at the end you are left with 10GB of vlogs. You can also decrease the valueSize, by also decreasing the
After some debugging, here is what I've found:

- badger calculates the discard stats only when a compaction completes successfully. Compaction is not triggered while the number of SST tables is <= the default NumLevelZeroTables
- an SST table is flushed from the memtable only when either the skiplist or the memtable is full (>= opts.MemTableSize)
- with the default opts.MemTableSize of 64MB and value sizes > ValueThreshold (meaning all values go into vlog files), it may take A LOT of key updates until the skiplist/memtable grows over the default opts.MemTableSize (approx 600k updates of 8-byte keys)
- with the default opts.MemTableSize=64MB, it takes 3.3mln updates of 8-byte keys to trigger a compaction
- before, there was a fallback to random vlog file picking when the discard stats were empty, but it was removed in PR #1555
I ran BenchmarkDbGrowth and expected GC to work and clean up the vlogs. In fact, GC has no effect on the vlogs.
- set opts.ValueThreshold to a greater value to collocate all values in the LSM tree
  - this will make LSM compactions more expensive, so we lose badgerDB's advantages
- set opts.MemTableSize to a very small value, but:
  - you may still want a reasonably large memtable
  - it is not clear to users that value GC may not work because of the memtable size
- revert the fallback to random vlog file picking
- create a mechanism to calculate the discard stats without doing an actual compaction
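The first two options above are pure configuration changes. A sketch of what they'd look like (assuming badger v3's option setters; the memtable setter may be named differently in v2):

```go
// Workaround sketch, not a fix: either keep values in the LSM tree,
// or shrink the memtable so flushes and compactions (and hence the
// discard stats) happen much sooner. Both trade away the vlog
// advantages, as noted above.
opts := badger.DefaultOptions("/path/to/db").
	WithValueThreshold(1 << 20). // values up to 1MB stay in the LSM tree
	WithMemTableSize(8 << 20)    // 8MB memtable: flush far more often
```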
The last option seems the most convenient, so I gave it a try: Comparing dgraph-io:master...anytypeio:gc · dgraph-io/badger · GitHub

Let me know what you think.