GC may not work in some cases

We use Badger as an embedded DB on desktop and mobile devices. Before v2.0, GC did not work perfectly (because of random sampling), but most of the time it produced acceptable results. Since Badger 2.0, however, GC mostly does not work for our use cases, so we started to dig in.

What version of Go are you using (go version)?


What operating system are you using?

macOS 12.1

What version of Badger are you using?


Does this issue reproduce with the latest master?


Steps to Reproduce the issue

I’ve tried to reproduce the problem using the existing BenchmarkDbGrowth.

Set numKeys=100, valueSize=1024*1024+1 and maxWrites=100.

Flatten + GC will do nothing, and at the end you will have 10GB of vlogs.

You can also decrease valueSize by decreasing opts.ValueThreshold accordingly, e.g. numKeys=2000, valueSize=1024+1, maxWrites=200 and opts.ValueThreshold=1024.
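As a quick sanity check on the 10GB figure, here is a back-of-the-envelope sketch. It assumes each of the maxWrites iterations rewrites all numKeys values and that, with GC doing nothing, every stale version stays in the value log:

```go
package main

import "fmt"

func main() {
	const numKeys = 100
	const valueSize = 1024*1024 + 1 // just over 1 MiB, so every value lands in the vlog
	const maxWrites = 100

	// Each write pass rewrites all keys; with GC reclaiming nothing, every
	// stale version remains in the value log.
	total := int64(numKeys) * int64(valueSize) * int64(maxWrites)
	fmt.Printf("vlog bytes written: %d (~%.1f GiB)\n", total, float64(total)/(1<<30))
}
```

This lands at roughly 10 GB of vlog data, matching what the benchmark leaves on disk.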

After some debugging, here is what I’ve found:

  • Badger calculates the discard stats only when a compaction successfully completes. Compaction is not triggered while the number of SST tables is <= NumLevelZeroTables (the default NumLevelZeroTables is 5)
  • an SST table is flushed from the memtable only when either the skiplist or the memtable itself is full (>= opts.MemTableSize)
  • with the default opts.MemTableSize of 64MB and value sizes > ValueThreshold (meaning all values go into vlog files), it may take A LOT of key updates until the skiplist or memtable grows beyond the default opts.MemTableSize (approx. 600k updates of 8-byte keys)
  • with the default opts.NumLevelZeroTables=5 and opts.MemTableSize=64MB, it takes about 3.3M updates of 8-byte keys to trigger a compaction
  • previously there was a fallback to random vlog picking when discardStats was empty, but it was removed in PR #1555
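To put rough numbers on the memtable points above, here is a sketch under the assumption that each update of an 8-byte key costs on the order of ~112 bytes in the skiplist (key + value pointer + node overhead). The per-entry cost and the flush multiplier are assumptions chosen to match the observed magnitudes, not exact Badger accounting:

```go
package main

import "fmt"

func main() {
	const memTableSize = 64 << 20 // default opts.MemTableSize: 64 MiB
	// Assumed per-update footprint in the skiplist for an 8-byte key whose
	// value lives in the vlog: key + value pointer + skiplist node overhead.
	const bytesPerUpdate = 112
	const numLevelZeroTables = 5 // default opts.NumLevelZeroTables

	updatesPerFlush := memTableSize / bytesPerUpdate
	fmt.Printf("updates to fill one memtable: ~%d\n", updatesPerFlush)
	// Compaction waits for more than NumLevelZeroTables flushed tables.
	fmt.Printf("updates before an L0 compaction: ~%d\n", updatesPerFlush*(numLevelZeroTables+1))
}
```

That gives roughly 600k updates per flush and around 3.6M updates before compaction, in the same ballpark as the figures observed above.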

What did you do?

I ran BenchmarkDbGrowth.

What did you expect to see?

I expected GC to work and clean up the vlogs.

What did you see instead?

GC has no effect on the vlogs.

Possible solutions

  1. Set opts.ValueThreshold to a greater value so that all values are collocated in the LSM tree.
  • This makes LSM compactions more expensive, so we lose BadgerDB’s advantages.
  2. Set opts.MemTableSize to a very small value, but:
  • you may still want a reasonably large memtable
  • it is not clear to users that value GC may not work because of the memtable size
  3. Revert to the fallback of random vlog picking.
  4. Create a mechanism to calculate discardStats without running an actual compaction.

The last one seems the most convenient, so I gave it a try: Comparing dgraph-io:master...anytypeio:gc · dgraph-io/badger · GitHub

Let me know what you think.

I went ahead and created a PR: GC discardStat fallback mechanism by requilence · Pull Request #1784 · dgraph-io/badger · GitHub

Other Badger users have reported similar problems with GC: Confused on how GC and Compaction are supposed to work, [badger v3] GC doesn't consider value log growth