What version of Go are you using (go version)?
$ go version
go version go1.16.15
What operating system are you using?
macOS
What version of Badger are you using?
github.com/dgraph-io/badger/v3 v3.2103.2
Does this issue reproduce with the latest master?
yes.
Steps to Reproduce the issue
Background
I use Badger to store log data. There are two kinds of writes:
- Store new log data with a timestamp; each log value is 10KB ~ 10MB in size.
- Delete log data older than 1 day.
Problem
Badger consumes too much disk space:
$ du -h data
13G data
There are too many old versions of keys in the Badger DB, even though NumVersionsToKeep is 1. I wrote a program to scan the DB:
func ListBadgerDB(db *badger.DB, allVersion bool) error {
	keySize := 0
	valueSize := 0
	keyCount := 0
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.AllVersions = allVersion
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Rewind(); it.Valid(); it.Next() {
			item := it.Item()
			k := item.Key()
			err := item.Value(func(v []byte) error {
				valueSize += len(v)
				return nil
			})
			if err != nil {
				return err
			}
			keySize += len(k)
			keyCount++
		}
		return nil
	})
	if err != nil {
		return err
	}
	log.Info("finish get total badger db size",
		zap.Float64("size(GB)", float64(keySize+valueSize)/GB),
		zap.Bool("all-version", allVersion),
		zap.Int("count", keyCount),
		zap.Float64("key-size(MB)", float64(keySize)/MB),
		zap.Float64("value-size(GB)", float64(valueSize)/GB),
	)
	return nil
}
The output log is:
[2022/03/22 17:16:31.913 +08:00] [INFO] [debug.go:68] ["finish get total badger db size"] [size(GB)=0.5796657186001539] [all-version=false] [count=5884] [key-size(MB)=0.08347034454345703] [value-size(GB)=0.5795842045918107]
[2022/03/22 17:16:33.848 +08:00] [INFO] [debug.go:68] ["finish get total badger db size"] [size(GB)=13.658869383856654] [all-version=true] [count=264109] [key-size(MB)=3.5311460494995117] [value-size(GB)=13.65542099904269]
As you can see, many old key versions have not been removed by GC. The live data is only about 0.58GB, but the data across all versions totals about 13.66GB.
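For a rough sense of scale, the two log lines can be turned into ratios. The sketch below only does arithmetic on the figures logged above; the helper name `staleFraction` is hypothetical.

```go
package main

import "fmt"

// staleFraction returns the share of total that is not live data.
func staleFraction(live, total float64) float64 {
	if total <= 0 {
		return 0
	}
	return 1 - live/total
}

func main() {
	// Figures copied from the two log lines above.
	const liveGB, allGB = 0.5796657186001539, 13.658869383856654
	const liveKeys, allEntries = 5884, 264109

	// Share of scanned bytes that belongs to old or deleted versions.
	fmt.Printf("stale share of bytes: %.1f%%\n", 100*staleFraction(liveGB, allGB))
	// Entries seen with AllVersions=true per key seen with AllVersions=false.
	fmt.Printf("entries per live key: %.1f\n", float64(allEntries)/float64(liveKeys))
}
```

With these numbers it prints a stale share of about 95.8% and about 44.9 entries per live key, so almost all of the scanned data is not live.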
Value Log GC Goroutine
I already have a goroutine that runs value log GC:
func doGCLoop(db *badger.DB, closed chan struct{}) {
	// Run GC once at startup.
	runValueLogGC(db)
	ticker := time.NewTicker(1 * time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			runValueLogGC(db)
		case <-closed:
			return
		}
	}
}

func runValueLogGC(db *badger.DB) {
	// Run at most 10 value log GC passes per tick.
	for i := 0; i < 10; i++ {
		err := db.RunValueLogGC(0.001)
		if err != nil {
			if err == badger.ErrNoRewrite {
				log.Info("badger has no value log need gc now")
			} else {
				log.Error("badger run value log gc failed", zap.Error(err))
			}
			return
		}
		log.Info("badger run value log gc success")
	}
}
But the log shows that no value log file needs GC:
[2022/03/22 17:11:35.621 +08:00] [INFO] [gc.go:58] ["badger has no value log need gc now"]
[2022/03/22 17:12:35.620 +08:00] [INFO] [gc.go:58] ["badger has no value log need gc now"]
[2022/03/22 17:13:35.620 +08:00] [INFO] [gc.go:58] ["badger has no value log need gc now"]
[2022/03/22 17:14:35.619 +08:00] [INFO] [gc.go:58] ["badger has no value log need gc now"]
What Badger options were set?
opts := badger.DefaultOptions(dataPath).
	WithZSTDCompressionLevel(3).
	WithBlockSize(8 * 1024).
	WithValueThreshold(128 * 1024).
	WithLogger(l)
What did you do?
What did you expect to see?
BadgerDB consumes less disk space.
What did you see instead?
BadgerDB consumes too much disk space.