BadgerDB consumes too much disk space

What version of Go are you using (go version)?

$ go version
go version go1.16.15 

What operating system are you using?

macOS

What version of Badger are you using?

github.com/dgraph-io/badger/v3 v3.2103.2

Does this issue reproduce with the latest master?

yes.

Steps to Reproduce the issue

Background

I use Badger to store log data. There are two kinds of writes:

  • Store new log data with a timestamp; each log value is 10 KB to 10 MB.
  • Delete the log data older than 1 day.

Problem

Badger consumes too much disk space:

$ du -h data
13G    data

There are too many old versions of keys in BadgerDB, even though NumVersionsToKeep is 1. I wrote a program to scan the database:

func ListBadgerDB(db *badger.DB, allVersion bool) error {
	keySize := 0
	valueSize := 0
	keyCount := 0
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.AllVersions = allVersion
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Rewind(); it.Valid(); it.Next() {
			item := it.Item()
			k := item.Key()
			err := item.Value(func(v []byte) error {
				valueSize += len(v)
				return nil
			})
			if err != nil {
				return err
			}
			keySize += len(k)
			keyCount++
		}
		return nil
	})
	if err != nil {
		return err
	}

	log.Info("finish get total badger db size",
		zap.Float64("size(GB)", float64(keySize+valueSize)/GB),
		zap.Bool("all-version", allVersion),
		zap.Int("count", keyCount),
		zap.Float64("key-size(MB)", float64(keySize)/MB),
		zap.Float64("value-size(GB)", float64(valueSize)/GB),
	)
	return nil
}

The output log is:

[2022/03/22 17:16:31.913 +08:00] [INFO] [debug.go:68] ["finish get total badger db size"] [size(GB)=0.5796657186001539] [all-version=false] [count=5884] [key-size(MB)=0.08347034454345703] [value-size(GB)=0.5795842045918107]
[2022/03/22 17:16:33.848 +08:00] [INFO] [debug.go:68] ["finish get total badger db size"] [size(GB)=13.658869383856654] [all-version=true] [count=264109] [key-size(MB)=3.5311460494995117] [value-size(GB)=13.65542099904269]

As you can see, many old versions of keys have not been removed by GC. The live data is only about 0.58 GB, but the data across all versions is about 13.66 GB.
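To put the two log lines in proportion, here is a small sketch that computes the ratios from the logged figures. The helper name `ratio` is only for illustration and is not part of Badger or of my program.

```go
package main

import "fmt"

// ratio returns total/live, or 0 when live is 0 to avoid dividing by zero.
func ratio(total, live float64) float64 {
	if live == 0 {
		return 0
	}
	return total / live
}

func main() {
	// Figures copied from the two log lines above.
	const (
		liveGB    = 0.5796657186001539 // all-version=false
		allGB     = 13.658869383856654 // all-version=true
		liveCount = 5884
		allCount  = 264109
	)
	fmt.Printf("space amplification: %.1fx\n", ratio(allGB, liveGB))
	fmt.Printf("entries per live key: %.1f\n", ratio(allCount, liveCount))
}
```

With these numbers the store holds roughly 23.6 times more data than is live, and about 45 entries per live key.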

Value Log GC Goroutine

I already have a goroutine that runs value log GC:

func doGCLoop(db *badger.DB, closed chan struct{}) {
	// Run GC once at startup.
	runValueLogGC(db)

	ticker := time.NewTicker(1 * time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			runValueLogGC(db)
		case <-closed:
			return
		}
	}
}

func runValueLogGC(db *badger.DB) {
	// Run at most 10 value log GC rounds per tick.
	for i := 0; i < 10; i++ {
		err := db.RunValueLogGC(0.001)
		if err != nil {
			if err == badger.ErrNoRewrite {
				log.Info("badger has no value log need gc now")
			} else {
				log.Error("badger run value log gc failed", zap.Error(err))
			}
			return
		}
		log.Info("badger run value log gc success")
	}
}

But the log shows that no value log file needs GC:

[2022/03/22 17:11:35.621 +08:00] [INFO] [gc.go:58] ["badger has no value log need gc now"]
[2022/03/22 17:12:35.620 +08:00] [INFO] [gc.go:58] ["badger has no value log need gc now"]
[2022/03/22 17:13:35.620 +08:00] [INFO] [gc.go:58] ["badger has no value log need gc now"]
[2022/03/22 17:14:35.619 +08:00] [INFO] [gc.go:58] ["badger has no value log need gc now"]

What Badger options were set?

opts := badger.DefaultOptions(dataPath).
		WithZSTDCompressionLevel(3).
		WithBlockSize(8 * 1024).
		WithValueThreshold(128 * 1024).
		WithLogger(l)

What did you do?

What did you expect to see?

BadgerDB consumes less disk space.

What did you see instead?

BadgerDB consumes too much disk space.

I then stopped the program and executed the following command:

badger flatten --dir data

After I restarted the program, the disk space was released a while later.

$ du -h data
581M    data

The related log is:

[2022/03/22 17:32:56.640 +08:00] [INFO] [debug.go:68] ["finish get total badger db size"] [size(GB)=0.5654773041605949] [all-version=false] [count=4484] [key-size(MB)=0.06477832794189453] [value-size(GB)=0.5654140440747142]
[2022/03/22 17:32:56.763 +08:00] [INFO] [debug.go:68] ["finish get total badger db size"] [size(GB)=0.5655907960608602] [all-version=true] [count=6332] [key-size(MB)=0.08948040008544922] [value-size(GB)=0.5655034128576517]

This works, but how can I fix the problem automatically in my program? I cannot always stop the program and execute `badger flatten --dir data`.
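One thing that might work is calling `db.Flatten` from inside the program before running value log GC, since `*badger.DB` exposes `Flatten(workers int) error`. The sketch below is untested against my data and rests on the assumption that flattening in-process has the same effect as the CLI command. The `compactor` interface, `flattenAndGC`, `fakeDB`, and `errNoRewrite` are hypothetical names used only so the sketch compiles without the Badger module; real code would pass a `*badger.DB` and compare against `badger.ErrNoRewrite`.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoRewrite is a placeholder for badger.ErrNoRewrite (assumption:
// real code would use the Badger error value instead).
var errNoRewrite = errors.New("no rewrite")

// compactor lists the two *badger.DB methods this sketch relies on.
type compactor interface {
	Flatten(workers int) error
	RunValueLogGC(discardRatio float64) error
}

// flattenAndGC forces an LSM compaction, then runs value log GC until
// nothing more can be rewritten or maxRounds is reached.
// It returns the number of successful GC rounds.
func flattenAndGC(db compactor, workers int, discardRatio float64, maxRounds int) (int, error) {
	if err := db.Flatten(workers); err != nil {
		return 0, fmt.Errorf("flatten: %w", err)
	}
	for round := 0; round < maxRounds; round++ {
		err := db.RunValueLogGC(discardRatio)
		if errors.Is(err, errNoRewrite) {
			return round, nil // nothing left to collect
		}
		if err != nil {
			return round, fmt.Errorf("value log gc: %w", err)
		}
	}
	return maxRounds, nil
}

// fakeDB is a stand-in that only makes the sketch runnable.
type fakeDB struct {
	flattened   int // number of Flatten calls
	gcRemaining int // GC rounds that will still succeed
}

func (f *fakeDB) Flatten(workers int) error {
	f.flattened++
	return nil
}

func (f *fakeDB) RunValueLogGC(discardRatio float64) error {
	if f.gcRemaining == 0 {
		return errNoRewrite
	}
	f.gcRemaining--
	return nil
}

func main() {
	db := &fakeDB{gcRemaining: 3}
	rounds, err := flattenAndGC(db, 2, 0.5, 10)
	fmt.Println(rounds, err)
}
```

In my GC loop this would replace the call to `runValueLogGC(db)`, probably on a much longer interval than one minute, because a forced compaction is likely to be expensive. The worker count and discard ratio here are arbitrary.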

The bug was located in: badgerengine doesn’t release disk space since badger gc doesn’t work. · Issue #454 · genjidb/genji (github.com)
