Historical data can't be removed

What version of Go are you using (go version)?

$ go version
go1.15.5 linux/amd64

What operating system are you using?

Linux

What version of Badger are you using?

1.6.2

Does this issue reproduce with the latest master?

yes

Steps to Reproduce the issue

I use the following code to test Badger. After running it, I observed about 200KB of data stored in the data directory. This suggests that the Set operation does not remove the historical versions, and the Delete operation does not reclaim the data either. Moreover, GC does not work. How do I remove historical data that has been overwritten or deleted from disk?

package main

import (
	"fmt"
	"log"

	"github.com/dgraph-io/badger"
)

func main() {
	opts := badger.DefaultOptions("./data")
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	for i := 0; i < 1000; i++ {
		// set
		err = db.Update(func(txn *badger.Txn) error {
			err := txn.Set([]byte("answer"), []byte(fmt.Sprintf("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:%d", i)))
			return err
		})
		if err != nil {
			panic(err)
		}
	}

	// get
	err = db.View(func(txn *badger.Txn) error {
		item, err := txn.Get([]byte("answer"))
		if err != nil {
			return err
		}
		val, err := item.ValueCopy(nil)
		if err != nil {
			return err
		}
		fmt.Printf("The answer is: %s\n", val)
		return nil
	})
	if err != nil {
		panic(err)
	}

	// delete
	err = db.Update(func(txn *badger.Txn) error {
		err := txn.Delete([]byte("answer"))
		return err
	})
	if err != nil {
		panic(err)
	}

	// Value log GC; returns badger.ErrNoRewrite when there is nothing to collect.
	if err := db.RunValueLogGC(0.2); err != nil && err != badger.ErrNoRewrite {
		log.Println(err)
	}
}

Hey @LowEntropyBody, due to the design of the LSM tree, the data is not deleted immediately. It is eventually removed through compactions.

Hi @Naman,

I’m trying to get an estimate of how much disk space Badger will take for my data, but although my values get removed after the TTL expires, the disk usage keeps growing. From what I read in your answer (and in other topics), the data should ‘eventually’ be deleted, but what does that mean? Will it happen after a few minutes, hours, or days? Is there any way to control the compactions?

If you could give some clarification on the topic I would highly appreciate it!

Thanks, Eelco

Right now, there is no way to control compactions to specifically pick certain tables. Compactions run in the background and eventually clean up the stale data.
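For the value log specifically, the Badger README recommends calling `RunValueLogGC` periodically (e.g. from a ticker) and repeating the call while it reports progress, since each successful call rewrites at most one value log file. A minimal sketch of that pattern, assuming a `db` opened as in the repro above (the 5-minute interval and 0.7 discard ratio are illustrative values, not recommendations):

```go
package main

import (
	"log"
	"time"

	"github.com/dgraph-io/badger"
)

// runGC periodically runs value log GC on db. Each successful call to
// RunValueLogGC rewrites one value log file, so we loop until it returns
// ErrNoRewrite, meaning there is nothing left to reclaim right now.
func runGC(db *badger.DB) {
	ticker := time.NewTicker(5 * time.Minute)
	defer ticker.Stop()
	for range ticker.C {
		for {
			err := db.RunValueLogGC(0.7)
			if err == nil {
				continue // made progress, try to collect more
			}
			if err != badger.ErrNoRewrite {
				log.Println("value log GC:", err)
			}
			break
		}
	}
}
```

You would start this alongside your normal workload with `go runGC(db)` after opening the database, and stop it before calling `db.Close()`.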

Another way to clean up the stale data is to Flatten the DB.

Thanks for your response @Naman. The application I’m experimenting with is quite write-heavy, so from the docs it seems using Flatten may not be a good idea, but I will try it anyway to see what happens.

I guess I will need to run a durability test to see what eventually means in reality :wink: