Badger keeping n+1 versions when using WithNumVersionsToKeep(n)

Hi there, I’ve been exploring WithNumVersionsToKeep() alongside badger’s stream API, and the behavior I’m seeing is that for any WithNumVersionsToKeep(n) where n > 1, badger keeps n + 1 versions.

Badger v2.2007.2

package main

import (
	"context"
	"fmt"
	"log"
	"strconv"

	"github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/pb"
	"github.com/golang/protobuf/proto"
)

// check aborts the program on any error.
func check(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

func main() {
	opts := badger.DefaultOptions("/tmp/badger").WithNumVersionsToKeep(2)
	db, err := badger.Open(opts)
	check(err)

	// Update the same key 10 times
	key := []byte("counter")
	for i := 0; i < 10; i++ {
		err = db.Update(func(txn *badger.Txn) error {
			return txn.Set(key, []byte(strconv.FormatInt(int64(i), 10)))
		})
		check(err)
	}

	// Close the db to flush everything, and then re-open
	check(db.Close())
	db, err = badger.Open(opts)
	check(err)
	defer db.Close()

	// Stream db, expect to see # items == WithNumVersionsToKeep,
	// but for any number greater than 1, you see WithNumVersionsToKeep+1.
	// In this case we see 3 instead of 2
	stream := db.NewStream()
	stream.Send = func(list *pb.KVList) error {
		fmt.Println(proto.MarshalTextString(list))
		return nil
	}
	err = stream.Orchestrate(context.Background())
	check(err)
}

Is this a valid test? Please let me know if I am misunderstanding the api :slight_smile:

Hey @seanlaff, thanks for reaching out.
When you flush a DB, all keys at or below discardTs can be discarded from the LSM tree (see here). When badger runs in unmanaged mode, discardTs is governed by the read timestamp (readTs). Each transaction calls txn.Discard() and hence updates the read marker, so any key-value that can be discarded (deleted or expired) and is below the read marker can be removed from the LSM tree.

In your case, the entry for i=9 sets discardTs to 9, so the corresponding key with version 10 is not counted here (as it is higher than the discardTs of 9).

Simply adding this before closing the DB should give you the correct number of entries.

txn := db.NewTransaction(true)
txn.Discard()
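
To make the counting concrete, here is a small standalone model of the behavior described above. It is only a sketch and not badger's actual compaction code: keptVersions is a hypothetical helper, and it assumes that versions above discardTs are always kept without being counted, while versions at or below discardTs are counted newest-first until numVersionsToKeep (assumed to be at least 1) is reached. It also assumes the extra transaction moves discardTs from 9 to 10, as the explanation implies.

```go
package main

import "fmt"

// keptVersions is a simplified, hypothetical model of which versions of a
// single key survive a flush. It is NOT badger's real implementation.
// Versions run from latest down to 1.
func keptVersions(latest, discardTs uint64, numVersionsToKeep int) []uint64 {
	var kept []uint64
	counted := 0
	for v := latest; v >= 1; v-- {
		if v > discardTs {
			// Above discardTs: kept, but not counted toward the limit.
			kept = append(kept, v)
			continue
		}
		// At or below discardTs: counted toward the limit.
		counted++
		kept = append(kept, v)
		if counted == numVersionsToKeep {
			break // everything older is dropped
		}
	}
	return kept
}

func main() {
	// discardTs lags at 9: version 10 is not counted, so 3 versions remain.
	fmt.Println(keptVersions(10, 9, 2)) // [10 9 8]

	// discardTs advanced to 10 (assumed effect of the extra transaction).
	fmt.Println(keptVersions(10, 10, 2)) // [10 9]
}
```

In this model the off-by-one comes entirely from the newest version sitting above discardTs, which matches seeing 3 entries instead of 2 in the original test.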

Makes sense, thanks for the explanation :slight_smile:
