Badger - deadlock after ErrTxnTooBig

What version of Go are you using (go version)?

$ go version
go version go1.19 darwin/amd64

What operating system are you using?

macOS Monterey

What version of Badger are you using?

v3

Does this issue reproduce with the latest master?

Yes

Steps to Reproduce the issue

Here is a full program that reproduces the issue:

package main

import (
	"errors"
	"fmt"
	"math/rand"
	"os"
	"time"

	"github.com/dgraph-io/badger/v3"
)

const (
	count = 1000000
)

func main() {
	e(os.RemoveAll("./badger-cache"))

	opts := badger.DefaultOptions("./badger-cache/")

	bdgLvl := badger.INFO
	opts = opts.WithLoggingLevel(bdgLvl)

	db, err := badger.Open(opts)
	e(err)

	// Some random value outside of the main loop.
	e(db.Update(func(txn *badger.Txn) error {
		if err := txn.Set([]byte("key-1"), []byte("val-1")); err != nil {
			return err
		}

		return nil
	}))

	txn := db.NewTransaction(true)

	r := rand.NewSource(time.Now().UnixNano())

	randKey := fmt.Sprintf("rand-key-%v", r.Int63())

	for i := 0; i < count; i++ {
		var err error

		err = txn.Set([]byte(fmt.Sprintf("random-%v", r.Int63())), []byte("val"))
		if errors.Is(err, badger.ErrTxnTooBig) {
			fmt.Println("restarting transaction", i)
			e(txn.Commit()) // This line deadlocks
			txn = db.NewTransaction(true)
		} else {
			e(err)
		}

		// For some reason, having this Get causes the txn.Commit() call above
		// (marked "This line deadlocks") to deadlock.
		if _, err := txn.Get([]byte("key-1")); err != nil {
			e(err)
		}

		e(db.Update(func(txn *badger.Txn) error {
			return txn.Set([]byte(randKey), []byte(fmt.Sprintf("new-val-%v", r.Int63())))
		}))

		if i%(count/10) == 0 {
			fmt.Println("finished 1/10th")
		}
	}

	e(txn.Commit())

	fmt.Println("done")
}

func e(err error) {
	if err != nil {
		panic(err)
	}
}

What Badger options were set?

badger.DefaultOptions with the logging level set to badger.INFO (see the program above).

What did you do?

  • Create an initial key/val pair
  • Start a loop that writes through a long-running transaction, committing it and opening a new one whenever Set returns ErrTxnTooBig
  • Inside the loop, do a Get on the initial key/val pair using the long-running transaction
  • Inside the loop, do a Set on some other key using a separate short-lived transaction (db.Update)

What did you expect to see?

No deadlock

What did you see instead?

A deadlock: the txn.Commit() call made after hitting ErrTxnTooBig never returns.