Performance issue in prefix iteration

Vishal_Jangid · March 17, 2022, 10:32am

In badgerDB, we have billions of keys of type string and values of type HashValueList. In our use case length of HashValueList might be in millions. We have to encode key and value in []byte before inserting into BadgerDb; we are using the encoding/gob package. So, whenever we need values, we have to decode them again. This decoding process is causing overhead in our case.

type HashValue struct {
    Row_id  string
    Address string
}

type HashValueList []HashValue

To mitigate decoding overhead, we change our design to prefix iteration. With prefix iteration, we stored each of the values from our collection as a distinct Badger KV pair, rather than a single key with a large value. The prefix of the key would be the original hash value key. we’d then need to add a suffix to provide uniqueness over the collection of values from the original collection.
So in your original scheme have something like:

k1 -> [v1, v2, v3, ..., vn]
...
km -> [w1, ..., wm]

Now have something like:

k1@1 -> v1
k1@2 -> v2
k1@3 -> v2
...
k1@n -> vn
...
km@1 -> w1
...
km@m -> wm

To find the values from DB, we have n goroutines that are reading the KeyChan channel and writing values to ValChan.

func Get(db *badger.DB, KeyChan <-chan string, ValChan chan []byte) {
	
	var val []byte

	for key := range KeyChan {
	
		txn := db.NewTransaction(false)

		opts := badger.DefaultIteratorOptions
		opts.Prefix = []byte(key)

		it := txn.NewIterator(opts)
	
		prefix := []byte(key)

		for it.Rewind(); it.ValidForPrefix(prefix); it.Next() {

			item := it.Item()
			val, err := item.ValueCopy(val[:])
			ValChan <- val

			item = nil
			if err != nil {
				fmt.Println(err)
			}
		}
		it.Close()
		txn.Discard()
	}
}

In Prefix iteration, Get func to gets very slow after a while. We collected a 5-second execution trace, and the results are below:

Here github.com/dgraph-io/badger/v3.(*Iterator).fill.func1 N=2033535 Creating huge number of Goroutine internally.

The significant time spent in Scheduler Wait.

How do we improve the prefix iteration performance? Is there any better approach for our use case?

Thanks

Topic		Replies	Views
What are the best possible Options for my use case Badger	0	595	February 18, 2022
Figuring optimal options for usage pattern App Development badger	2	619	January 12, 2021
Badger example with prefix? Badger	3	2703	February 27, 2019
Optimal way to iterate over few set of keys Badger kind:question	6	941	October 3, 2021
Length of Keys - Performance Concerns? Badger	7	702	July 9, 2020

Performance issue in prefix iteration

Related topics