Drop/Delete Range of Keys

diggy · July 7, 2019, 4:18pm

Moved from GitHub badger/922

Posted by 256dpi:

TL;DR

It would be wonderful if badger would support some sort of optimized way to handle deletion of a range of keys. Inspiration can be taken from other projects like RocksDB which implements a special range delete tombstone: DeleteRange: A New Native RocksDB Operation | RocksDB.

Use Case

I’m currently working on quasar which builds on badger to implement a persistent queue. There is a small runnable example here: quasar/example/main.go at master · 256dpi/quasar · GitHub. The whole thing works pretty well and in a basic setup with one producer and one consumer the steady throughput is about 100’000 messages/second on my computer. I think with badger not being necessarily optimized for that workload it’s pretty neat!

Since the queue has a fixed size, there is a “cleaner” process that periodically deletes processed messages to free space in the queue. This process iterates over the keys and removes them in using one or two transactions: quasar/ledger.go at master · 256dpi/quasar · GitHub.

Unfortunately, this approach is very expensive and is responsible for up to 30% of the CPU time used by the program:

The flame graph incorrectly attributes all badger.(*Iterator).prefetch calls to the initial badger.(*Iterator).Seek call. The normal graph does not make that connection and I assume they also come from the subsequent badger.(*Iterator).Next() calls.

With the support of something similar to range delete tombstones in RocksDB this would be possibly a lot faster. Also I would favor an approach that is not externally as for example db.DropPrefix is now. In my use case writes and deletes can happen at the same time and thus I would not want to block the database for cleanings.

I imagine an interface like the following:

func (txn *Txn) DeleteRange(from, to []byte) error {}

The same functionality could potentially also be used to delete based on a prefix:

func (txn *Txn) DeletePrefix(prefix []byte) error {
  return txn.DeleteRange(prefixStart(prefix), prefixEnd(prefix))
}

I didn’t to any research yet on how this could be implemented in badger. Maybe somebody from the team has already thought about this. Anyway, thanks for considering this!

diggy · August 13, 2019, 11:11am

vgtom commented :

Which parts of the code would i need to look to get started on this??

diggy · February 5, 2020, 8:26pm

stale commented :

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

diggy · April 20, 2020, 3:49pm

jarifibrahim commented :

This looks like a useful improvement.

Topic		Replies	Views
Deleted keys and key only iteration Badger	1	901	August 11, 2018
Some good strategies to implement Issues badger , kind:enhancement , status:accepted , area:performance , skip:stale	7	909	September 17, 2019
Delete documents does not work (always) Badger	1	1371	April 23, 2019
Iterator perfomance degradation Badger kind:question	4	666	October 13, 2020
Using Badger to manage an LFU/LRU dataset Badger	3	704	April 28, 2021

Drop/Delete Range of Keys

Related topics