Using Badger to manage an LFU/LRU dataset

Hi. I’m using Badger to manage a distributed cache and it’s performing very well for us. The dataset is relatively small (<1TB), but we now need to cap its size for various reasons. The elements in the database expire fairly quickly: after a few weeks they are never accessed again, but the elements that are live get hit a lot, thousands of times per second. Badger’s in-memory cache works very well for this.
I could implement a reaper: for each key, add a second key indicating the last time it was accessed, update that time probabilistically (given the thousands of accesses per second), and regularly reap the oldest values to keep the database size within our limit.
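Roughly, the access-tracking half would look like this. This is only a sketch assuming Badger v3; the `last-access:` key prefix and the 1-in-100 sampling rate are placeholders, not anything Badger provides:

```go
package main

import (
	"encoding/binary"
	"math/rand"
	"time"

	badger "github.com/dgraph-io/badger/v3"
)

// touch records an approximate last-access time for key, but only on
// roughly 1 in 100 reads so the write load stays far below the read load.
func touch(db *badger.DB, key []byte) error {
	if rand.Intn(100) != 0 { // probabilistic update
		return nil
	}
	ts := make([]byte, 8)
	binary.BigEndian.PutUint64(ts, uint64(time.Now().Unix()))
	return db.Update(func(txn *badger.Txn) error {
		// Hypothetical secondary key holding the last-access timestamp.
		return txn.Set(append([]byte("last-access:"), key...), ts)
	})
}
```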
Does this align with how Badger should be used for a case like this? Does anyone have a better suggestion?
Thanks,
Paul

Do you think TTL might be useful here? If not, then yes, you could iterate over the keys and pick some of the oldest ones to delete. You could even use the Stream framework to quickly pick up keys to delete. Stream creates multiple iterators and calls ChooseKey and KeyToList. You could use the item version in ChooseKey to decide whether to keep or delete a key. Generate a list of keys to delete, cancel the stream once you have enough keys for one batch, and run those deletions.
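A rough sketch of such a reap pass, assuming Badger v3 (where Stream.Send takes a *z.Buffer; in v2 it takes a *pb.KVList). The versionCutoff and maxBatch values are illustrative and would come from your own retention policy:

```go
package main

import (
	"context"
	"errors"
	"sync"

	badger "github.com/dgraph-io/badger/v3"
	"github.com/dgraph-io/ristretto/z"
)

func reap(db *badger.DB, versionCutoff uint64, maxBatch int) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var mu sync.Mutex
	var toDelete [][]byte

	stream := db.NewStream()
	stream.NumGo = 8 // number of concurrent iterators
	stream.ChooseKey = func(item *badger.Item) bool {
		if item.Version() >= versionCutoff {
			return false // recent enough; keep it
		}
		mu.Lock()
		if len(toDelete) < maxBatch {
			toDelete = append(toDelete, item.KeyCopy(nil))
		}
		enough := len(toDelete) >= maxBatch
		mu.Unlock()
		if enough {
			cancel() // we have a full batch; stop streaming early
		}
		return false // we only collect keys; nothing to send downstream
	}
	stream.Send = func(buf *z.Buffer) error { return nil } // no-op

	if err := stream.Orchestrate(ctx); err != nil && !errors.Is(err, context.Canceled) {
		return err
	}

	// Delete the collected keys in one write batch.
	wb := db.NewWriteBatch()
	defer wb.Cancel()
	for _, k := range toDelete {
		if err := wb.Delete(k); err != nil {
			return err
		}
	}
	return wb.Flush()
}
```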

Badger uses an LSM tree, so your deletions would translate into compactions, which would eventually decrease disk usage.

Also, you could effectively disable the value log by setting a high value threshold, so that write amplification and disk usage stay under control.
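For example, something along these lines (the path and the 1 MB threshold are just placeholders):

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	// Raise the value threshold so values up to this size are stored inline
	// in the LSM tree rather than in the value log.
	opts := badger.DefaultOptions("/path/to/db").WithValueThreshold(1 << 20)

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	// ... use db ...
}
```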


Hi Manish. TTL is unlikely to work for our case, as we have very high variance in object lifetime and no real foresight about it.
Stream seems like the right tool to implement this - thanks for the pointer.
Do I read correctly that the (approximate) last access time of a key is not user-accessible?

Thanks,
Paul

We don’t store the last access time of a key.
