Using Badger to manage an LFU/LRU dataset

Hi. I’m using Badger to manage a distributed cache and it’s performing very well for us. The dataset is relatively small (<1TB), but we now need to cap its size for various reasons. The elements in the database expire fairly quickly: after a few weeks they are never accessed again, but the elements that are live get hit a lot, thousands of times per second. Badger’s in-memory cache works very well for this.
I could implement a reaper: for each key, add a second key indicating the last time it was accessed, update that time probabilistically (given the thousands of accesses per second), and regularly reap the oldest values to keep the database size within our limit.
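Roughly, the access-tracking half would look like this. This is only a sketch assuming Badger v3; the `last-access:` key prefix and the 1-in-100 sampling rate are placeholders, not anything Badger provides:

```go
package main

import (
	"encoding/binary"
	"math/rand"
	"time"

	badger "github.com/dgraph-io/badger/v3"
)

// touch records an approximate last-access time for key, but only on
// roughly 1 in 100 reads so the write load stays far below the read load.
func touch(db *badger.DB, key []byte) error {
	if rand.Intn(100) != 0 { // probabilistic update
		return nil
	}
	ts := make([]byte, 8)
	binary.BigEndian.PutUint64(ts, uint64(time.Now().Unix()))
	return db.Update(func(txn *badger.Txn) error {
		// Hypothetical secondary key holding the last-access timestamp.
		return txn.Set(append([]byte("last-access:"), key...), ts)
	})
}
```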
Does this align with how Badger should be used for a case like this? Does anyone have a better suggestion?
Thanks,
Paul

Do you think TTL might be useful here? If not, then yes, you could iterate over the keys and pick some of the oldest ones to delete. You could even use the Stream framework to quickly pick up keys to delete. Stream creates multiple iterators and calls ChooseKey and KeyToList. You could use the item version in ChooseKey to decide whether to keep or delete a key. Generate a list of keys to delete, cancel the stream once you have enough keys for one batch, and run those deletions.
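A rough sketch of such a reap pass, assuming Badger v3 (where Stream.Send takes a *z.Buffer; in v2 it takes a *pb.KVList). The versionCutoff and maxBatch values are illustrative and would come from your own retention policy:

```go
package main

import (
	"context"
	"errors"
	"sync"

	badger "github.com/dgraph-io/badger/v3"
	"github.com/dgraph-io/ristretto/z"
)

func reap(db *badger.DB, versionCutoff uint64, maxBatch int) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var mu sync.Mutex
	var toDelete [][]byte

	stream := db.NewStream()
	stream.NumGo = 8 // number of concurrent iterators
	stream.ChooseKey = func(item *badger.Item) bool {
		if item.Version() >= versionCutoff {
			return false // recent enough; keep it
		}
		mu.Lock()
		if len(toDelete) < maxBatch {
			toDelete = append(toDelete, item.KeyCopy(nil))
		}
		enough := len(toDelete) >= maxBatch
		mu.Unlock()
		if enough {
			cancel() // we have a full batch; stop streaming early
		}
		return false // we only collect keys; nothing to send downstream
	}
	stream.Send = func(buf *z.Buffer) error { return nil } // no-op

	if err := stream.Orchestrate(ctx); err != nil && !errors.Is(err, context.Canceled) {
		return err
	}

	// Delete the collected keys in one write batch.
	wb := db.NewWriteBatch()
	defer wb.Cancel()
	for _, k := range toDelete {
		if err := wb.Delete(k); err != nil {
			return err
		}
	}
	return wb.Flush()
}
```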

Badger uses an LSM tree, so your deletions would translate into compactions, which would eventually decrease disk usage.

Also, you could effectively disable the value log by setting a high value threshold, so that write amplification and disk usage stay under control.
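For example, something along these lines (the path and the 1 MB threshold are just placeholders):

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	// Raise the value threshold so values up to this size are stored inline
	// in the LSM tree rather than in the value log.
	opts := badger.DefaultOptions("/path/to/db").WithValueThreshold(1 << 20)

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	// ... use db ...
}
```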


Hi Manish. TTL is unlikely to work for our case, as we have very high variance in object lifetime and no real foresight about it.
Stream seems like the right tool to implement this - thanks for the pointer.
Do I read correctly that the (approximate) last access time of a key is not user-accessible?

Thanks,
Paul

We don’t store the last access time of a key.
