Allow external storage implementations

Moved from GitHub ristretto/151

Posted by MarcErdmann:

It would be great to use Ristretto with external storage backends. If Ristretto exported all the necessary interfaces, one could implement adapters. With, for example, in-memory Badger as a storage backend, this would allow for transactions.

jarifibrahim commented:

Hey @MarcErdmann! I would like to better understand how external storage would help. Let’s say ristretto supported badger as a storage backend with transactions. Would you use ristretto as follows?

cache.starttxn()
cache.get/set
cache.commit()

How would this be useful for anyone using ristretto?

MarcErdmann commented:

Hi @jarifibrahim! Exactly, an API similar to that would be helpful.

The problem I am facing (and surely others are facing it as well) is that I need to index data and keep the index in sync with my primary database. It would therefore be great to be able to index data in ristretto during a primary DB transaction and, if the primary DB transaction fails, abort the ristretto transaction as well.

In my specific case the index itself is not a simple query cache, as some databases like badger already handle that internally. Instead I need to index computed values. Think of scenarios such as custom indices on top of primary DBs, or precomputing values for risk, fraud, and similar scores, where multiple reads and writes occur during a transaction and the indexed values are only valid if the primary DB transaction succeeds.

If you have any more questions regarding the use case I am happy to go into more detail.
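
To make the request concrete, here is a minimal sketch in Go of the behaviour described above: index writes are buffered during the primary DB work and published only if that work succeeds. Everything in it is hypothetical. `Index`, `Txn`, `Begin`, `Commit`, `Discard` and `UpdateWithIndex` are illustrative names, not part of Ristretto or Badger, and a plain map stands in for the storage backend.

```go
package main

import "sync"

// Index is a hypothetical in-memory index. It stands in for a cache
// with a transactional storage backend; it is not a Ristretto type.
type Index struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewIndex() *Index { return &Index{data: make(map[string]string)} }

// Get reads a committed value.
func (ix *Index) Get(key string) (string, bool) {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	v, ok := ix.data[key]
	return v, ok
}

// Txn buffers writes until Commit is called; Discard drops them.
// A Txn is meant to be used from a single goroutine.
type Txn struct {
	ix      *Index
	pending map[string]string
}

func (ix *Index) Begin() *Txn {
	return &Txn{ix: ix, pending: make(map[string]string)}
}

// Get sees the transaction's own pending writes before committed data.
func (t *Txn) Get(key string) (string, bool) {
	if v, ok := t.pending[key]; ok {
		return v, true
	}
	return t.ix.Get(key)
}

// Set stages a write; nothing is visible to other readers yet.
func (t *Txn) Set(key, value string) { t.pending[key] = value }

// Commit publishes all pending writes while holding the index lock once.
func (t *Txn) Commit() {
	t.ix.mu.Lock()
	defer t.ix.mu.Unlock()
	for k, v := range t.pending {
		t.ix.data[k] = v
	}
	t.pending = make(map[string]string)
}

// Discard drops all pending writes.
func (t *Txn) Discard() { t.pending = make(map[string]string) }

// UpdateWithIndex runs the primary DB work (including its commit) and
// publishes the staged index writes only if that work returns no error.
func UpdateWithIndex(ix *Index, primary func(t *Txn) error) error {
	t := ix.Begin()
	if err := primary(t); err != nil {
		t.Discard()
		return err
	}
	t.Commit()
	return nil
}
```

Note that this only covers the "abort together" half of the problem. If the process dies after the primary DB commits but before `Commit` runs, the index would still be stale, so a real implementation would need some recovery or rebuild step.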

robcowart commented:

I can also see external storage as a useful feature. However, my use case isn’t transactions; it is about allowing multiple instances of a load-balanced application to share cached values.

Consider two instances of an app, A & B.

  • Instance A receives a work unit for which it must execute an expensive task to determine a required value (V). It caches V in its local ristretto cache so that V does not have to be recomputed if it is needed again.
  • Instance B then receives a work unit for which it also needs V. Since it isn’t in its local ristretto cache and there is no mechanism to get the value from A, it must also perform the expensive task to determine V.

In this scenario, a multi-layer cache that includes an external storage layer would be useful. The parallel here is CPU caching, with its L1, L2 and L3 caches. Each layer sacrifices some performance for greater durability and capacity.

Returning to the example above, a ristretto “L2” cache which resides on external storage (e.g. Redis) would help.

  • Instance A receives a work unit for which it must execute an expensive task to determine a related value (V). It caches V in its local ristretto cache so that V does not have to be recomputed if it is needed again. In the background the ristretto cache also writes V to an external Redis instance.
  • Instance B then receives a work unit for which it also needs V. It first checks its local ristretto cache. Ristretto doesn’t have it in memory, so it issues a query to its “L2” (the external Redis instance) where it is able to successfully fetch V, and avoids the expensive task for determining V.
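
The two-instance flow above can be sketched in Go roughly as follows. This is an illustration under stated assumptions, not an existing Ristretto feature: the `Store` interface, `MapStore`, `Tiered` and `GetOrCompute` are hypothetical names, and a mutex-guarded map stands in for both the local cache and the shared Redis instance. The write to the shared tier is done inline here for simplicity, whereas the description above has it happen in the background.

```go
package main

import "sync"

// Store is a hypothetical minimal storage interface. Neither Ristretto
// nor any Redis client is assumed to expose exactly this shape.
type Store interface {
	Get(key string) (string, bool)
	Set(key, value string)
}

// MapStore is a mutex-guarded map used as a stand-in for both tiers.
type MapStore struct {
	mu sync.RWMutex
	m  map[string]string
}

func NewMapStore() *MapStore { return &MapStore{m: make(map[string]string)} }

func (s *MapStore) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

func (s *MapStore) Set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = value
}

// Tiered combines a local tier (L1) with a shared tier (L2).
type Tiered struct {
	L1, L2 Store
}

// GetOrCompute checks L1, then L2, and only runs compute on a miss in both.
func (t *Tiered) GetOrCompute(key string, compute func() string) string {
	if v, ok := t.L1.Get(key); ok {
		return v
	}
	if v, ok := t.L2.Get(key); ok {
		t.L1.Set(key, v) // promote to the local tier for later lookups
		return v
	}
	v := compute() // the expensive task
	t.L1.Set(key, v)
	t.L2.Set(key, v) // inline here; the scenario above does this in the background
	return v
}
```

With one shared `MapStore` passed as `L2` to two `Tiered` values (instances A and B), a call on A followed by the same call on B runs `compute` only once, and B ends up with V in its own `L1`.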