Question: read-your-writes consistency

Moved from GitHub ristretto/102

Posted by maciej:

Hi!

Do you have any plans to add operations with read-your-writes consistency at some point in the future?

Thanks!

manishrjain commented:

By read-your-writes, do you mean reading ALL your writes? Ristretto has an admission policy, so it may drop Sets; that's part of the design. However, if a key is already present in the cache, any Set for that key will definitely update its value.
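To make the distinction concrete, here is a toy, synchronous model of the two cases described above. It is not Ristretto's code: the `admit` hook is a hypothetical stand-in for the admission policy, and Ristretto applies its policy asynchronously rather than inside `Set`.

```go
package main

import "fmt"

// admissionCache is a toy, synchronous model (not Ristretto's code): a Set for
// a key that is already cached always applies, while a Set for a new key must
// pass an admission policy and may be dropped.
type admissionCache struct {
	data  map[string]interface{}
	admit func(key string) bool // hypothetical stand-in for the admission policy
}

func newAdmissionCache(admit func(string) bool) *admissionCache {
	return &admissionCache{data: make(map[string]interface{}), admit: admit}
}

// Set reports whether the value was stored.
func (c *admissionCache) Set(key string, value interface{}) bool {
	if _, ok := c.data[key]; ok {
		c.data[key] = value // existing key: the update always applies
		return true
	}
	if !c.admit(key) {
		return false // new key rejected: the Set is dropped
	}
	c.data[key] = value
	return true
}

// Get returns the cached value for key, if any.
func (c *admissionCache) Get(key string) (interface{}, bool) {
	v, ok := c.data[key]
	return v, ok
}

func main() {
	// A policy that admits only "a", to make the effect easy to see.
	c := newAdmissionCache(func(k string) bool { return k == "a" })
	fmt.Println(c.Set("a", 1)) // true: admitted
	fmt.Println(c.Set("b", 2)) // false: dropped by the policy
	fmt.Println(c.Set("a", 3)) // true: update of a cached key
	v, ok := c.Get("a")
	fmt.Println(v, ok) // 3 true
}
```

So "read ALL your writes" cannot hold for new keys under an admission policy, while updates to cached keys are never rejected.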

maciej commented:

@manishrjain by read-your-writes I mean:

cache.Set("key", "value", 1) // set a value
value, found := cache.Get("key")
// expecting found to be true

without having to sleep first, as the example here does: https://blog.dgraph.io/post/introducing-ristretto-high-perf-go-cache/
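The gap maciej describes can be reproduced with a toy cache whose Set only enqueues the item and whose map is updated later. This is a sketch under stated assumptions, not Ristretto's implementation: the background goroutine that would normally apply buffered items is replaced by an explicit `drain()` call (a hypothetical name) so the behavior is deterministic, and dropping a Set when the buffer is full is a modeling choice.

```go
package main

import (
	"fmt"
	"sync"
)

// item is one buffered Set.
type item struct {
	key   string
	value interface{}
}

// bufferedCache is a toy cache whose Set only enqueues the item; the map is
// updated later, when the buffer is drained.
type bufferedCache struct {
	mu      sync.RWMutex
	data    map[string]interface{}
	pending chan item
}

func newBufferedCache(size int) *bufferedCache {
	return &bufferedCache{data: make(map[string]interface{}), pending: make(chan item, size)}
}

// Set enqueues the item and returns false if the buffer is full (the Set is dropped).
func (c *bufferedCache) Set(key string, value interface{}) bool {
	select {
	case c.pending <- item{key, value}:
		return true
	default:
		return false
	}
}

// Get only sees items that have already been applied to the map.
func (c *bufferedCache) Get(key string) (interface{}, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

// drain applies every buffered item to the map (hypothetical; normally a goroutine does this).
func (c *bufferedCache) drain() {
	for {
		select {
		case it := <-c.pending:
			c.mu.Lock()
			c.data[it.key] = it.value
			c.mu.Unlock()
		default:
			return
		}
	}
}

func main() {
	c := newBufferedCache(16)
	c.Set("key", "value")
	_, found := c.Get("key")
	fmt.Println(found) // false: the write is still buffered
	c.drain()           // the sleep in the blog example gives the goroutine time to do this
	_, found = c.Get("key")
	fmt.Println(found) // true
}
```

The same window explains the deduplication problem raised below: two checks for the same ID within that window both miss.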

negbie commented:

+1
Currently you cannot use Ristretto for deduplication at a very high insert rate, because a Set might take a few milliseconds to become visible.

martinmr commented:

I looked into this. Our plan was to do something similar to what Caffeine does to get immediate writes. Caffeine uses a small LRU whose items are either accepted into or rejected from the main LRU using the TinyLFU policy.
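A minimal sketch of that scheme, tracking keys only, may help. Every new key lands in the small window LRU, so it is visible at once; when the window overflows, its least recently used key becomes a candidate that replaces the main region's victim only if it has been seen more often. Simplifications, all assumptions: frequencies are exact counts in a map instead of a count-min sketch with aging, and the main region is a plain LRU (Caffeine uses a segmented LRU there).

```go
package main

import (
	"container/list"
	"fmt"
)

// wtinyLFU is a toy window-TinyLFU cache that tracks keys only.
type wtinyLFU struct {
	freq      map[string]int // exact counts; a real cache uses a compact sketch
	window    *list.List     // small LRU that accepts every new key
	main      *list.List     // larger LRU guarded by the TinyLFU check
	winNodes  map[string]*list.Element
	mainNodes map[string]*list.Element
	windowCap int
	mainCap   int
}

func newWTinyLFU(windowCap, mainCap int) *wtinyLFU {
	return &wtinyLFU{
		freq: map[string]int{}, window: list.New(), main: list.New(),
		winNodes: map[string]*list.Element{}, mainNodes: map[string]*list.Element{},
		windowCap: windowCap, mainCap: mainCap,
	}
}

// Access records a use of key, inserting it if it is absent.
func (c *wtinyLFU) Access(key string) {
	c.freq[key]++
	if e, ok := c.winNodes[key]; ok {
		c.window.MoveToFront(e)
		return
	}
	if e, ok := c.mainNodes[key]; ok {
		c.main.MoveToFront(e)
		return
	}
	c.winNodes[key] = c.window.PushFront(key)
	if c.window.Len() <= c.windowCap {
		return
	}
	// The window overflowed: its LRU key is a candidate for the main region.
	cand := c.window.Remove(c.window.Back()).(string)
	delete(c.winNodes, cand)
	if c.main.Len() < c.mainCap {
		c.mainNodes[cand] = c.main.PushFront(cand)
		return
	}
	victim := c.main.Back().Value.(string)
	// TinyLFU admission: accept the candidate only if it is more frequent.
	if c.freq[cand] > c.freq[victim] {
		c.main.Remove(c.mainNodes[victim])
		delete(c.mainNodes, victim)
		c.mainNodes[cand] = c.main.PushFront(cand)
	}
	// Otherwise the candidate is rejected and simply dropped.
}

// Contains reports whether key is in either region.
func (c *wtinyLFU) Contains(key string) bool {
	_, inWindow := c.winNodes[key]
	_, inMain := c.mainNodes[key]
	return inWindow || inMain
}

func main() {
	c := newWTinyLFU(1, 1)
	for i := 0; i < 3; i++ {
		c.Access("hot")
	}
	c.Access("new")  // "hot" leaves the window and fills the empty main region
	c.Access("once") // "new" (seen once) loses to "hot" (seen 3 times): dropped
	fmt.Println(c.Contains("hot"), c.Contains("new"), c.Contains("once")) // true false true
}
```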

However, for this to work correctly, the write and read buffers should be lossless. This is not the case currently because we are using sync.Pool to take advantage of its internal thread-local storage. So this feature has been de-prioritized until we find an optimal way to go about this.

Most of the limitations come from Go lacking some of the features that Caffeine relies on.

ben-manes commented:

The read buffer should be lossy, but the write buffer should not be.
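A minimal sketch of the two disciplines, using buffered channels as stand-ins for the real buffers (an assumption for illustration): reads only feed the policy's frequency statistics, so dropping some under contention just makes those statistics a little less accurate, while writes carry inserts and updates the policy must see, so a full write buffer makes the writer wait instead.

```go
package main

import "fmt"

// buffers holds a lossy read buffer and a lossless write buffer.
type buffers struct {
	reads  chan string
	writes chan string
}

// recordRead never blocks: if the buffer is full, the event is discarded.
func (b *buffers) recordRead(key string) bool {
	select {
	case b.reads <- key:
		return true
	default:
		return false
	}
}

// recordWrite blocks until there is room, applying backpressure to the writer.
func (b *buffers) recordWrite(key string) {
	b.writes <- key
}

func main() {
	b := &buffers{reads: make(chan string, 1), writes: make(chan string, 1)}
	fmt.Println(b.recordRead("a")) // true
	fmt.Println(b.recordRead("b")) // false: dropped, the buffer is full
	b.recordWrite("a")
	done := make(chan struct{})
	go func() {
		b.recordWrite("b") // blocks until the consumer below makes room
		close(done)
	}()
	fmt.Println(<-b.writes) // a: draining the buffer unblocks the second write
	<-done
	fmt.Println(<-b.writes) // b: no write was lost
}
```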

I believe the difference is that Caffeine writes to the HashMap immediately, then records the write in the write buffer, replays the buffer against the eviction policy, and evicts. This means the cache can temporarily exceed its capacity by a small margin. The bounded write buffer adds backpressure to ensure this does not have a runaway effect.
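The map-first write path can be sketched as follows. Assumptions for illustration: a mutex-guarded map stands in for a concurrent hash map, eviction is FIFO instead of a real policy, and buffered writes are replayed by an explicit `maintain()` call (a hypothetical name) rather than by a background task. It shows the value being readable right after Set and the cache briefly exceeding its capacity until maintenance runs.

```go
package main

import (
	"fmt"
	"sync"
)

// mapFirstCache writes to the map first and lets the policy catch up later.
type mapFirstCache struct {
	mu       sync.Mutex
	data     map[string]interface{}
	known    map[string]bool // keys the policy has already seen
	order    []string        // insertion order for the toy FIFO eviction
	writes   chan string     // bounded write buffer; Set blocks when it is full
	capacity int
}

func newMapFirstCache(capacity, bufSize int) *mapFirstCache {
	return &mapFirstCache{
		data:     make(map[string]interface{}),
		known:    make(map[string]bool),
		writes:   make(chan string, bufSize),
		capacity: capacity,
	}
}

// Set updates the map first, so Get sees the value immediately,
// and only then records the write for the policy.
func (c *mapFirstCache) Set(key string, value interface{}) {
	c.mu.Lock()
	c.data[key] = value
	c.mu.Unlock()
	c.writes <- key // backpressure: blocks while the buffer is full
}

func (c *mapFirstCache) Get(key string) (interface{}, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.data[key]
	return v, ok
}

func (c *mapFirstCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.data)
}

// maintain replays buffered writes, then evicts until within capacity.
func (c *mapFirstCache) maintain() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for {
		select {
		case key := <-c.writes:
			if !c.known[key] {
				c.known[key] = true
				c.order = append(c.order, key)
			}
		default:
			for len(c.data) > c.capacity && len(c.order) > 0 {
				victim := c.order[0]
				c.order = c.order[1:]
				delete(c.known, victim)
				delete(c.data, victim)
			}
			return
		}
	}
}

func main() {
	c := newMapFirstCache(2, 8)
	c.Set("a", 1)
	c.Set("b", 2)
	c.Set("c", 3)
	_, found := c.Get("c")
	fmt.Println(found, c.Len()) // true 3: visible at once, briefly over capacity
	c.maintain()
	_, found = c.Get("a")
	fmt.Println(found, c.Len()) // false 2: oldest entry evicted
}
```

Set releases the lock before sending to the buffer, so a writer blocked on a full buffer never holds the lock that maintenance needs.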

If I understand correctly, Ristretto writes into a channel first and into the map later, which means you lose visibility. I think that was done to improve write performance: Go lacks a good concurrent map, so the write buffer idea was adopted as a way to avoid coarse locking. Regardless, I think this flow should be flipped so that the map view is always consistent, which makes the code and its usage more obvious. Any performance concerns should be tackled separately.

This came up for us as well. Specifically, because the cache is only eventually consistent, even for values that will be cached successfully, it is difficult to write simple unit tests for the code around our Ristretto cache: we can't load a value and then immediately assert that it has been cached.

The performance motivation for asynchronous Sets is important in production, but when testing, it would be really nice to have a cache with strong consistency.

How would the authors feel about adding an option for strong consistency? Here are two potential ways to do this:

  1. Add an OnProcessed option to the config, with the same or a similar type as OnEvict: func(hashes [2]uint64, value interface{}, cost int64). Like OnEvict, leaving it unset would have no performance cost; setting it would let code set a value into the cache and then block until that value has been processed.

  2. Add an optional parameter to Set that takes a function, and call that function once the item has been processed, e.g. something like this:

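type CacheValue struct {
	Hashes      [2]uint64   // the key's hashes, as in the OnEvict signature
	Value       interface{}
	Cost        int64
	OnProcessed func()      // optional; called once the item has been processed
}

// proposed: behaves like Set, then calls value.OnProcessed after processing
cache.SetValue(value)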

This could be used instead of cache.Set(...) for use cases where strong consistency is important.