Request option to write multiple versions in a single batch

Currently, if I add multiple key-value pairs that share the same key to a single write batch, only one of them gets written.

Obviously, this is only relevant for databases that keep multiple versions.

The only way I see to do it now is to check whether a key I am about to add already appears in the current write batch and, if it does, flush before adding it.

This seems like a needless performance hit.
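For reference, here is a minimal sketch of that flush-on-repeat workaround. It does not import Badger. The `batch` interface is a hypothetical stand-in for the two WriteBatch methods used in this post, and `dedupWriter`, `newBatch` and `countingBatch` are names made up for illustration. It assumes a fresh batch is created after every flush.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for the two WriteBatch methods used in this post.
type batch interface {
	Set(k, v []byte) error
	Flush() error
}

// dedupWriter flushes the current batch whenever a key repeats within it.
type dedupWriter struct {
	newBatch func() batch        // creates a fresh batch after each flush
	cur      batch               // batch currently being filled, nil if none
	seen     map[string]struct{} // keys written since the last flush
}

// Set adds the pair, flushing first if the key is already in the current batch.
func (w *dedupWriter) Set(k, v []byte) error {
	if _, dup := w.seen[string(k)]; dup {
		if err := w.Flush(); err != nil {
			return err
		}
	}
	if w.cur == nil {
		w.cur = w.newBatch()
		w.seen = make(map[string]struct{})
	}
	w.seen[string(k)] = struct{}{}
	return w.cur.Set(k, v)
}

// Flush commits the current batch, if any, and forgets the keys seen so far.
func (w *dedupWriter) Flush() error {
	if w.cur == nil {
		return nil
	}
	err := w.cur.Flush()
	w.cur, w.seen = nil, nil
	return err
}

// countingBatch is an in-memory fake that only counts flushes.
type countingBatch struct{ flushes *int }

func (b countingBatch) Set(k, v []byte) error { return nil }
func (b countingBatch) Flush() error          { *b.flushes++; return nil }

func main() {
	flushes := 0
	w := &dedupWriter{newBatch: func() batch { return countingBatch{&flushes} }}
	w.Set([]byte("k1"), []byte("v1"))
	w.Set([]byte("k2"), []byte("v1"))
	w.Set([]byte("k1"), []byte("v2")) // repeated key: forces a flush first
	w.Flush()
	fmt.Println(flushes) // 2
}
```

The cost is one map lookup per write plus an extra flush for every repeated key, which is the overhead I would like to avoid.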

db, err := badger.Open(badger.DefaultOptions(path).WithNumVersionsToKeep(math.MaxInt64))
wb := db.NewWriteBatch()
wb.Set([]byte("k1"), []byte("v1"))
wb.Set([]byte("k1"), []byte("v2")) // same key, same batch
wb.Flush()
// Result: only one key-value pair is stored.

vs

db, err := badger.Open(badger.DefaultOptions(path).WithNumVersionsToKeep(math.MaxInt64))
wb := db.NewWriteBatch()
wb.Set([]byte("k1"), []byte("v1"))
wb.Flush() // flush before writing the same key again
wb.Set([]byte("k1"), []byte("v2"))
wb.Flush()
// Result: both versions are stored,
// but keeping track of when I need to call Flush is probably a big performance hit.

I would like an option that causes the first code block above to store both versions.

Hi @songmelted

I was wondering if you could give a big-picture idea of what you are trying to achieve?

Hi @chewxy

I have generators which produce numeric values. I have different classes of generators. Each generator has a series of input parameters.

I am producing large volumes of generator+input vs output pairs.

Outputs from a single generator are not guaranteed to be unique.

I am interested in matching outputs from one generator to those of another, and then analyzing which inputs lead to identical outputs from another generator.

The output values are the keys (unfortunate nomenclature). The input parameters are the values.

I then search for matching outputs (keys) and record the inputs which generated them.

Frequently, a single generator will produce the same output for different inputs within a single write batch. I need to capture each of these, as they are unique and meaningful to me.

So, I was able to find a temporary workaround here:

Relevant quote below

open the database as a managed DB…specify the version timestamp for each key and then my values will be added correctly

However, my transactions are coming in so fast that time.Now().UnixNano() returns the same value for multiple entries, because time.Now().UnixNano() is not guaranteed to advance on every call. On my hardware, it can take 10ms before it updates. This causes the undesired behavior of dropping versions.

To work around that, I keep an incrementing counter that I add to the result of time.Now().UnixNano() to guarantee uniqueness.
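Here is a minimal sketch of one way to generate such unique version numbers. It is a variant of the counter idea rather than the exact code I run: instead of adding a counter to the timestamp, it returns the clock reading when it has advanced and the previous value plus one when it has not. `versionClock` and `Next` are hypothetical names, not part of Badger, and passing the result to a managed-mode write is left out.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// versionClock hands out strictly increasing version numbers.
// Hypothetical helper, not part of Badger.
type versionClock struct {
	mu   sync.Mutex
	last uint64 // most recent version handed out
}

// Next returns the wall-clock time in nanoseconds, or last+1 when the
// clock has not advanced past the previously returned version.
func (c *versionClock) Next() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	now := uint64(time.Now().UnixNano())
	if now <= c.last {
		now = c.last + 1
	}
	c.last = now
	return now
}

func main() {
	var clock versionClock
	prev := clock.Next()
	for i := 0; i < 5; i++ {
		v := clock.Next()
		fmt.Println(v > prev) // true on every iteration
		prev = v
	}
}
```

The mutex makes the helper safe to share between goroutines. The values stay close to wall-clock time but are no longer exact timestamps once entries arrive faster than the clock updates.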

All of this seems a bit hacky. I would like to see an option that provides this behavior directly. Additionally, I am unsure whether this hacky approach could have unwanted side effects.

To me, it seems like if there is a ‘keep multiple versions’ option, there should be a related option to ‘keep multiple versions within a single transaction’. It seems reasonable to me that if someone wants to keep multiple versions that they would want to do so even within the same transaction.