Knowing the limit of writes in a transaction

I have a question about dealing with ErrTxnTooBig, based on this topic:

In this previous topic, as well as in the README, the suggestion is to use transactions manually, and to commit and create a new one once this limit is reached. But what is the recommended way of handling the case where each iteration of a bulk write performs multiple writes, and the limit is reached partway through an iteration?

Example:

for _, item := range myItems {
    write1(txn, item)
    write2(txn, item)
    write3(txn, item) // returns ErrTxnTooBig
}

I would want to undo write1 and write2 before committing the transaction and starting a new one, since they should all happen atomically. But I can’t currently know when I am going to exceed the limit until I do. I suppose this is kind of like nested transactions? What is the best way to structure a bulk update so that I can atomically write the set of updates for each item?

You’d be limited by how big a single transaction can be. If you need a huge write to be written atomically, and it’s exceeding Badger’s limits, you’d have to do some external synchronization. No way around it.

Thanks for the reply. I am not sure that is what I was asking, though. My concern is the split point of the max transaction size landing in the middle of a group of writes that absolutely need to happen atomically for each item. If there are 100 items and each item needs 3 atomic updates, I am happy to batch the first 50 and then the second 50. But I can’t know when the transaction will be too big until I hit the limit, and by then I may have already Set() 2 of the 3 steps for item 51. I would need to be able to “unset” those writes so that I can start writing item 51 in a new transaction.

To be more specific to my own use case, storing each item means not only writing its data but also updating index keys, and the whole thing should be one atomic unit. Ideally I would have some way to know that I am at the max size of the transaction, by checking like this:

for _, item := range items {
    if txnSizeAtMax(txn) { // hypothetical check; no such API exists today
        txn.Commit()
        txn = db.NewTransaction(true)
    }
    write1(txn, item)
    write2(txn, item)
    write3(txn, item)
}

Here you don’t really mean “at max”; you mean “nearing max”, such that the next 3 writes would push it over the limit. That’s a prediction that Badger cannot/does not make.

I’d suggest breaking the problem down to the smallest level of granularity. If you really want to have 3 writes go atomically at one time, then those 3 writes should form your single transaction.

If you’re concerned about the performance impact of having many relatively small transactions in the system, you could use txn.Commit with a callback, as shown here:
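
Roughly, the per-item pattern with an asynchronous commit looks like this (a sketch only; CommitWith is the callback form of Commit in recent Badger versions, and write1–write3 stand in for your per-item writes):

    for _, item := range items {
        txn := db.NewTransaction(true) // read-write transaction per item
        write1(txn, item)
        write2(txn, item)
        write3(txn, item)
        // CommitWith returns immediately; the callback reports the result later.
        txn.CommitWith(func(err error) {
            if err != nil {
                // record the failed item here so it can be retried
            }
        })
    }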

I’ve just tried out your suggestion, and it does work, with a bit of a performance hit, and with the added complexity of needing to track which specific items failed, as well as losing the ordered writes of the original list of items if individual commits fail while others succeed.

Taking a look at the implementation of txn.SetEntry(), it seems like a write is just appending/indexing to two data structures:

	txn.writes = append(txn.writes, fp)
	txn.pendingWrites[string(e.Key)] = e

So is there any reason a method couldn’t be provided, such as txn.Unset() (or similar), which would pop the last write? If something like this existed, I could use the simpler manual transaction approach and just do:

for _, item := range items {
    write1(txn, item)
    write2(txn, item)
    write3(txn, item) // returns ErrTxnTooBig
    txn.Unset()       // pop write2
    txn.Unset()       // pop write1
    txn.Commit()
    txn = db.NewTransaction(true)
    write1(txn, item) // redo the whole item in the fresh transaction
    write2(txn, item)
    write3(txn, item)
}

SetEntry is also updating pendingWrites, and if you did multiple writes to the same key, then deleting that key from pendingWrites during the unset operation wouldn’t be correct. That’s just one of the obvious things that would fail. Hard to say what else will.
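
For example (key, v1 and v2 are placeholders, and the hypothetical Unset pops the last fingerprint and deletes the key from pendingWrites):

    txn.Set(key, v1) // writes: [fp1]       pendingWrites[key] = v1
    txn.Set(key, v2) // writes: [fp1, fp2]  pendingWrites[key] = v2; v1's entry is gone
    txn.Unset()      // deleting pendingWrites[key] now loses v1 as well, not just v2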

In general, such operations add complexity, and their cons outweigh their pros.

Thanks for the info on this.


Hi, this topic is rather old but still relevant imo. I think @justinfx already described the problem nicely and there is nothing to add. Still, after digging into badger a bit, I found that modify in txn.go could return ErrTxnTooBig at the end of the function (after applying the write), with an upside: you would be able to ignore the error as you see fit. For instance, if you process a large number of items with several changes each, all changes of approximately the same size, and the transaction limit is reached in the middle of item x, then letting the transaction grow by at most roughly 1/x of its size won’t hurt, as long as x is relatively large (maybe > 100).
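
To illustrate, a caller could then treat a late ErrTxnTooBig as “commit before the next item” (a sketch that assumes this hypothetical behaviour, i.e. the write is still applied when the error is returned; entriesFor is a placeholder):

    overLimit := false
    for _, item := range items {
        if overLimit {
            if err := txn.Commit(); err != nil {
                return err
            }
            txn = db.NewTransaction(true)
            overLimit = false
        }
        for _, e := range entriesFor(item) {
            if err := txn.SetEntry(e); err == badger.ErrTxnTooBig {
                overLimit = true // the item’s writes stay atomic; commit afterwards
            } else if err != nil {
                return err
            }
        }
    }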

In case I missed a hard limit on the transaction size (other than WAL size = 2 * table size), I see another angle: provide a function that returns the relative transaction size, i.e. how full the transaction is. Then the application could decide when to commit.

In my project I have implemented a workaround that involves (non-trivial) retries with a decreasing number of items per transaction. The complexity increase is substantial and I’d love to get rid of it.

Never mind, potential ErrConflict errors justify most of the complexity, my apologies. And in hindsight, the first suggestion is probably too intrusive and would require more changes elsewhere in the code. The second, however (relative transaction size), is trivial to implement and prevents ErrTxnTooBig in my project very effectively.
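
For reference, the helper I added amounts to roughly this (a sketch against txn.go, not part of upstream Badger; the field names follow the ones used by the existing size check):

    // Report how full the transaction is relative to the batch limits that
    // modify() checks against, so the application can commit early.
    func (txn *Txn) RelativeSize() float64 {
        countFrac := float64(txn.count) / float64(txn.db.opt.maxBatchCount)
        sizeFrac := float64(txn.size) / float64(txn.db.opt.maxBatchSize)
        if sizeFrac > countFrac {
            return sizeFrac
        }
        return countFrac
    }

The application then commits and opens a fresh transaction whenever this crosses a safety threshold (say 0.9) before starting the next item, so the limit is never hit mid-item.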