Discussion: BadgerDB should offer arbitrarily sized, atomic transactions

Moved from GitHub badger/1325

Posted by mvanderkroon:

Observation

The current BadgerDB implementation keeps in-flight transactions in memory until txn.Commit() is called. If the number of pending operations exceeds a predetermined limit, ErrTxnTooBig is returned. The de-facto way of dealing with this seems to be to commit the pending transaction there and then, create a new transaction, and continue the sequence of operations.
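For concreteness, that workaround looks roughly like the following sketch (writeAll and pairs are illustrative names, not part of badger):

```go
import "github.com/dgraph-io/badger/v2"

// writeAll streams key/value pairs into badger, committing and re-opening
// the transaction whenever ErrTxnTooBig is returned. Note that every
// intermediate Commit is durable on its own, which is exactly the
// application-level atomicity problem described below.
func writeAll(db *badger.DB, pairs map[string][]byte) error {
	txn := db.NewTransaction(true) // read-write transaction
	defer txn.Discard()

	for k, v := range pairs {
		err := txn.Set([]byte(k), v)
		if err == badger.ErrTxnTooBig {
			// Commit what fits, then retry the same key in a fresh transaction.
			if err := txn.Commit(); err != nil {
				return err
			}
			txn = db.NewTransaction(true)
			err = txn.Set([]byte(k), v)
		}
		if err != nil {
			return err
		}
	}
	return txn.Commit()
}
```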

Problem statement

This is unacceptable for many applications for two reasons:

  1. It will leave the database in an inconsistent state; perhaps not at the physical level, but certainly at the application level.
  2. Concurrent transactions will use up available RAM quickly and unpredictably.

In my opinion, the current situation essentially breaks the Atomicity guarantee for any transaction larger than (some function of) available RAM, or for any combination of transactions that together exceeds (some function of) available RAM.

Real-world examples

To illustrate: our use cases dictate regular updates of blocks containing large numbers of keys, which involves deleting the previously existing block (currently used by the application and its users) and then inserting the new block. The number of keys is not fixed. One can easily see how inconvenient it is for the users of our application if ErrTxnTooBig occurs right after we have finished deleting the previously existing keyset: the entire block currently in use by the application would be removed before being replaced with new data. This is what I mean by leaving the database in an inconsistent state at the application level.

Additionally, we can imagine another situation: many blocks in various parts of the application are written to simultaneously, as external users might do. If the combination of these in-flight transactions at some point exceeds the available RAM, the application crashes. Perhaps the ErrTxnTooBig mechanism takes this into account; I’m not sure.

Discussion

I believe the author of issue #1242 was also hinting at this problem, but I think the issue goes beyond ErrTxnTooBig, and a solution (if this were accepted as a problem to begin with!) would entail potentially far-reaching changes in how BadgerDB stores in-flight transactions. Postgres could serve as a good example of how to achieve it: How Postgres Makes Transactions Atomic — brandur.org.

My understanding is that many other databases follow similar practices; for Neo4J I know this is the case for certain. I also don’t mean to imply this is a bad choice per se; perhaps BadgerDB/DGraph and others are aimed at use cases where atomicity of arbitrarily sized transactions is not needed and raw performance has priority, and that is OK!

However, the (optional) ability to deal with arbitrarily sized transactions would make BadgerDB/DGraph a suitable solution for a wider domain of real-world applications.

I’m curious to learn:

  • Whether it is perceived as a problem at all
  • Is a solution to this problem part of the future of BadgerDB
  • If the above answers are both yes, how should it be approached
  • Is there anything I could do to help

jarifibrahim commented:

Hey @mvanderkroon, thank you for the detailed write-up. The Postgres article was definitely helpful. The transaction limits are set based on the max table size.

The transaction can be at most 15% of the table size. The default table size is 64 MB, so the max batch size would be about 9.6 MB. I don’t know why we have a 15% limit; it would be worth experimenting with some other numbers (maybe 50%?). You might be able to work around the transaction size limit by setting a high value for the max table size.
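For illustration, that workaround would look something like the sketch below (assuming badger v2’s WithMaxTableSize option; the path and the 256 MB value are placeholders, not recommendations):

```go
import "github.com/dgraph-io/badger/v2"

// openWithLargerTables opens badger with a larger max table size, which in
// turn raises the per-transaction limit (a fraction of the table size),
// at the cost of larger in-memory tables.
func openWithLargerTables() (*badger.DB, error) {
	opts := badger.DefaultOptions("/path/to/db").WithMaxTableSize(256 << 20) // 256 MB
	return badger.Open(opts)
}
```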

Fixing the transaction-too-big issue isn’t easy. Postgres stores uncommitted transactions separately and then adds them to the main WAL/BTree. A similar change in badger would be a big undertaking and might take a decent amount of time to build.

Whether it is perceived as a problem at all

I believe a lot of users might be facing this problem and we should fix this.

Is a solution to this problem part of the future of BadgerDB

Yes, we’re always working on improving badger. This seems like a useful improvement. I’ll add this to our backlog.

If the above answers are both yes, how should it be approached

I believe the first step would be to study how postgres/mysql/rocksdb deal with this problem; a detailed write-up about how we could tackle the problem in badger would then be a good starting point.

Is there anything I could do to help

Badger is open source and we encourage people to contribute. Anything you can contribute towards improving badger or fixing the transaction-too-big issue will be helpful for all the contributors :slight_smile:

mvanderkroon commented:

@jarifibrahim thanks for the equally detailed response. I’m happy to learn this is something that could be contributed to BadgerDB.

Thank you for the pointer towards the max table size. I looked into the mechanics and ran a few tests, and I believe this would help extend the number of operations possible before receiving an ErrTxnTooBig (essentially providing a dial to regulate the transaction size for which atomicity is guaranteed), but the transaction would still be held in RAM. For our use case this latter part is problematic: users may send variable-sized payloads to the service, and thus it’s difficult to dimension the hardware appropriately.

I am, however, working on a “poor man’s” solution: essentially a layer on top of BadgerDB’s transactional layer. By assigning ‘blocks’ of data unique IDs (versions) and making those part of the keys, I can stream data to the database, committing whenever I hit ErrTxnTooBig. Once the process is done, I switch the ‘active block ID’ over to the latest inserted version and initiate a job that deletes the previous ‘version’. Now, this is by no means a ‘good’ solution that I would present back to this repository (for one, it feels like I’m reimplementing logic that is already present in some form or another), but for our (narrow) use case it will do fine. At the same time, this process will allow me to gain more knowledge and understanding of BadgerDB as well as an intuition for a proper way forward; I’ll circle back once I reach that point.
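For the curious, the idea looks roughly like this (a sketch only; the key scheme and the helpers activateVersion and cleanupVersion are made up for illustration and are not part of badger):

```go
import "github.com/dgraph-io/badger/v2"

// Illustrative key scheme:
//
//   block/<blockID>/<version>/<key>  -> value
//   active/<blockID>                 -> <version>
//
// The block data itself is streamed with the commit-on-ErrTxnTooBig loop
// shown earlier; only the small pointer switch below has to be atomic,
// and it always fits in a single transaction.
func activateVersion(db *badger.DB, blockID, version string) error {
	return db.Update(func(txn *badger.Txn) error {
		return txn.Set([]byte("active/"+blockID), []byte(version))
	})
}

// cleanupVersion deletes all keys under an old version's prefix as a
// background job, committing whenever the transaction fills up.
func cleanupVersion(db *badger.DB, blockID, oldVersion string) error {
	prefix := []byte("block/" + blockID + "/" + oldVersion + "/")

	// Collect the keys to delete with a read-only prefix scan.
	var keys [][]byte
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
			keys = append(keys, it.Item().KeyCopy(nil))
		}
		return nil
	})
	if err != nil {
		return err
	}

	// Delete them, splitting across transactions on ErrTxnTooBig.
	txn := db.NewTransaction(true)
	defer txn.Discard()
	for _, k := range keys {
		err := txn.Delete(k)
		if err == badger.ErrTxnTooBig {
			if err := txn.Commit(); err != nil {
				return err
			}
			txn = db.NewTransaction(true)
			err = txn.Delete(k)
		}
		if err != nil {
			return err
		}
	}
	return txn.Commit()
}
```

Readers always resolve the active/<blockID> pointer first, so they never observe a half-written version; the trade-off is that a block is temporarily duplicated on disk during the switch.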

jarifibrahim commented:

@mvanderkroon WriteBatch might be useful to you: badger/batch.go at ef28ef36b5923f12ffe3a1702bdfa6b479db6637 · dgraph-io/badger · GitHub. It is a layer on top of transactions which allows for inserting data at a higher speed (and also takes care of ErrTxnTooBig). The only caveat is that you cannot read data using a write batch, but I’m guessing you’re not reading any data.
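For reference, a minimal WriteBatch sketch (assuming badger v2’s API; batchWrite and items are illustrative names):

```go
import "github.com/dgraph-io/badger/v2"

// WriteBatch splits writes across transactions internally, so the caller
// never sees ErrTxnTooBig; the flip side is that the batch as a whole is
// not a single atomic transaction.
func batchWrite(db *badger.DB, items map[string][]byte) error {
	wb := db.NewWriteBatch()
	defer wb.Cancel()

	for k, v := range items {
		if err := wb.Set([]byte(k), v); err != nil {
			return err
		}
	}
	// Flush commits any pending writes and waits for them to be applied.
	return wb.Flush()
}
```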

mvanderkroon commented:

Thank you, that looks very useful for my use case.