Discussion: BadgerDB should offer arbitrarily sized, atomic transactions

Moved from GitHub badger/1325

Posted by mvanderkroon:

Observation

The current BadgerDB implementation keeps in-flight transactions in memory until txn.Commit() is called. If the number of operations exceeds a predetermined limit, an ErrTxnTooBig error is returned. The de facto way of dealing with this seems to be to commit the pending transaction there and then, create a new transaction, and continue the sequence of operations.

Problem statement

This is unacceptable for many applications for two reasons:

  1. It will leave the database in an inconsistent state; perhaps not at the physical level, but certainly at the application level.
  2. Concurrent transactions will use up available RAM quickly and unpredictably.

In my opinion, the current situation essentially breaks the atomicity guarantee for any transaction larger than (some function of) available RAM, or for any combination of transactions that together exceeds that limit.

Real-world examples

To illustrate: our use cases require regularly replacing blocks containing large numbers of keys, which involves deleting the previously existing block (currently used by the application and its users) and then inserting the new block. The number of keys is not fixed. One can easily see how inconvenient it is for the users of our application if the ErrTxnTooBig error happens right at the moment we have finished deleting the previously existing keyset: the entire block currently in use by the application would be removed before being replaced with new data. This is what I mean by leaving the database in an inconsistent state at the application level.

Additionally, we can imagine another situation: many blocks in various parts of the application are written to simultaneously, as external users might do. Once any combination of these actions at some point exceeds the available RAM, it crashes the application. Perhaps the ErrTxnTooBig mechanism takes this into account, I'm not sure.

Discussion

I believe the author of issue #1242 was also hinting at this problem, but I think the issue goes beyond ErrTxnTooBig, and a solution (if this were accepted as a problem to begin with!) could entail potentially far-reaching changes in how BadgerDB stores in-flight transactions. Postgres could serve as a good example of how to achieve it: How Postgres Makes Transactions Atomic (brandur.org).

My understanding is that many other databases follow similar practices; for Neo4j I know this is the case for certain. I also don't mean to imply this is a bad choice per se by any means; perhaps BadgerDB/Dgraph and others are aimed at use cases where atomicity of arbitrarily sized transactions is not needed and raw performance has priority, and that is OK!

However, the (optional) ability to deal with arbitrarily sized transactions would make BadgerDB/Dgraph a suitable solution for a wider domain of real-world applications.

I'm curious to learn:

  • Whether it is perceived as a problem at all
  • Is a solution to this problem part of the future of BadgerDB
  • If the above answers are both yes, how should it be approached
  • Is there anything I could do to help

jarifibrahim commented:

Hey @mvanderkroon, thank you for the detailed write-up. The Postgres article was definitely helpful. The transaction limits are set based on the max table size.

The transaction can be at most 15% of the table size. The default size of the table is 64 MB, so the max batch size would be about 9.6 MB. I don't know why we have a 15% limit. It would be worth experimenting with some other numbers (maybe 50%?). You might be able to work around the transaction size limit by setting a high value for max table size.

Fixing the transaction-too-big issue isn't easy. Postgres stores uncommitted transactions separately and then adds them to the main WAL/BTree. A similar change in badger would be a big one, and it might take a decent amount of time to build.

Whether it is perceived as a problem at all

I believe a lot of users might be facing this problem and we should fix this.

Is a solution to this problem part of the future of BadgerDB

Yes, we're always working on improving badger. This seems like a useful improvement. I'll add this to our backlog.

If the above answers are both yes, how should it be approached

I believe the first step would be to study how postgres/mysql/rocksdb deal with this problem; a detailed write-up about how we could tackle it in badger would be a good starting point.

Is there anything I could do to help

Badger is open source and we encourage people to contribute. Anything you can contribute toward improving badger or fixing the transaction-too-big issue will be helpful to all the contributors 🙂

mvanderkroon commented:

@jarifibrahim thanks for the equally detailed response. I'm happy to learn this is something I could contribute to BadgerDB.

Thank you for the pointer towards the max table size. I looked into the mechanics and ran a few tests, and I believe this would help extend the number of operations before receiving an ErrTxnTooBig (essentially providing a dial to regulate the transaction size within which atomicity is guaranteed), but the transaction would still be held in RAM. For our use case this latter part is problematic; users may send variable-sized payloads to the service, and thus it's difficult to dimension the hardware appropriately.

I am, however, working on a poor man's solution: essentially a layer on top of BadgerDB's transactional layer. By assigning blocks of data unique IDs (versions) and making those part of the keys, I can stream data to the database, committing whenever I hit ErrTxnTooBig. Once the process is done, I switch the 'active block ID' over to the latest inserted version and initiate a job that deletes the previous version. This is by no means a 'good' solution that I would present back to this repository (for one, it feels like I'm reimplementing logic that is already present in some form or another), but for our (narrow) use case it will do fine. At the same time, this process will allow me to gain more knowledge and understanding of BadgerDB, as well as an intuition for a proper way forward. I'll circle back once I reach that point.

jarifibrahim commented:

@mvanderkroon WriteBatch might be useful to you: badger/batch.go at ef28ef36b5923f12ffe3a1702bdfa6b479db6637 · dgraph-io/badger · GitHub. It is a layer on top of transactions which allows inserting data at a higher speed (and also takes care of ErrTxnTooBig). The only caveat is that you cannot read data using a write batch, but I'm guessing you're not reading any data.

mvanderkroon commented:

Thank you, that looks very useful for my use case.

Hey folks,

I'm affected by the same issue as the original poster. I see this was ported over from GitHub back in 2020. @jarifibrahim mentioned they'd add it to the backlog, but I imagine, as with most nontrivial projects, you probably get new tickets and feature requests quite often, and it can be hard to get to older things like this one. Is there any chance you could please revisit this?

I found BadgerDB recently and was very impressed by it. I'd like to use it to replace SQLite as the catalog layer for a storage system. I've gotten to a point where the SQL queries are a significant portion of the system's complexity, and Badger seems to be a great solution for that.

I need to store metadata about files, some of which are streamed in and stored in fragments. I need to reference individual fragments for deduplication purposes. There can easily be 100k fragments in a file, potentially more. Of course, I need atomicity for storing a given file. However, with the limits I've seen described here, it seems there's a good chance I won't be able to store some files in a single transaction. That is a non-starter, as the DB would be left in an inconsistent state with dangling references if the process dies at the wrong time.

I also have other operations which need atomicity and can grow quite large, e.g. deleting a "partition" of sorts. A partition would typically have 100k-1M files, possibly more. This would be much easier to implement correctly with atomicity; I could probably get away with something like db.DropPrefix for part of it if I design the schema just right, but there will be dangling references (e.g. indexes) which need updating. Without atomicity, that again means the process might die at the wrong time and leave the DB inconsistent.

I could try to implement a journaling layer on top of Badger, but that really defeats the whole purpose. I'm bound to get it subtly wrong in hard-to-detect ways, and would basically just be implementing a DBMS on top of another DB…

Some of this might be doable if we could write a tree of keys under a separate namespace (a different prefix) and then atomically move it (i.e. rename all keys with that prefix). That way, we could do things in an RCU style, as long as the atomic rename didn't have size limitations. The ideal solution, however, would be to have arbitrarily large transactions.

Could you please revive this request?

Friendly ping, any comments please?