In this topic I want to discuss my experience with implementing a data store using Dgraph’s Badger, and I’ll tell it as a story.
First, let me explain the use case. I have a PostgreSQL table with hundreds of millions of records that is growing fast; the data is fairly structured and stored as (entity_id, timestamp, jsonb data). For the new data store, built specifically for this type of data, I introduced custom serialisation/marshaling for the jsonb record and multiple roaring bitmap indexes that allow for faster lookups.
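To make that concrete, here is a rough sketch of how such a record key can be packed for an ordered key-value store. The field widths and names below are illustrative assumptions, not my actual schema:

```go
package store

import "encoding/binary"

// recordKey packs an entity id and a timestamp into a fixed-width,
// big-endian key so that keys sort by entity first, then by time.
// The widths (4-byte id + 8-byte unix-nano timestamp) are illustrative;
// my real keys are packed slightly differently.
func recordKey(entityID uint32, unixNano int64) []byte {
	key := make([]byte, 12)
	binary.BigEndian.PutUint32(key[0:4], entityID)
	binary.BigEndian.PutUint64(key[4:12], uint64(unixNano))
	return key
}
```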
The first version was implemented using the MDBX (derived from LMDB) binding for Go. It worked fairly well, but the storage engine does not recover freed space, and due to time constraints I decided against digging deep into the engine to find out why the mdbx.dat file grew beyond the size of the original PostgreSQL table.
I decided to switch storage engines, and Badger seemed like a good choice:
- Has a stable version (v3)
- Block compression support
- BoltDB-like API (see the sketch below)
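To show what I mean by a BoltDB-like API, this is roughly the shape of the code; a minimal sketch, not my actual data-store code:

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	// Open the database; DefaultOptions only needs the directory path.
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Writes go through a closure-scoped read-write transaction,
	// very much like bbolt's db.Update.
	err = db.Update(func(txn *badger.Txn) error {
		return txn.Set([]byte("key"), []byte("value"))
	})
	if err != nil {
		log.Fatal(err)
	}

	// Reads use db.View; the value is only valid inside the callback.
	err = db.View(func(txn *badger.Txn) error {
		item, err := txn.Get([]byte("key"))
		if err != nil {
			return err
		}
		return item.Value(func(val []byte) error {
			log.Printf("value = %s", val)
			return nil
		})
	})
	if err != nil {
		log.Fatal(err)
	}
}
```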
I picked Badger v3.2103.1. It took me a few hours to rewrite the internals to use the new storage engine, and everything seemed to work, but when I started the migration of 100 million records I began getting “Txn is too big to fit into one request”. I fixed that with more code that splits the transactions in multiple places, which was quite unfortunate, as it defeats the purpose of having transactions in the first place. Fortunately for me, transactions are not a hard requirement. The next thing that happened was that memory usage spiked so much that the service got killed by the kernel’s OOM killer.
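For reference, the workaround looks roughly like this: watch for badger.ErrTxnTooBig, commit what you already have, and carry on in a fresh transaction. This is a simplified sketch of the pattern, with a hypothetical pre-encoded record source:

```go
package store

import (
	badger "github.com/dgraph-io/badger/v3"
)

// migrate writes a stream of already-encoded key/value pairs, committing
// and reopening the transaction whenever Badger reports ErrTxnTooBig.
// records is a hypothetical slice of [key, value] pairs.
func migrate(db *badger.DB, records [][2][]byte) error {
	txn := db.NewTransaction(true) // read-write transaction
	defer func() { txn.Discard() }()

	for _, rec := range records {
		key, val := rec[0], rec[1]
		err := txn.Set(key, val)
		if err == badger.ErrTxnTooBig {
			// The transaction is full: commit it and retry the write in a
			// new one. This is exactly where the atomicity guarantee is lost.
			if err := txn.Commit(); err != nil {
				return err
			}
			txn = db.NewTransaction(true)
			err = txn.Set(key, val)
		}
		if err != nil {
			return err
		}
	}
	return txn.Commit()
}
```

In hindsight, db.NewWriteBatch() might have been a better fit for a bulk load, since it does this splitting internally, but either way you give up atomicity.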
Badger’s docs praise its separation of keys and values, and my data was mostly 11-byte keys with roughly 64-byte values, plus slightly shorter keys for the roaring bitmaps. I decided to play with the options, and after hours of tuning I hadn’t made much progress, but I did realise that options.WithValueThreshold(0) helps a bit with the memory issue. However, it eventually ended with a “No space left on device” error. It turns out that compaction does not cover the value log, and you have to run DB.RunValueLogGC() explicitly. I added a ticker and automatic garbage collection, just like in the example. I believe the GC was too slow to keep up; I had no luck there either, so I went back to the default ValueThreshold. I also experimented with rate limiting the data migration, but was still hitting issues all the way.
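The GC loop I added followed the usual pattern from Badger’s documentation, roughly:

```go
package store

import (
	"time"

	badger "github.com/dgraph-io/badger/v3"
)

// runValueLogGC periodically asks Badger to rewrite value log files that are
// at least 70% stale. RunValueLogGC rewrites at most one file per call, so it
// is retried until it returns an error (typically badger.ErrNoRewrite).
func runValueLogGC(db *badger.DB, stop <-chan struct{}) {
	ticker := time.NewTicker(5 * time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			for {
				if err := db.RunValueLogGC(0.7); err != nil {
					break // nothing left to rewrite in this round
				}
			}
		}
	}
}
```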
I ended up very frustrated and switched to another storage engine once again, but I want to share my feedback here. In summary there are some very nice things and some nasty issues:
The good parts:
- Familiar BoltDB-like API that allowed me to switch very fast.
- The block compression does work: even with custom serialisation I was getting an even smaller storage size, thanks to duplicated and similar values.
- Compaction seems to work well enough, without much configuration.
- It has good GoDoc documentation, and the methods of badger.Options give enough detail on how each setting affects the DB.
The bad parts:
- The OOM issue is a showstopper.
- The “Txn is too big to fit into one request” error is very frustrating: it forces you to rewrite your code to manage transactions manually instead of using db.Update(). Increasing the memtable size might have fixed it for me, but I was afraid it would make the memory issue worse.
- The separation of keys and values doesn’t seem to work for small values, i.e. < 1 KB. I think the new default ValueThreshold of 1 MB is a testament to this.
- Value log garbage collection is something I was expecting to be part of the compaction.
- There are too many database options, and it’s hard to reason about how any given option affects memory and performance. It requires a lot of experimenting and testing of values, which is a time-consuming effort.
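To illustrate that last point, this is roughly the kind of options block I kept iterating on. The values are placeholders to show which knobs are involved; they are not recommendations and not the settings I ended up with:

```go
package store

import (
	badger "github.com/dgraph-io/badger/v3"
	"github.com/dgraph-io/badger/v3/options"
)

// openTuned shows some of the memory-related knobs. The values are
// placeholders for illustration only.
func openTuned(dir string) (*badger.DB, error) {
	opts := badger.DefaultOptions(dir).
		WithCompression(options.ZSTD).  // block compression (the part that worked well)
		WithValueThreshold(1 << 10).    // size at which values move to the value log
		WithMemTableSize(64 << 20).     // memtable size, related to the ErrTxnTooBig limit
		WithNumMemtables(5).            // how many memtables may be kept in RAM
		WithBlockCacheSize(256 << 20).  // cache for (compressed) data blocks
		WithIndexCacheSize(100 << 20)   // cache for table indices and bloom filters
	return badger.Open(opts)
}
```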