I’m considering running BadgerDB on HDDs for a use case where I only insert items and then occasionally scan the entire database. Given the poor random-access performance of an HDD, these scans would be many orders of magnitude more efficient if they could walk the value log sequentially in file order.
Am I right that there isn’t currently a way to do this, or did I miss something?
Could such an iterator be implemented easily in BadgerDB? I think so from what I know of the architecture, but I’m not familiar with the internals.
> these scans would be many orders of magnitude more efficient if they could walk the value log sequentially in file order.
Badger stores data in sorted order. How would reading a value log file help you? Also, the value log is the write-ahead log, and the SSTs are where the index and keys are stored.
I might have an incorrect mental model here, so please bear with me.
My thinking is that the value log contains all key-value pairs, which may also be duplicated in the SST (or the SST may hold just the key and a pointer into the value log). From what I understand, the value log is the write-ahead log, so it must contain the keys as well.
So if your use case is scanning all key-value pairs, and the values tend to be stored in the value log rather than the SST, then it’s more efficient to walk the value log in file order than to walk the keys in the SST and look up each value via random access.
The limitations, of course, are that the data won’t be ordered by key and you’d need to handle deleted key-value pairs manually: when you find a key-value pair, you don’t know at that moment whether a delete (or a newer write to the same key) appears later in the value log. If your database is append-only, you don’t have to worry about that at all.
My use case specifically is a brute-force search over all items in an append-only database. Scanning the value log sequentially would be the most efficient way to do that and would allow the OS to perform aggressive read-ahead. On spinning rust, the performance difference would be multiple orders of magnitude.
We should be able to wrap this function in a public API and expose it so that people can iterate over the value log files sequentially. This would have to be a read-only operation.