I am just starting a deep dive into how Badger stores data, so I am looking at the vlog files for Zero and Alpha and trying to make sense of them.
I want to build an offline-first system that syncs seamlessly with Dgraph/Badger, so I want to represent the data in IndexedDB as conveniently and efficiently as possible.
I want to know:
- How does Badger persist its key-value pairs?
- How does Dgraph translate its triples into key-value pairs?
- What is the relationship between Alpha and Zero?
- How can I read the Badger files to understand what it is storing where, and why?
  - This includes encoding, compression and encryption: how can I disable compression and re-encode the log file as human-readable UTF-8?
I assume that this is documented somewhere, but I didn’t find it at the level of detail I want.
I did find this basic conceptual description:
https://dgraph.io/docs/master/design-concepts/#badger
Posting Lists get stored in Badger, in a key-value format, like so:
(Predicate, Subject) --> PostingList
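To check that I am reading this right, here is a minimal sketch of how I picture such a key being built and taken apart again. The byte layout (2-byte length prefix, predicate, 8-byte big-endian uid) and the names `make_key` and `parse_key` are my own assumptions for illustration, not Dgraph's actual key encoding.

```python
import struct
from typing import Tuple


def make_key(predicate: str, subject_uid: int) -> bytes:
    """Build a hypothetical (predicate, subject) key as bytes."""
    pred = predicate.encode("utf-8")
    # Assumed layout: 2-byte big-endian length, predicate bytes,
    # then the subject uid as an 8-byte big-endian integer.
    return struct.pack(">H", len(pred)) + pred + struct.pack(">Q", subject_uid)


def parse_key(key: bytes) -> Tuple[str, int]:
    """Split a key produced by make_key back into (predicate, subject uid)."""
    (pred_len,) = struct.unpack_from(">H", key, 0)
    predicate = key[2:2 + pred_len].decode("utf-8")
    (subject_uid,) = struct.unpack_from(">Q", key, 2 + pred_len)
    return predicate, subject_uid


# A plain dict stands in for the key-value store; the value is a toy
# "posting list" holding the values attached to this (predicate, subject).
store = {make_key("name", 0x1): ["mikeee@en"]}
```

With this layout, all keys for one predicate share a prefix and sort by uid, which is the property I would want to mirror in an IndexedDB key.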
In Zero:
I see that the vlog has many of these three slightly different sequences:
alpha:70800ÖšBz1- , alpha:7080 0ÖšBz1, alpha:70800Bz1-
and they are always followed by a number
Another frequently repeated sequence is:
!badger!txn
In Alpha:
I see that the !badger!txn sequence is also present, but I don’t find matching sequence numbers.
I can see the actual changes, e.g. this is how the change of name@en from “michaellll” to “mikeee” is stored:
!badger!tx601517j@amemichaellll> \'
hnamemikeee> 'hqb@!name'\> ~a,S Mikeee *enh v*!badger!txn>82881
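For reference, readable fragments like the ones above can be pulled out of a binary file with a `strings`-style scan. This is only a minimal sketch: it does not parse Badger's actual entry format, it just reports runs of printable ASCII together with their byte offsets, and it will not recover anything from encrypted data. The names `printable_runs` and `scan_file` are my own.

```python
import re
from typing import Iterator, List, Tuple


def printable_runs(data: bytes, min_len: int = 4) -> Iterator[Tuple[int, str]]:
    """Yield (offset, text) for each run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


def scan_file(path: str, min_len: int = 4) -> List[Tuple[int, str]]:
    """Read a whole file as bytes and return its printable runs."""
    with open(path, "rb") as f:
        return list(printable_runs(f.read(), min_len))
```

Calling `scan_file` on a copy of a vlog file and printing the result gives a rough map of where the readable keys and values sit between the binary headers.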
Please help!
In addition to replies here on Discuss, I am open to direct links to commits, documentation, blog posts, or whatever else you find relevant for this deep dive.