Documentation about how Badger actually stores Dgraph data on disk

I am just starting a deep dive into how Badger stores data, so I am looking at the vlog files for Zero and Alpha and trying to make sense of them.

I want to create an offline-first system designed to sync seamlessly with Dgraph/Badger, so I want to represent the data in IndexedDB as conveniently and efficiently as possible.

I want to know:

  1. How does Badger persist its key-value pairs?
  2. How does Dgraph translate its triples into key-value pairs?
  3. What is the relationship between Alpha and Zero?
  4. How can I read the Badger files to understand what it is storing where, and why?
    4b. This includes the issues of encoding, compression and encryption: how can I disable compression and re-encode the log file as human-readable UTF-8?

I assume that this is documented somewhere, but I didn’t find it at the level of detail I want.
I did find this basic conceptual description:
https://dgraph.io/docs/master/design-concepts/#badger

Posting Lists get stored in Badger, in a key-value format, like so:
(Predicate, Subject) --> PostingList
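
To make that mapping concrete, here is a purely conceptual sketch in Go. The names `postingKey`, `posting` and `store` are hypothetical and are not Dgraph's types; the real on-disk keys and values are binary-encoded, which this sketch does not attempt to reproduce. It only shows the idea that everything known about one (predicate, subject) pair is grouped under a single key.

```go
package main

import "fmt"

// postingKey is a hypothetical stand-in for the (Predicate, Subject) key.
type postingKey struct {
	Predicate string
	Subject   uint64
}

// posting is a simplified, hypothetical entry in a posting list:
// a value plus an optional language tag.
type posting struct {
	Value string
	Lang  string
}

// store maps each (Predicate, Subject) key to its posting list.
type store map[postingKey][]posting

// add appends a posting to the list stored under (pred, subj).
func (s store) add(pred string, subj uint64, p posting) {
	k := postingKey{Predicate: pred, Subject: subj}
	s[k] = append(s[k], p)
}

func main() {
	s := store{}
	s.add("name", 0x1, posting{Value: "mikeee", Lang: "en"})
	s.add("name", 0x1, posting{Value: "Michael", Lang: "de"})
	s.add("name", 0x2, posting{Value: "Josh"})

	// All values of "name" for subject 0x1 live under one key.
	for _, p := range s[postingKey{Predicate: "name", Subject: 0x1}] {
		fmt.Printf("name@%s = %q\n", p.Lang, p.Value)
	}
}
```

The point of grouping by (predicate, subject) is that one key lookup returns the whole posting list for that pair, rather than one lookup per triple.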

In Zero:

I see that the vlog has many of these three slightly different sequences:
alpha:70800֚Bz1- , alpha:7080 0֚Bz1, alpha:70800Bz1-
and they are always followed by a number

Another frequently repeated sequence is:
!badger!txn

In Alpha:

I see that the !badger!txn sequence is also present, but I don’t find matching sequence numbers

I can see the actual changes, e.g., this is how the change of name@en from “michaellll” to “mikeee” is stored:
!badger!tx601517j@amemichaellll> \'hnamemikeee> ’hqb@!name'\> ~a,S Mikeee *enh v*!badger!txn>82881

Please Help!

In addition to replies here in discuss, I am open to direct links to commits, documentation, blog posts, or whatever you find relevant for this deep dive.

I assume that #4b is related to parsing protocol buffers:
https://developers.google.com/protocol-buffers/docs/pythontutorial#parsing-and-serialization

But I have no idea how to go about reading the vlog and/or sst files and parsing them to output something more human-readable…

Any help?

Hi @gotjoshua. You can use the dgraph debug tool to inspect the content of the p directory of Dgraph Alpha. Here are some docs that show example output: https://dgraph.io/docs/howto/#debug-tool-output

You can use Badger to open up a Badger DB (e.g., a Dgraph p directory) and iterate over the keys and values.

You can also check out the Dgraph paper to learn about the underlying data format.


Thanks for the reply, the debug tool is quite helpful!

If I use a Badger command-line tool, can I access a locked folder, or do I still need to shut down the Alpha in order to have a look?

Also, I’m wondering if there is anyone from the core team that could take time to answer this thread:

Now (thanks to the debug tool) I can see that the infinite history is there, and I’d love to have easy access to it (without having to create additional dgraph structures for it)

You can open that directory in read-only mode with BypassDirLock set to true in Badger (in recent Badger releases this option appears as BypassLockGuard).

It would be a good idea to just make a copy of the badger directory, delete the lock file and then perform whatever operations you want.
