Documentation about how Badger actually stores Dgraph data on disk

gotjoshua · August 3, 2020, 11:49am

I am just starting to do a deep-dive into how badger is storing data. So, I am looking at the vlog files for zero and alpha and trying to make sense of it all.

I want to create an offline-first system that is designed to sync seamlessly with dgraph/badger, thus I want to represent the data in IndexedDB as conveniently and efficiently as possible.

I want to know:

1. How does badger persist its key-value pairs?
2. How does dgraph translate its triples into key-value pairs?
3. What is the relationship between alpha and zero?
4. How can I read the badger files to understand what it is storing where, and why?
4b. This includes the issues of encoding, compression and encryption: how can I disable compression and re-encode the log file as human-readable UTF-8?

I assume that this is documented somewhere, but I didn’t find it in the level of detail I want.
I did find this basic conceptual description:
https://dgraph.io/docs/master/design-concepts/#badger

Posting Lists get stored in Badger, in a key-value format, like so:
(Predicate, Subject) --> PostingList
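
To make the `(Predicate, Subject) --> PostingList` idea concrete, here is a minimal sketch of how such a pair could be flattened into a single byte key. The layout, and the names `dataKey` and `parseDataKey`, are assumptions made up for illustration; this is not Dgraph's real key encoding, which carries additional prefix bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// dataKey builds a HYPOTHETICAL key for a (predicate, subject uid) pair.
// Illustrative layout only, not Dgraph's real format:
//   2-byte big-endian predicate length | predicate bytes | 8-byte big-endian uid
func dataKey(predicate string, uid uint64) []byte {
	key := make([]byte, 2+len(predicate)+8)
	binary.BigEndian.PutUint16(key[0:2], uint16(len(predicate)))
	copy(key[2:], predicate)
	binary.BigEndian.PutUint64(key[2+len(predicate):], uid)
	return key
}

// parseDataKey reverses dataKey and validates the overall length.
func parseDataKey(key []byte) (string, uint64, error) {
	if len(key) < 2 {
		return "", 0, fmt.Errorf("key too short: %d bytes", len(key))
	}
	n := int(binary.BigEndian.Uint16(key[0:2]))
	if len(key) != 2+n+8 {
		return "", 0, fmt.Errorf("unexpected key length %d for predicate length %d", len(key), n)
	}
	return string(key[2 : 2+n]), binary.BigEndian.Uint64(key[2+n:]), nil
}

func main() {
	key := dataKey("name", 0x1)
	pred, uid, err := parseDataKey(key)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x -> predicate=%q uid=%#x\n", key, pred, uid)
}
```

Writing the uid big-endian means that, in a store with sorted keys, all subjects of one predicate sit next to each other in uid order. That is the general reason to group keys by predicate first.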

In Zero:

I see that the vlog has many of these three slightly different sequences:
alpha:70800֚Bz1- , alpha:7080 0֚Bz1, alpha:70800Bz1-
and they are always followed by a number

Another often-repeated sequence is:
!badger!txn

In Alpha:

I see that the !badger!txn sequence is also present, but I don’t find matching sequence numbers

I can see the actual changes, e.g., this is how the change of name@en from “michaellll” to “mikeee” is stored:
!badger!tx601517j@amemichaellll> \'hnamemikeee> 'hqb@!name'\> ~a,S Mikeee *enh v*!badger!txn>82881
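
For this kind of eyeballing, a small helper that pulls the readable text out of the binary noise can be handy. The sketch below is a generic approach, similar in spirit to the Unix `strings` tool, and knows nothing about badger's actual record format. The function name `printableRuns` and the sample bytes are made up for illustration.

```go
package main

import (
	"fmt"
)

// printableRuns returns every run of at least minLen consecutive
// printable ASCII bytes (0x20..0x7e) found in data.
func printableRuns(data []byte, minLen int) []string {
	var runs []string
	start := -1 // index where the current run began, or -1 if none
	for i, b := range data {
		printable := b >= 0x20 && b <= 0x7e
		if printable && start < 0 {
			start = i
		}
		if !printable && start >= 0 {
			if i-start >= minLen {
				runs = append(runs, string(data[start:i]))
			}
			start = -1
		}
	}
	// Flush a run that extends to the end of the input.
	if start >= 0 && len(data)-start >= minLen {
		runs = append(runs, string(data[start:]))
	}
	return runs
}

func main() {
	// Made-up sample bytes, NOT a real vlog record.
	sample := []byte("\x00\x01name\x02michaellll\x00\x1e!badger!txn\x00")
	for _, s := range printableRuns(sample, 4) {
		fmt.Println(s)
	}
}
```

This only surfaces the text fragments; the bytes between them (lengths, versions, metadata) still need the real format to interpret.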

Please Help!

In addition to replies here in discuss, I am open to direct links to commits, documentation, blog posts, or whatever you find relevant for this deep dive.

gotjoshua · August 3, 2020, 8:02pm

I assume that #4b is related to parsing protocol buffers:

But I have no idea how to go about reading the vlog and/or sst files and parsing them in order to output something more human-readable…

Any help?

dmai · August 3, 2020, 10:18pm

Hi @gotjoshua. You can use the dgraph debug tool to inspect the content of the p directory of Dgraph Alpha. Here are some docs that show example output: https://dgraph.io/docs/howto/#debug-tool-output

You can use Badger to open up a Badger DB (e.g., a Dgraph p directory) and iterate over the keys and values.

You can also check out the Dgraph paper to learn about the underlying data format.

gotjoshua · August 7, 2020, 4:14pm

Thanks for the reply, the debug tool is quite helpful!

If I use the badger command line, can I access a locked folder? Or do I still need to shut down the alpha in order to have a look?

Also, I’m wondering if there is anyone from the core team that could take time to answer this thread:

Now (thanks to the debug tool) I can see that the infinite history is there, and I’d love to have easy access to it (without having to create additional dgraph structures for it)

ibrahim · August 9, 2020, 2:06pm

You can open that directory in read-only mode with BypassLockGuard set to true in badger:

https://github.com/dgraph-io/badger/blob/3e067a54e5964bf7640c7d74fa92cd015f2a3fc8/options.go#L88-L91

	// BypassLockGuard will bypass the lock guard on badger. Bypassing lock
	// guard can cause data corruption if multiple badger instances are using
	// the same directory. Use this option with caution.
	BypassLockGuard bool

It would be a good idea to just make a copy of the badger directory, delete the lock file and then perform whatever operations you want.
