How Dgraph uses Badger

Badger is the storage layer for Dgraph. Every query and mutation on Dgraph is ultimately translated into reads or writes of key-value pairs in Badger.

Dgraph uses Badger for storing three types of data:

  1. pstore: All predicate, index, and schema data (p directory)
  2. wstore: Raft logs (w and zw directories)
  3. tmpIndex store: Temporary storage for index building (directories with the dgraph_index_ prefix)

Key-value pairs for pstore:
Single RDF record => Posting
Posting list => [Posting] (explained here)

<0x1> <friend> <0x2> .
<0x1> <friend> <0x3> .

Badger key => (friend, 0x1)
Badger value => posting list[0x2, 0x3]
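
The mapping above can be sketched in a few lines of Go. This is only an in-memory illustration of how two RDF records that share a subject and predicate collapse into one key with one posting list; the Triple and Key types and the BuildPostingLists function are hypothetical names made up for this example, and neither Dgraph's real key encoding nor Badger itself is involved.

```go
package main

import (
	"fmt"
	"sort"
)

// Triple is a simplified RDF record (hypothetical type for this sketch).
type Triple struct {
	Subject   uint64
	Predicate string
	Object    uint64
}

// Key stands in for the (predicate, subject UID) pair shown above.
// Dgraph's real keys are byte slices; this struct is an assumption.
type Key struct {
	Attr string
	Uid  uint64
}

// BuildPostingLists groups object UIDs by (predicate, subject) and
// sorts each resulting list in ascending UID order.
func BuildPostingLists(triples []Triple) map[Key][]uint64 {
	lists := make(map[Key][]uint64)
	for _, t := range triples {
		k := Key{Attr: t.Predicate, Uid: t.Subject}
		lists[k] = append(lists[k], t.Object)
	}
	for _, uids := range lists {
		sort.Slice(uids, func(i, j int) bool { return uids[i] < uids[j] })
	}
	return lists
}

func main() {
	lists := BuildPostingLists([]Triple{
		{Subject: 0x1, Predicate: "friend", Object: 0x2},
		{Subject: 0x1, Predicate: "friend", Object: 0x3},
	})
	for k, uids := range lists {
		fmt.Printf("(%s, %#x) => %#x\n", k.Attr, k.Uid, uids)
	}
}
```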

Structure of a key (taken from x/keys.go):

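// ParsedKey holds the components of a Badger key after parsing.
type ParsedKey struct {
	ByteType    byte   // kind of key (for example data, index, reverse, or count)
	Attr        string // predicate name
	Uid         uint64 // subject UID, where the key kind carries one
	HasStartUid bool   // whether StartUid is set
	StartUid    uint64 // start UID for a part of a split posting list
	Term        string // index token, for index keys
	Count       uint32 // count value, for count-index keys
	bytePrefix  byte   // leading prefix byte of the key
}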

Key-value pairs for wstore:
Stores all keys required for the Raft state machines (more information can be found in raftwal/storage.go). Every key contains the group ID and node ID.

  • Snapshot key
  • Hard state key
  • Entry key
  • Checkpoint key

Use of Stream Framework

  • Used for retrieving keys from an entire Badger instance
  • Supports retrieving only keys with a particular prefix, and choosing or skipping individual keys.
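
To make the two bullets above concrete, here is a minimal sketch of the idea using only the Go standard library. It is not Badger's Stream API: StreamKeys, its parameters, and the "friend/0x1"-style string keys are hypothetical, and a plain map stands in for the store. It shows only the shape of the behaviour: restrict to a prefix, let a callback choose or skip each key, and hand the surviving pairs to a consumer in key order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// StreamKeys is a hypothetical helper (not part of Badger). It visits
// every key in store that starts with prefix, in sorted order, skips
// keys rejected by chooseKey (nil means "choose all"), and passes the
// rest to send.
func StreamKeys(
	store map[string]string,
	prefix string,
	chooseKey func(key string) bool,
	send func(key, value string),
) {
	keys := make([]string, 0, len(store))
	for k := range store {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	for _, k := range keys {
		if chooseKey != nil && !chooseKey(k) {
			continue
		}
		send(k, store[k])
	}
}

func main() {
	store := map[string]string{
		"friend/0x1": "[0x2 0x3]",
		"friend/0x2": "[0x1]",
		"name/0x1":   "Alice",
	}
	// Stream only the "friend" predicate, skipping one key.
	StreamKeys(store, "friend/",
		func(k string) bool { return k != "friend/0x2" },
		func(k, v string) { fmt.Println(k, "=>", v) },
	)
}
```

A real use such as an export or a snapshot would replace the send callback with code that writes to a file or a network connection.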

Places where it is used in Dgraph:

  • Index rebuilding
  • Exporting of DB
  • Sending snapshot to a follower
  • Predicate movement

Use of Stream Writer

  • Used for writing data that arrives from multiple streams, where the data within each stream is sorted and the key ranges of different streams do not overlap.
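
The two preconditions above (sorted within each stream, no overlap between streams) are what allow the data to be laid out in key order without a merge step. The sketch below illustrates that with the standard library only. WriteStreams is a hypothetical function, not Badger's StreamWriter, and it returns a slice instead of writing to disk: it checks both preconditions and then concatenates the streams in order of their first key.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// WriteStreams is a hypothetical stand-in for a stream writer. It
// verifies that every stream is sorted and that no two streams overlap
// in key range, then returns all keys in globally sorted order.
func WriteStreams(streams [][]string) ([]string, error) {
	nonEmpty := make([][]string, 0, len(streams))
	for _, s := range streams {
		if !sort.StringsAreSorted(s) {
			return nil, errors.New("stream is not sorted")
		}
		if len(s) > 0 {
			nonEmpty = append(nonEmpty, s)
		}
	}
	// Order the streams by their smallest key.
	sort.Slice(nonEmpty, func(i, j int) bool {
		return nonEmpty[i][0] < nonEmpty[j][0]
	})
	var out []string
	for i, s := range nonEmpty {
		if i > 0 {
			prev := nonEmpty[i-1]
			// The previous stream must end before this one begins.
			if prev[len(prev)-1] >= s[0] {
				return nil, errors.New("key ranges overlap")
			}
		}
		out = append(out, s...) // plain concatenation, no merge needed
	}
	return out, nil
}

func main() {
	out, err := WriteStreams([][]string{
		{"name/0x1", "name/0x2"},
		{"friend/0x1", "friend/0x2"},
	})
	fmt.Println(out, err)
}
```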

Places where it is used in Dgraph:

  • Bulk loader (reduce phase)
  • Writing a snapshot

Dgraph tools which are closely related to Badger
