How do graphs get mapped to badger?

samsquire · May 18, 2021, 3:30am

I study database internals as a hobby, and I'm curious how Dgraph works internally. I hope you can answer my questions about data storage. I’ve had a quick whiz through the code but couldn’t find what I was looking for; maybe you could point me in the right direction.

I am curious how Dgraph takes a graph and turns it into key-value pairs for Badger.

How are edges stored in a key-value store?

I am guessing range queries are used to iterate over a node's outgoing edges.

Do you store the relationship data using UIDs?

For example, you might store a key from_uuid with value to_uuid. Then, if I add an index, I would also store the reverse connection: key reverse_uuid with value parent_uuid.
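Here is a minimal Python sketch of the layout guessed at above: one key per edge, with tuple keys kept in sorted order so that a prefix (range) scan returns all outgoing edges of a node, plus a mirrored "in" key that acts as the reverse index. The SortedKV class and the ("out"/"in", uid, predicate, uid) key shape are illustrative assumptions, not how Dgraph or Badger actually lay things out.

```python
import bisect


class SortedKV:
    """Toy ordered key-value store: keys are kept sorted, like a Badger iterator sees them."""

    def __init__(self):
        self._keys = []
        self._vals = {}

    def put(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def prefix_scan(self, prefix):
        """Yield (key, value) for every key that starts with `prefix` (a range query)."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i][:len(prefix)] == prefix:
            key = self._keys[i]
            yield key, self._vals[key]
            i += 1


def add_edge(kv, src, pred, dst):
    kv.put(("out", src, pred, dst), b"")  # forward edge: one key per edge
    kv.put(("in", dst, pred, src), b"")   # reverse index entry


def outgoing(kv, src, pred):
    return [key[3] for key, _ in kv.prefix_scan(("out", src, pred))]


def incoming(kv, dst, pred):
    return [key[3] for key, _ in kv.prefix_scan(("in", dst, pred))]


kv = SortedKV()
add_edge(kv, 1, "follows", 2)
add_edge(kv, 1, "follows", 3)
add_edge(kv, 2, "follows", 3)
```

With this data, outgoing(kv, 1, "follows") gives [2, 3] and incoming(kv, 3, "follows") gives [1, 2]. Storing one key per edge keeps writes trivial but means a high-degree node needs many keys; grouping edges per (predicate, node), as in the posting lists discussed below, is the alternative.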

I have written a very simple graph database that uses matrix multiplication to do breadth-first search. You can find it here: GitHub - samsquire/hash-db (a distributed key-value database imitating DynamoDB querying, with SQL support, distributed joins and Cypher graph support). It’s inspired by GraphBLAS and RedisGraph.
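For readers unfamiliar with the GraphBLAS idea: one BFS step is a vector-matrix product over the boolean semiring (OR of ANDs), masked by the nodes already visited. The sketch below is a generic pure-Python illustration of that technique, not hash-db's actual code; the bfs_levels name and the list-of-lists adjacency matrix are assumptions.

```python
def bfs_levels(adj, start):
    """BFS as repeated boolean vector-matrix products.

    adj[i][j] is True when there is an edge i -> j. Returns the BFS level of
    every node, or None for nodes that are unreachable from `start`.
    """
    n = len(adj)
    level = [None] * n
    frontier = [False] * n
    frontier[start] = True
    level[start] = 0
    depth = 0
    while any(frontier):
        depth += 1
        # nxt[j] = OR over i of (frontier[i] AND adj[i][j])
        nxt = [any(frontier[i] and adj[i][j] for i in range(n)) for j in range(n)]
        # Mask out nodes that already have a level (already visited).
        nxt = [nxt[j] and level[j] is None for j in range(n)]
        for j in range(n):
            if nxt[j]:
                level[j] = depth
        frontier = nxt
    return level


n = 5
adj = [[False] * n for _ in range(n)]
for i, j in [(0, 1), (0, 2), (1, 3)]:
    adj[i][j] = True
```

Here bfs_levels(adj, 0) returns [0, 1, 1, 2, None]: node 4 has no incoming edges, so it is never reached.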

I am interested in dgraph internals because I might want to simulate the same keyvalue storage mechanism in my noddy database which isn’t for serious use, it’s just a hobby.

chewxy · May 18, 2021, 4:24am

First, read this blog post. This is part of a series of me explaining the Dgraph Paper. The final table at the end of the blog post is a conceptual overview of what a postinglist is (it’s like a reverse index).
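To make the "reverse index" comparison concrete, here is a minimal in-memory sketch: each (predicate, subject uid) maps to a sorted list of object uids, and because the lists are sorted, combining two of them is a linear merge. The posting_lists dict and the add_edge/intersect helpers are hypothetical; this shows the concept only, not Dgraph's encoding.

```python
import bisect
from collections import defaultdict

# Hypothetical posting lists: (predicate, subject_uid) -> sorted list of object uids.
posting_lists = defaultdict(list)


def add_edge(pred, subj, obj):
    lst = posting_lists[(pred, subj)]
    i = bisect.bisect_left(lst, obj)
    if i == len(lst) or lst[i] != obj:  # keep the list sorted and duplicate-free
        lst.insert(i, obj)


def intersect(a, b):
    """Merge-style intersection of two sorted uid lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


for subj, obj in [(1, 5), (1, 3), (1, 9), (2, 9), (2, 3), (2, 4)]:
    add_edge("friend", subj, obj)

mutual = intersect(posting_lists[("friend", 1)], posting_lists[("friend", 2)])
```

Mutual friends of 1 and 2 come out as [3, 9]. Keeping the uids sorted is what makes this kind of set operation cheap.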

You can find the posting list package here - it’s a bit obscured by layers of necessary complexity, but if you stand back far enough you can see it:

Here’s the definition of the data structure (in PB):

From dgraph-io/dgraph, protos/pb.proto#L366-L407 on master (excerpt; the embedded file was truncated):

```protobuf
    repeated string types = 4;
  }

  // Posting messages.
  message Posting {
    reserved 6;  // This was used for label.
    fixed64 uid = 1;
    bytes value = 2;
    enum ValType {
      DEFAULT = 0;
      BINARY = 1;
      INT = 2;  // We treat it as int64.
      FLOAT = 3;
      BOOL = 4;
      DATETIME = 5;
      GEO = 6;
      UID = 7;
      PASSWORD = 8;
      STRING = 9;
      OBJECT = 10;
      // ... (truncated)
```
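For anyone following along in Python, a rough mirror of the part of Posting shown in the excerpt might look like this. The ValType numbers are copied from the excerpt; the val_type attribute and the is_edge helper are my own assumptions, because the truncated snippet doesn't show how the type is attached to a posting. is_edge treats a posting as an edge when its type is UID, as confirmed further down the thread.

```python
from dataclasses import dataclass
from enum import IntEnum


class ValType(IntEnum):
    # Numbers copied from the pb.proto excerpt.
    DEFAULT = 0
    BINARY = 1
    INT = 2  # treated as int64
    FLOAT = 3
    BOOL = 4
    DATETIME = 5
    GEO = 6
    UID = 7
    PASSWORD = 8
    STRING = 9
    OBJECT = 10


@dataclass
class Posting:
    uid: int = 0        # mirrors `fixed64 uid = 1`
    value: bytes = b""  # mirrors `bytes value = 2`
    # Assumed attribute: the truncated excerpt does not show this field.
    val_type: ValType = ValType.DEFAULT

    def is_edge(self) -> bool:
        # An edge points at another node by uid; a scalar carries its bytes in `value`.
        return self.val_type == ValType.UID


edge = Posting(uid=0x2, val_type=ValType.UID)
name = Posting(value=b"Alice", val_type=ValType.STRING)
```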

Let me know if you have any other questions

samsquire · May 20, 2021, 7:56am

To clarify my understanding: there is no distinction between an outgoing edge to another item and a simple field. Both are stored with a key of (relationship_or_field, uid) → posting list.
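A (predicate, uid) key has to be turned into bytes somehow. The sketch below is a simplified, hypothetical layout (2-byte predicate length, predicate bytes, 8-byte big-endian uid), not Dgraph's real key encoding, which differs. It shows two properties such a layout needs: big-endian uids make byte order match numeric order, and the length prefix stops a scan for "friend" from also matching "friendship".

```python
import struct


def predicate_prefix(predicate: str) -> bytes:
    """Hypothetical prefix: 2-byte big-endian length followed by the UTF-8 predicate."""
    p = predicate.encode("utf-8")
    return struct.pack(">H", len(p)) + p


def make_key(predicate: str, uid: int) -> bytes:
    # Big-endian uid so that lexicographic byte order equals numeric uid order.
    return predicate_prefix(predicate) + struct.pack(">Q", uid)


def parse_key(key: bytes):
    (n,) = struct.unpack_from(">H", key, 0)
    predicate = key[2:2 + n].decode("utf-8")
    (uid,) = struct.unpack_from(">Q", key, 2 + n)
    return predicate, uid


keys = sorted([
    make_key("name", 256),
    make_key("friend", 2),
    make_key("name", 255),
    make_key("friendship", 1),
])
# All subjects that have a "friend" posting list: a prefix scan over sorted keys.
friend_keys = [k for k in keys if k.startswith(predicate_prefix("friend"))]
```

The value stored under each key would be the posting list, whether the predicate is an edge or a scalar field.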

What’s the posting type for a relationship? DEFAULT, or one of the other entries in the ValType list?

iluminae · May 20, 2021, 1:17pm

A relationship is a predicate with a UID or set of UIDs as a value - the type in the proto for a relationship is UID.
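A small illustration of what that means for traversal: a UID-typed posting is a pointer to another node, so following an edge is just looking up the target uids and then reading another predicate on each target. The store dict and the follow/scalar helpers are hypothetical, and the uid slot is simply left at 0 for scalar postings in this sketch.

```python
from enum import IntEnum


class ValType(IntEnum):
    # Subset of the ValType values from the pb.proto excerpt.
    UID = 7
    STRING = 9


# Hypothetical store: (predicate, subject_uid) -> list of (uid, value, val_type).
# The uid slot is unused (0) for scalar postings in this sketch.
store = {
    ("name", 1): [(0, b"Alice", ValType.STRING)],
    ("name", 2): [(0, b"Bob", ValType.STRING)],
    ("name", 3): [(0, b"Carol", ValType.STRING)],
    ("friend", 1): [(2, b"", ValType.UID), (3, b"", ValType.UID)],
}


def follow(pred, subj):
    """Target uids of an edge predicate: the postings whose type is UID."""
    return [uid for uid, _, t in store.get((pred, subj), []) if t == ValType.UID]


def scalar(pred, subj):
    """First non-UID value stored for (pred, subj), decoded as text."""
    for _, value, t in store.get((pred, subj), []):
        if t != ValType.UID:
            return value.decode("utf-8")
    return None


friend_names = [scalar("name", uid) for uid in follow("friend", 1)]
```

Here friend_names is ["Bob", "Carol"], while follow("name", 1) is empty because "name" holds a STRING posting rather than an edge.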

chewxy · May 20, 2021, 10:56pm

^ this is correct

Additionally, @samsquire, if you’re using Python, you shouldn’t have to worry about the static types anyway (I am using "static types" vs "dynamic types" in the sense that static types are known at compile time and dynamic types are known at runtime).
