How do graphs get mapped to badger?

I study database internals as a hobby. I am curious how dgraph works internally. I hope you can answer my questions regarding data storage. I’ve had a quick whiz through the code but couldn’t find what I was looking for, maybe you could point me in the right direction.

I am curious how dgraph takes a graph and turns it into keyvalues for badger.

How are edges stored in a keyvalue storage?

I am guessing range queries are used to iterate over an outgoing edge.

Do you store the relationship data using uids?

Maybe store a key saying from_uuid with value to_uuid. Then if I add an index, I store the reverse connection: reverse_uuid parent_uuid

I have written a very simple graph database that uses matrix multiplication to do breadth first search. You can find it here GitHub - samsquire/hash-db: distributed keyvalue database imitating dynamodb querying with SQL support, distributed joins and Cypher graph support It’s inspired by graphblas and redis-graph

I am interested in dgraph internals because I might want to simulate the same keyvalue storage mechanism in my noddy database which isn’t for serious use, it’s just a hobby.

First, read this blog post. This is part of a series of me explaining the Dgraph Paper. The final table at the end of the blog post is a conceptual overview of what a postinglist is (it’s like a reverse index).

You can find the posting list package here - it’s a bit obscured by layers of necessary complexity, but if you stand back far enough you can see it:

Here’s the definition of the data structure (in PB):

Let me know if you have any other questions

1 Like

To clarify my understanding. There is no distinction between an outgoing edge to another item than there is for a simple field. Both are stored with a key of (relationship_or_field, uid) → posting list.

What’s the posting type for a relationship? Default? In the ValType list?

A relationship is a predicate with a UID or set of UIDs as a value - the type in the proto for a relationship is UID.

^ this is correct

Additionally @samsquire if you’re using python, you shouldn’t have to worry about the static types anyway (note I am using static types vs dynamic types in the sense that static types are known at compile time and dynamic types are known at runtime).