I study database internals as a hobby. I am curious how dgraph works internally. I hope you can answer my questions regarding data storage. I’ve had a quick whiz through the code but couldn’t find what I was looking for, maybe you could point me in the right direction.
I am curious how dgraph takes a graph and turns it into keyvalues for badger.
How are edges stored in a keyvalue storage?
I am guessing range queries are used to iterate over an outgoing edge.
Do you store the relationship data using uids?
Maybe store a key saying from_uuid with value to_uuid. Then if I add an index, I store the reverse connection: reverse_uuid parent_uuid
I am interested in dgraph internals because I might want to simulate the same keyvalue storage mechanism in my noddy database which isn’t for serious use, it’s just a hobby.
First, read this blog post. This is part of a series of me explaining the Dgraph Paper. The final table at the end of the blog post is a conceptual overview of what a postinglist is (it’s like a reverse index).
You can find the posting list package here - it’s a bit obscured by layers of necessary complexity, but if you stand back far enough you can see it:
Here’s the definition of the data structure (in PB):
To clarify my understanding. There is no distinction between an outgoing edge to another item than there is for a simple field. Both are stored with a key of (relationship_or_field, uid) → posting list.
What’s the posting type for a relationship? Default? In the ValType list?
Additionally @samsquire if you’re using python, you shouldn’t have to worry about the static types anyway (note I am using static types vs dynamic types in the sense that static types are known at compile time and dynamic types are known at runtime).