Greetings,
I am interested in understanding the specifics of Dgraph’s internals and concepts. Apologies in advance if any of these questions are obvious (and perhaps should have been obvious to me), or if there are design documents that already answer them. I mean to study the codebase to figure things out eventually, but I haven’t gotten around to it yet, and in any case it is probably better to get this information from the folks who are actually designing and implementing it.
- Are edges represented as posting lists? Take “Bruce Lee” (subject) - starred (edge) - “Enter the Dragon” (object). Would there be a posting list under a key that encodes (subject, edge) and maps to the IDs of the movies, i.e. bruce_lee.starred => [IDs of movies Bruce Lee starred in]? If so, how are the edge and subject encoded into the key name? As edge_type_id:subject_id:version, or something like that?
- What happens when movies are added to or removed from the (Bruce Lee, starred) list? Would Dgraph fetch the existing posting list, decode/materialise it, update it in memory, and then re-encode and persist it? Or would updates be queued in memory and persisted every so often? If so, would that still require the full read-modify-write cycle described above, or does Dgraph allow additional ‘sub’ posting lists to be created, so that at query time multiple posting lists for the same (subject, predicate) are processed?
- How are labels/attributes modelled on the KV store? For example, a node/subject “Michael Jordan” could have attributes such as height=value, team=value, country=value, etc. Are posting lists used for those as well?
- What happens when a posting list is very long (say, millions of IDs)? Is the whole thing retrieved and then processed, or is there some scheme where parts of it can be skipped, not retrieved at all, or retrieved in chunks (so that you can intersect one chunk at a time with another posting list)?
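
To make the last question concrete, here is a rough Go sketch of what I mean by intersecting one chunk at a time. It is purely hypothetical and not based on Dgraph's code: the `chunk` type and the `split` and `intersect` functions are names I made up, and the chunks live in memory here, whereas in a real store each one might sit under its own key.

```go
package main

import "fmt"

// chunk is a hypothetical piece of a sorted posting list.
// first and last let a reader decide whether the chunk is worth fetching.
type chunk struct {
	first, last uint64   // smallest and largest ID in the chunk
	ids         []uint64 // sorted IDs
}

// split cuts a sorted ID list into chunks of at most size IDs.
func split(ids []uint64, size int) []chunk {
	var out []chunk
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		part := ids[start:end]
		out = append(out, chunk{first: part[0], last: part[len(part)-1], ids: part})
	}
	return out
}

// intersect walks the chunks of a long list against a shorter sorted list.
// A chunk is skipped when the next candidate ID lies beyond its range.
// It returns the common IDs and the number of chunks actually scanned.
func intersect(chunks []chunk, other []uint64) (common []uint64, touched int) {
	j := 0
	for _, c := range chunks {
		// Drop candidates that are smaller than anything in this chunk.
		for j < len(other) && other[j] < c.first {
			j++
		}
		if j == len(other) {
			break // no candidates left, remaining chunks are irrelevant
		}
		if other[j] > c.last {
			continue // next candidate is past this chunk: skip it entirely
		}
		touched++
		// Standard two-pointer merge, limited to this chunk's range.
		i := 0
		for i < len(c.ids) && j < len(other) && other[j] <= c.last {
			switch {
			case c.ids[i] < other[j]:
				i++
			case c.ids[i] > other[j]:
				j++
			default:
				common = append(common, c.ids[i])
				i++
				j++
			}
		}
	}
	return common, touched
}

func main() {
	long := []uint64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
	short := []uint64{3, 10, 12}
	common, touched := intersect(split(long, 4), short)
	fmt.Println(common, touched) // prints "[3 10 12] 2"
}
```

In this toy run the middle chunk (IDs 5 to 8) is never scanned, which is the kind of saving I am asking about.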
Thank you very much in advance.