Data == postings lists == indexes and resource utilization

In the latest community call @mrjn mentioned that everything is stored in postings lists and that indexes and the native storage format are very similar. This seems like a huge differentiator with other systems or am I mistaken in that? I’m fascinated by this and would love more discussion of it. Perhaps there is already a blog post or discussion on it I am unaware of.

One of the key trade-offs traditionally made with a database is defining indexes which cost ram, storage and compute resources to gain faster run-time query performance. Often, building indexes and keeping them updated can add latency to updates such that changes may not be immediately reflected in query results.

So if everything in Dgraph is already stored in postings lists, which are a standard inverted index data-structure, my mind immediately jumps to all kinds of interesting questions. If that is the case then what are the indexes I define in my schema doing? Do I already get indexed like performance out of the box without defining an index? What ram, storage and compute costs can I expect with Dgraph indexes and are theses considerations meaningfully different from say an Elasticsearch or Postgres?

2 Likes

I think you gonna like to read this paper

Yes. Everything is stored in posting lists. And y’day, we pushed a big change to replace our current group varint encoding to use Roaring Bitmaps, which is being used by many latest technologies: https://roaringbitmap.org/

We’ll be working over the next quarter to optimize query performance even further from where we are today.

Further more there will be a series of blogposts which delves deep into it written by me. Coming soon

3 Likes

Wow, that white paper is exactly what I was looking for THANKS! For anyone else that is curious I took some notes mostly copy-paste, perhaps a bit long but well worth understanding.

Data is structured in triples: subject predicate object-value

Subject is a node, predicate is a relationship, and object can be another node or a primitive data type.

example

<person-a> <lives-in> <sf> .
<person-a> <eats> <sushi> .
<person-a> <eats> <indian> .
...
<person-b> <lives-in> <nyc> .
<person-b> <eats> <thai> .

UIDs

All subjects in Dgraph are assigned a globally unique id, called a uid. A uid is stored as a 64-bit unsigned integer (uint64)
A uid once allocated is never reallocated or reassigned. Thus, every node in the graph can be referenced by a unique integer.

Data is sharded on predicate

shard 1

<person-a> <lives-in> <sf> .
...
<person-b> <lives-in> <nyc> .

shard 2

<person-a> <eats> <sushi> .
<person-a> <eats> <indian> .
...
<person-b> <eats> <thai> .

Values stored in a “posting”

Values (int, float, string, datetime, geo, etc) are stored in postings. Each posting has an integer id. The data is converted into binary format and stored in a posting along with the information about the original type.

So a posting is:

integer id / uid
type info
binary format

Postings lists link values (postings) back to their subject-predicate (location in the graph)

Within a shard, records sharing the same subject-predicate are grouped and condensed into one single key-value pair in Badger. This value is referred to as a posting list, a terminology commonly used in search engines to refer to a sorted list of doc ids containing a search term. A posting list is stored as a value in Badger, with the key being derived from subject and predicate.

<0x01> <follower> <0xab> .
<0x01> <follower> <0xbc> .
<0x01> <follower> <0xcd> .
...
key = <follower, 0x01>
value = <0xab, 0xbc, 0xcd, ...>

In a common case where the predicate only has objects (and no values like follower edge), a posting list would consist largely of sorted uids. These are optimized by doing integer compression. The uids are grouped in blocks of 256 integers (configurable), where each block has a base uid and a binary blob. The blob is generated by taking a difference of current uid with the last and storing the difference in bytes encoded using group varint. This generates a data compression ratio of 10. When doing intersections, we can use these blocks to do binary searches or block jumps to avoid decoding all the blocks.

Thanks to these techniques, a single edge traversal corresponds to only a single Badger lookup.

Joins

Assuming the worst case scenario where the cluster is so big that each shard lives on a separate server. For a query which asks for [people who live in SF and eat Sushi], Dgraph would execute one network call to server containing lives- in and do a single lookup for all the people who live in SF (* <lives-in> <sf>). In the second step, it would take those results and send them over to server containing eats, do a single lookup to get all the people who eat Sushi (* <eats> <sushi>), and intersect with the previous step’s resultset to generate the final list of people from SF who eat Sushi. In a similar fashion, this result set can then be further filtered/joined, each join executing in one network call.
As we learnt in section ??, the result set is a list of sorted 64-bit unsigned integers, which make the retrieval and intersection operations very efficient.

Indexes

The difference between an index and data is the key. A data key is typically <predicate, uid>, while an index key is <predicate, token>. A token is derived from the value of the data, using an index tokenizer.

2 Likes

A couple of questions about indexing from this.

1: When a mutation happens on an indexed value do the changes to the index postings lists happen within the transaction? If not how does one know when the index is updated and when query results will reflect the new values?

2: Are values / postings stored with the same shard?

3: How are values / postings stored in badger?

4: How are full text search results ranked?

Looking forward to those new in-depth blog posts!

Yes they are colocated, I believe.

They aren’t I believe. I recommend using something like BM25 to rank them yourself. That’s what I’ve been doing anyway. (the BM25 package is mine)