Does Dgraph lose out on sequential IO since posting lists are sorted?

I read the blogs on Dgraph and Badger.
Badger is based on the WiscKey paper, which implies two things:
keys => sorted, stored in LSM trees, persisted as “sst” files (SSTables)
values => append-only logs, persisted as “vlog” files
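
To make the split above concrete, here is a minimal toy sketch of key-value separation. It is not Badger's real layout or API: `ToyWiscKeyStore` and all of its fields are hypothetical names, and a Python list and dict stand in for the vlog and the LSM tree.

```python
import bisect


class ToyWiscKeyStore:
    """Toy model of key-value separation (assumed names, not Badger's API)."""

    def __init__(self):
        self.vlog = []         # append-only value log: (key, value) in arrival order
        self.sorted_keys = []  # stands in for the sorted LSM tree / SSTables
        self.pointers = {}     # key -> offset of its latest value in the vlog

    def put(self, key, value):
        self.vlog.append((key, value))  # sequential write, never reordered
        if key not in self.pointers:
            bisect.insort(self.sorted_keys, key)  # only the key side stays sorted
        self.pointers[key] = len(self.vlog) - 1

    def get(self, key):
        return self.vlog[self.pointers[key]][1]

    def scan(self):
        """Iterate in key order; each value is fetched through its pointer."""
        return [(k, self.get(k)) for k in self.sorted_keys]


store = ToyWiscKeyStore()
for k, v in [("m", "v1"), ("a", "v2"), ("z", "v3"), ("a", "v4")]:
    store.put(k, v)
```

In this sketch `store.vlog` keeps the writes in arrival order, while `store.scan()` walks the keys in sorted order and follows a pointer for each value.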

And a posting list means key-value pairs.
So when Dgraph claims that the posting list is sorted, what exactly is sorted? I assume only the keys, and not the values.

Or are the values sorted? If yes, is this Dgraph logic implemented on top of Badger?
I hope the values aren’t sorted; if they are, the whole point of sequential IO for values, as presented in the WiscKey paper, is lost :frowning:

Posting lists are values that store sorted information. For example, in the case of indexes, the key is the token + predicate (and some metadata) and the value is a list of sorted UIDs, encoded using various techniques that we use internally. For more information, you could check out https://blog.dgraph.io/post/datetime-indexes-dgraph/.
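
As a rough illustration of "a value that holds a sorted list of UIDs", here is a small sketch. The key shape `(predicate, token)` and the delta encoding are assumptions made for the example; the post only says that Dgraph uses "various techniques" internally, so this is not Dgraph's actual encoding.

```python
import bisect
from itertools import accumulate


def add_uid(uids, uid):
    """Insert uid into an already sorted list, skipping duplicates."""
    i = bisect.bisect_left(uids, uid)
    if i == len(uids) or uids[i] != uid:
        uids.insert(i, uid)


def encode_deltas(uids):
    """Assumed encoding: store the first UID, then the gap to each next UID."""
    return [uid - prev for prev, uid in zip([0] + uids, uids)]


def decode_deltas(deltas):
    """Rebuild the sorted UID list by summing the gaps."""
    return list(accumulate(deltas))


# Hypothetical index: (predicate, token) -> sorted posting list of UIDs.
index = {}
for uid, name in [(42, "alice"), (3, "alice"), (17, "alice"), (9, "bob")]:
    add_uid(index.setdefault(("name", name), []), uid)

alice = index[("name", "alice")]
encoded = encode_deltas(alice)
```

The sorting happens inside each value before it is written. The storage layer only sees an opaque byte blob per key, so sorted content inside a value does not conflict with appending values to a log.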

How does this translate to the storage files in the p and w folders?
p is for posting lists and w is for write-ahead logs.

p has MANIFEST, sst and vlog files. What does each of these contain?

The MANIFEST has some metadata, sst files hold the memtables (keys + small values), and vlog files hold the values (the vlog is also used as the WAL for Badger).

Are the values in the vlog sorted? I guess not, if it is used as a WAL.
Which brings me back to the question:

  1. Is only the sst memtable sorted?
  2. Are the values in the vlog not sorted?

The list of UIDs in a posting list is sorted. Since the vlog is append-only, does this list of UIDs go to the sst?

Values in the vlog are NOT sorted; only the memtables are sorted. Each value contains a sequence of UIDs (or a sequence of values) which is sorted (within each value, not across values). I think you are assuming that the keys are UIDs, which is not always the case. UIDs could be part of the key or part of the value, depending on whether it corresponds to data (the primary index, uid + predicate -> posting list) or to indexes (the secondary indexes, token + predicate -> posting list of UIDs).
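
The two key shapes described above can be sketched as follows. The triples, the tuple-shaped keys, and the use of the whole string as the index token are all hypothetical simplifications; real keys also carry metadata, and real tokenizers differ by index type.

```python
from collections import defaultdict

# Hypothetical triples: (subject UID, predicate, object).
# The object is a UID for an edge or a string for a scalar value.
triples = [
    (7, "friend", 12),
    (7, "friend", 3),
    (7, "name", "alice"),
    (3, "name", "alice"),
    (12, "name", "bob"),
]

data = defaultdict(list)   # primary:   (uid, predicate)   -> posting list
index = defaultdict(list)  # secondary: (token, predicate) -> posting list of UIDs

for uid, pred, obj in triples:
    data[(uid, pred)].append(obj)       # the UID is part of the key here
    if isinstance(obj, str):
        index[(obj, pred)].append(uid)  # the UID is part of the value here

# Each posting list is sorted on its own; nothing is sorted across values.
for postings in list(data.values()) + list(index.values()):
    postings.sort()
```

Here UID 7 appears in the key `(7, "friend")` of the data entry and inside the value `[3, 7]` of the index entry `("alice", "name")`.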

That cleared up a lot of my questions, thank you so much.
I have one last question remaining, related to the WAL. Since vlog files are append-only, they are equivalent to a WAL.

  1. Why are there separate vlog files in both the /p (posting list) directory and the /w (write-ahead log) directory?
  2. How is the sst inside the /w directory generated/stored? Since an sst is a sorted list, it is NOT append-only. Or is the sst in the /w directory append-only and not sorted?

Mostly for separation of concerns. The p directory deals with data that is stored in the database, i.e. the primary and all the secondary indexes (i.e. all the posting lists), whereas the w (or zw in the case of Zero) directory mostly stores control information, for Raft logs. Both need to be durable.