Sparse adjacency matrices instead of posting lists?


Based on some reading and going through the code, looks like Dgraph uses posting lists/adjacency lists. It appears this vocabulary comes from Facebook?

We’re interested in doing some research to see if Dgraph would benefit from sparse adjacency matrices, like RedisGraph. Has anyone tried this out, even has a proof-of-concept? Is this even worth trying?

Any ideas appreciated, thanks!

(Manish R Jain) #2

I don’t know too much about the storage concepts behind matrices, but here’s my take, so correct me if I’m wrong.

Redis Graph has the design benefit of completely being served out of memory. Dgraph being a DB cannot make that assumption. From my understanding, adjacency matrices would need to be completely stored in memory to be served. Also, updates to that, would require regeneration of the entire matrix.

One more thing: Dgraph also supports facets, which get incorporated into the posting lists. Matrices don’t allow for that.


Thank you ,Manish, for the quick reply!

I think you’re right: RedisGraph stores everything in-memory.

However, I think there’s some still opportunity for research. I found a variety of papers/code, in R and C++, for efficient computation of matrices on external storage:

Perhaps random access could be enabled with a good on-disk matrix format and roaring bitmaps ? (Roaring bitmaps support efficient random access.)

Our long-term goal is to enable integration of GraphBLAS, which would GPU-accelerated operations in dgraph.

Could you point us to where in the code we should focus on? We’re looking at dgraph/algo/, dgraph/posting/.

Perhaps facets could be represented as “special” edges?

P.S. I’d post more URLs but the forum doesn’t allow new users to post >2 links, heh.

(Manish R Jain) #4

Many packages are involved with running queries. Posting, Worker and Query packages are the main ones.