Because of similarities between the two projects, I get asked this question a lot. So I decided to make this a wiki page, for easy reference and to allow multiple viewpoints.
Note: This article is based on my response on Reddit to a similar question.
I haven’t looked too deeply into Cayley, but based on my understanding, it’s a hybrid document-graph engine, sort of like a graph layer on top of an existing database. It supports several such backend databases. You can put a distributed database underneath it, so that Cayley supports distribution. This means Cayley itself doesn’t need to tackle data distribution, snapshots, machine failures, etc., and can rely on the database for these features. But it also means that Cayley’s query performance is bound by how the underlying database divides the data, and is affected by the fan-out of intermediate steps, that is, the number of results each step produces.
For DGraph, low latency for query execution is the prime goal. In a distributed system, this largely equates to minimizing the number of network calls. For graph processing systems, doing that is really hard. If data is distributed across machines with standard key-based sharding, a single graph query could end up hitting many, if not all, of the machines when the intermediate or final result set gets large.
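To make the fan-out concern concrete, here is a minimal Python sketch. It assumes a hypothetical key-based scheme that places each entity on machine `uid % num_machines`; the names `shard_for` and `machines_touched` are made up for illustration and do not come from DGraph or Cayley.

```python
def shard_for(uid, num_machines):
    """Hypothetical key-based sharding: place an entity by its ID modulo cluster size."""
    return uid % num_machines


def machines_touched(uids, num_machines):
    """Return the set of machines that must be contacted to expand every entity in `uids`."""
    return {shard_for(uid, num_machines) for uid in uids}


if __name__ == "__main__":
    # A small intermediate result set touches only a few machines...
    print(machines_touched([1, 2, 3], 8))            # {1, 2, 3}
    # ...but a large one ends up touching every machine in the cluster.
    print(len(machines_touched(range(1000), 8)))     # 8
```

Under this assumed scheme, the number of machines contacted grows with the size of the intermediate result set until it reaches the whole cluster, which is the behaviour described above.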
DGraph tackles this problem by dividing up the triple data (subject S, predicate P, object O) in a way so as to colocate all the (S, O) for P on the same machine (possibly further sharding it if P is too big). Furthermore, it stores the data in sorted lists (O1 … Oi), to allow for really cheap list intersections (think of queries like [movies starring X and Y]).
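The layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DGraph's actual storage code: the function names `build_predicate_shards` and `intersect_sorted`, the predicate name `acted_in`, and the numeric IDs are all hypothetical. Each predicate gets its own shard, each subject in that shard maps to a sorted list of object IDs, and a query like [movies starring X and Y] becomes a two-pointer intersection of two lists that live in the same shard.

```python
from collections import defaultdict


def build_predicate_shards(triples):
    """Group (subject, predicate, object) triples by predicate.

    Returns {predicate: {subject: sorted list of object IDs}}, so that all
    (S, O) pairs for one predicate sit together in one "shard".
    """
    shards = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        shards[p][s].add(o)
    return {
        p: {s: sorted(objs) for s, objs in by_subject.items()}
        for p, by_subject in shards.items()
    }


def intersect_sorted(a, b):
    """Two-pointer intersection of two sorted lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


if __name__ == "__main__":
    # Actors are IDs 1 and 2; movies are IDs 100 and up (all made up).
    triples = [
        (1, "acted_in", 105), (1, "acted_in", 100), (1, "acted_in", 101),
        (2, "acted_in", 101), (2, "acted_in", 103), (2, "acted_in", 105),
        (3, "directed", 101),
    ]
    shards = build_predicate_shards(triples)
    acted_in = shards["acted_in"]  # one shard holds every (S, O) for this predicate
    print(intersect_sorted(acted_in[1], acted_in[2]))  # [101, 105]
```

Because both posting lists belong to the same predicate, the intersection in this sketch needs data from a single shard regardless of how many movies each actor has.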
This keeps the total number of network calls required to process a query proportional to the complexity of the query, not the number of results. In addition, all the entities (S, O) are converted to uint64 numbers because they are a lot more efficient to work on (CPU-wise) and pass around (network-wise).
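As a rough illustration of the entity-to-number conversion, the sketch below maps entity names to sequential integer IDs and packs them as fixed-width unsigned 64-bit values with Python's standard `struct` module. The `UidAssigner` class, the sequential numbering, and the little-endian wire layout are assumptions made for this example; how DGraph actually assigns and encodes its IDs is not specified here.

```python
import struct


class UidAssigner:
    """Hypothetical in-memory mapping from entity names to integer IDs."""

    def __init__(self):
        self._uids = {}
        self._next = 1

    def assign(self, name):
        """Return the existing ID for `name`, or hand out the next unused one."""
        uid = self._uids.get(name)
        if uid is None:
            uid = self._next
            self._uids[name] = uid
            self._next += 1
        return uid


def encode_uids(uids):
    """Pack IDs as little-endian unsigned 64-bit integers, 8 bytes each."""
    return struct.pack("<%dQ" % len(uids), *uids)


def decode_uids(payload):
    """Inverse of encode_uids."""
    count = len(payload) // 8
    return list(struct.unpack("<%dQ" % count, payload))


if __name__ == "__main__":
    assigner = UidAssigner()
    names = ["Steven Spielberg", "Tom Hanks", "Steven Spielberg"]
    uids = [assigner.assign(n) for n in names]
    print(uids)                    # [1, 2, 1]
    print(len(encode_uids(uids)))  # 24
```

Fixed-width integers compare with a single machine instruction and have a predictable size on the wire, which is the kind of CPU and network saving the paragraph above refers to.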
DGraph is aimed at squeezing out great performance, so that one can use it in production, directly for user-facing queries. It’s built with a very different design and ideology than Cayley.
Btw, do have a look at the product roadmap to get a better understanding of where DGraph is headed: Product Roadmap (Issue #1, dgraph-io/dgraph on GitHub).
Update: Note that a fair comparison won’t be possible without deeply understanding the internal workings of Cayley. So, take the above differences with a grain of salt. I have a lot of respect for Barak, its primary author and my ex-colleague at Google. In fact, I’m really happy that there are multiple open source graph database projects working to solve the graph serving problem.