DGraph v0.2 Release

We’re happy to announce launch of v0.2 of DGraph. This version focuses on two things:

  • mutations via GraphQL and
  • Horizontal scalability (the distributed aspect, the D in DGraph) – allowing you to shard your data, and serve it via multiple servers.

Highlight

With this release, DGraph becomes one of the rare distributed native graph databases.

Changes

There’re a few changes in how v0.2 runs. We have isolated UID (Internal Unique IDs) and XID (External IDs) posting lists in their own RocksDB directory.

The loader is now run in 2 passes. 1st pass assigns UIDs. And second pass loads the data. This avoids the need of synchronization between instances; so the loading process can be run concurrently on multiple machines, and can execute really fast, making zero network calls.

Server at instance 0 handles the UID assignment. UID assignment requires modifying multiple posting lists transactionally. These posting lists could lie on different servers, and we don’t currently support distributed transactions. So, all the UID retrieval and assignment is handled by one server (instanceIdx=0).

GraphQL had to be slightly modified to support mutations. Their current specification system ties into the functions of Javascript (or other execution systems); it does not completely specify the data (edges) that need to be added or modified etc. So, we extended their mutation system to use RDF NQuad format of specifying data. Currently, only set instruction is supported. Multiple NQuads can be batched together in one mutation. And a query can be sent along with a mutation to get the final result, within the same GraphQL query.

Mutations are by nature distributed in this version. They’ll first retrieve (or assign) UIDs for the entities specified via instance 0. Once done, they’ll send the edges to the corresponding servers, waiting for all of them to execute successfully, before running query portion or returning.

Right now, there’s no data replication, or shard movement. Those features will be handled in later versions. You must specify the IP address of every worker up front, and let them connect to each other before executing queries.

Performance

Both the loader performance and query performance has improved compared to v0.1.

Loader performance has improved several factors (10x), largely due to optimizations in how we write to RocksDB (asynchronously, rather than synchronously).

The memory usage growth while loading is also lot more controlled than v0.1, because we’re now tracking dirty posting lists separately and continuously merging them, flushing to RocksDB and evicting them from memory.