Difference between replication and sharding


Can you please explain the difference between replicas and shards in a 3-node cluster? Thanks.

Hi @hari71,

Dgraph colocates data per predicate. By sharding we mean splitting the data into smaller chunks and spreading those chunks across multiple alpha groups (the zero group decides where each chunk goes). By analogy, you can think of a shard as one piece of an entire puzzle.
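To make "colocates data per predicate" concrete, here is a minimal sketch of predicate-based sharding. It is a simplified model, not Dgraph's code: the function name shard_by_predicate is hypothetical, and the round-robin assignment in order of first appearance is an assumption standing in for whatever placement and rebalancing zero actually does. The point it shows is that every triple with a given predicate ends up in the same group.

```python
from collections import defaultdict


def shard_by_predicate(triples, num_groups):
    """Hypothetical model: each predicate is owned whole by exactly one group.

    Returns (predicate -> group id, group id -> list of triples).
    Assumption: predicates are handed out round-robin in order of first
    appearance; real zero placement logic is more involved.
    """
    tablet_owner = {}
    groups = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate not in tablet_owner:
            # First time this predicate is seen: give it to the next group.
            tablet_owner[predicate] = len(tablet_owner) % num_groups + 1
        # All triples for a predicate go to the group that owns it.
        groups[tablet_owner[predicate]].append((subject, predicate, obj))
    return tablet_owner, dict(groups)


triples = [
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
    ("alice", "age", 30),
    ("alice", "follows", "bob"),
    ("bob", "age", 25),
]

owners, groups = shard_by_predicate(triples, num_groups=3)
print(owners)  # {'name': 1, 'age': 2, 'follows': 3}
for gid in sorted(groups):
    print(gid, groups[gid])
```

Each group holds a different subset of predicates (a different piece of the puzzle), and no triple is stored in more than one group in this model.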

By replicas we mean how many copies of your data (or of each shard) you keep. You can set this by passing the --replicas flag to your zero command(s).
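For the 3-node case in the question, it helps to see how the --replicas value changes the group layout. The sketch below is a simplified model under a stated assumption: a new alpha joins the lowest-numbered group that still has fewer than `replicas` members, otherwise a new group is created. The function assign_alphas and the node names are hypothetical, and real zero behavior has more detail than this.

```python
def assign_alphas(alphas, replicas):
    """Hypothetical model of zero placing alphas into groups.

    Assumption: each group holds at most `replicas` alphas; a joining alpha
    fills the first group with room, else it starts a new group.
    """
    groups = {}  # group id -> list of alpha names
    for alpha in alphas:
        for gid in sorted(groups):
            if len(groups[gid]) < replicas:
                groups[gid].append(alpha)
                break
        else:
            # No existing group has room: create the next group.
            groups[len(groups) + 1] = [alpha]
    return groups


nodes = ["alpha1", "alpha2", "alpha3"]
for r in (1, 3):
    print(f"--replicas={r}:", assign_alphas(nodes, r))
# --replicas=1: {1: ['alpha1'], 2: ['alpha2'], 3: ['alpha3']}
# --replicas=3: {1: ['alpha1', 'alpha2', 'alpha3']}
```

Under this model, three alphas with --replicas=3 form one group, so every node holds a full copy of all predicates (replication, no sharding). With --replicas=1 they form three groups, so predicates are spread across nodes with a single copy each (sharding, no replication).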

We invite you to have a look at this doc page and to read our paper, downloadable here.


@hari71 Replication and sharding (or partitioning) are common database concepts; they're not specific to Dgraph. A figure from the book Designing Data-Intensive Applications illustrates the difference: replicas hold the same data, while partitions hold different data.
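The general idea can be sketched on plain key-value records: partitioning splits the keys into disjoint subsets, and replication keeps several identical copies of each subset. This is a generic illustration, not how Dgraph works (Dgraph partitions by predicate rather than by hashed key); the helper names partition_key and build_cluster are hypothetical, and hash partitioning with CRC32 is just one common choice.

```python
import zlib


def partition_key(key, num_partitions):
    # Deterministic hash partitioning; crc32 is used instead of hash()
    # because Python randomizes str hashes between runs.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def build_cluster(records, num_partitions, replication_factor):
    # Partitioning: each record lands in exactly one partition.
    partitions = [dict() for _ in range(num_partitions)]
    for key, value in records.items():
        partitions[partition_key(key, num_partitions)][key] = value
    # Replication: each partition is copied replication_factor times.
    replicas = [
        [dict(part) for _ in range(replication_factor)] for part in partitions
    ]
    return partitions, replicas


records = {"alice": 30, "bob": 25, "carol": 41, "dave": 19}
partitions, replicas = build_cluster(records, num_partitions=2, replication_factor=3)
for p, part in enumerate(partitions):
    print(f"partition {p}: keys={sorted(part)}, copies={len(replicas[p])}")
```

Different partitions never share a key, while all copies of one partition are identical; that is the same/different distinction the figure draws.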

In Dgraph, sharding is done by predicate. The number of replicas is how many copies of the same data you want to keep.

I highly suggest reading the Replication and Sharding chapters of Designing Data-Intensive Applications. The Dgraph paper (Graph Database White Paper) is also a good source of information.