Difference between replication and sharding

hari71 · November 11, 2019, 1:04am

Hi,

Can you please explain the difference between replicas and shards in a 3-node cluster? Thanks

omar · June 3, 2020, 4:06pm

Dgraph colocates data per predicate. By sharding we mean splitting the data into smaller chunks and spreading them across multiple Alpha groups (this is done by the Zero group). By analogy, you can think of a shard as one piece of an entire puzzle.

By replicas we mean how many copies of your data (or of each shard) you have. You can set this by passing the --replicas flag to your zero command(s).
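To make the distinction concrete for a 3-node cluster, here is a minimal Python sketch. It is a toy model, not Dgraph code: the function names are made up for illustration, the group-filling rule (each group takes Alphas until it holds `replicas` members) is my understanding of how Zero assigns Alphas, and the round-robin predicate placement is only a stand-in for Zero's actual placement and rebalancing.

```python
def assign_alphas_to_groups(num_alphas, replicas):
    """Toy model: fill groups in order, `replicas` Alphas per group.

    Assumption: mirrors the idea that a group is complete once it
    holds `replicas` members; names like "alpha1" are hypothetical.
    """
    groups = {}
    for i in range(num_alphas):
        group_id = i // replicas + 1
        groups.setdefault(group_id, []).append(f"alpha{i + 1}")
    return groups


def shard_predicates(predicates, group_ids):
    """Toy placement: each whole predicate lives in exactly one group.

    Assumption: round-robin is a stand-in, not Dgraph's real policy.
    """
    placement = {}
    for i, pred in enumerate(sorted(predicates)):
        placement[pred] = group_ids[i % len(group_ids)]
    return placement


if __name__ == "__main__":
    predicates = ["name", "age", "friend"]

    # 3 Alphas, replicas=3: one group, so no sharding, three copies of everything.
    replicated = assign_alphas_to_groups(3, 3)
    print(replicated, shard_predicates(predicates, list(replicated)))

    # 3 Alphas, replicas=1: three groups, so predicates are sharded, one copy each.
    sharded = assign_alphas_to_groups(3, 1)
    print(sharded, shard_predicates(predicates, list(sharded)))
```

In this model, three nodes with replicas set to 3 give you fault tolerance but no sharding, while three nodes with replicas set to 1 give you sharding but no redundancy. Getting both would take more nodes, for example six Alphas with replicas set to 3.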

We invite you to have a look at this doc page and to read our paper, downloadable here.

Best,

ibrahim · June 3, 2020, 4:22pm

@hari71 Replication and sharding (or partitioning) are common database concepts. They're not specific to Dgraph. The following image is from the book Designing Data-Intensive Applications; it shows the same data on every replica and different data on each partition.

For Dgraph, sharding is done in terms of predicates. The replica count is how many copies of the same data you want.

I highly suggest reading the chapters on replication and sharding (partitioning) in Designing Data-Intensive Applications. The Dgraph paper (Dgraph Whitepapers: In-Depth Insights and Analysis) is also a good source of information.
