As Dgraph matures as a distributed system, it would need the ability to support increase or decrease the hardware resources eg. nodes or shards in a cluster as the load in the production environment changes.
Dgraph currently uses a modulo based hashing function to determine which predicate (and associated entities) reside on which shard based on total number of nodes in the cluster (assumption??). When the number of nodes in the shard change, so does the shard associated with the given predicate.
Which means on teardown or bootstrapping of a new node in the cluster the hash for all existing predicates need to be recomputed and data need to be moved around the shards accordingly, resulting in network congestion (albeit asynchronously) and possibly “downtime” before the cluster is fully operational.
What are the approaches Dgraph plan to use to minimize the internal network traffic on bootstrapping/tearing down a node.
1. The below mentioned writeup is based on my understanding and assumptions regarding Dgraph, which is likely be erroneous. Please correct them as you see fit.
2. This may be little far in the future from Dgraph’s product roadmap point of view.