A global map from shard (predicate) to the IP of the instance that handles it has to be maintained. For this, we need consensus as this map has to be consistent among all the nodes in the cluster. To achieve consensus we use RAFT library from etcd (https://godoc.org/github.com/coreos/etcd/raft)
A peer list has to be maintained so that the nodes can connect to all the peers as required and this list has to be updated in all the instances when a node joins the cluster or when a node fails or leaves the cluster. We need some way to do this efficiently. Please feel free to let me know your ideas and it can be discussed here.
Currently, every node has a map from NodeID to NodeIP and when a new node joins the cluster, it gets this list from the master and establishes a two-way connection with each node in the list. But there are some problems with this approach when a node fails. So we need some modifications to it.
The code is in : https://github.com/dgraph-io/experiments/tree/testing/etcdRAFT