I have set up an HA Dgraph cluster (i.e., 3 Zeros and 3 Alphas) on EKS. It is working fine, but I want to scale up by adding 1 or 2 more Alphas to the cluster.
What I did
I added a 4th Alpha and it shows up in my cluster, but it doesn't actually act as an independent node. For example, I have 153GB of data on the three Alpha nodes, but when I add the 4th Alpha to the cluster, the data isn't copied to it; only 2.2GB of its disk is filled. When I fire multiple queries, only the three original Alpha nodes serve them, while the 4th Alpha sits idle and doesn't take any load. No CPU or memory utilization is visible on the 4th Alpha.
Only when one of the 3 Alpha nodes goes down does the 4th Alpha come into the picture, and data starts copying to it.
Please let me know how this works internally. How does it serve queries?
Is there any way to scale up the existing 6-node cluster? Do we have to specify the replicas when initially setting up the cluster, or is there a way to add replicas to an existing cluster?
I ask because when we add more Alphas to the existing cluster, the new one acts as a shard rather than as an independent node.
Zero has a flag, --replicas, which sets how many servers are used in a high-availability group. In the standard HA setup it is 3 (the flag itself defaults to 1). The fourth Alpha you added started a second group, serving a subset of the predicates.
I would highly suggest familiarizing yourself with the documentation on how the servers run, including this section on running with Kubernetes: https://dgraph.io/docs/deploy/kubernetes/
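To make the grouping behaviour above concrete, here is a minimal sketch in Python. It is only a model of the rule described in this reply (each group is filled up to --replicas members before a new group is opened), not Dgraph's actual code; the function name assign_groups and the 0-based Alpha indices are made up for illustration.

```python
def assign_groups(num_alphas, replicas):
    """Model of group assignment: fill each group to `replicas` members
    before opening the next one. Returns {alpha_index: group_id}.

    Assumption: Alphas join in index order (alpha-0, alpha-1, ...),
    as they would when started from a StatefulSet.
    """
    assignment = {}
    for alpha in range(num_alphas):
        # Group ids start at 1; every `replicas` Alphas a new group begins.
        assignment[alpha] = alpha // replicas + 1
    return assignment


# With --replicas 3, the 4th Alpha (index 3) lands in a new group 2.
print(assign_groups(4, 3))
# With --replicas 5, five Alphas all share group 1.
print(assign_groups(5, 5))
```

Under this model, a 4th Alpha only becomes a replica of the existing data if the replicas setting leaves room for it in group 1.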
I passed 3 replicas when creating the cluster, and the data has been replicated across these three servers. Can we change the number of replicas in an already running cluster, or can we replicate our data to a 4th server in EKS?
We are able to bring up an extra Alpha/server, but it only serves a subset, and the data doesn't replicate to the 4th one.
Is there any way?
I have not tried changing it on a running system, but I would assume the next 2 servers added will be in group 1 if you change it to 5. (4 is not a valid value, since these are Raft groups, which should have an odd number of members.)
However, what you actually want is the enterprise feature called 'learner nodes'. Adding more members means they will participate in quorum when writing data, which is great if you actually want that; but if you just want to serve more queries, that is the point of learner nodes. That feature is behind the enterprise license, though, so that may be a blocker for you.
It sounds like, since you already added a node in group 2 and you don't want it there, you will probably want to export all of your data and import it into a fresh system with the correct settings. On k8s, the StatefulSet ordinal is used as the server index, and removing a node from the cluster is hard compared to running on hardware.
Please refer to the screenshots above. alpha-0, alpha-1, and alpha-2 have 153GB of data, which was created by live loading after setting up the Dgraph HA cluster. We then wanted to add 2 more Alphas, but on the new Alpha only 2.2GB of space is utilized. How can we overcome this problem?
Servers 0, 1, and 2 are probably in one group while the others are in a different group. Groups do not share any data between them. This is how Dgraph's horizontal scaling is designed.
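One way to check which group each server is in is to look at the cluster state that Zero reports. The sketch below assumes a JSON document shaped like {"groups": {group_id: {"members": {member_id: {"addr": ...}}}}}; that shape, the helper name members_by_group, and the addresses in sample_state are assumptions for illustration, so compare them against what your own Zero actually returns.

```python
def members_by_group(state):
    """Return {group_id: sorted list of member addresses}.

    `state` is assumed to be a dict parsed from Zero's cluster state JSON,
    with a top-level "groups" map whose values contain a "members" map.
    """
    result = {}
    for group_id, group in state.get("groups", {}).items():
        members = group.get("members", {})
        result[group_id] = sorted(m["addr"] for m in members.values())
    return result


# Hypothetical state: three Alphas in group 1, a fourth alone in group 2.
sample_state = {
    "groups": {
        "1": {"members": {
            "1": {"addr": "alpha-0:7080"},
            "2": {"addr": "alpha-1:7080"},
            "3": {"addr": "alpha-2:7080"},
        }},
        "2": {"members": {
            "4": {"addr": "alpha-3:7080"},
        }},
    }
}

for group_id, addrs in members_by_group(sample_state).items():
    print(f"group {group_id}: {addrs}")
```

If the newest Alpha shows up under its own group id, it is a separate shard and will not receive a copy of group 1's data.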
If the issue is that you would like the second group to serve half of your tablets, note that every 8 minutes the Zeros initiate a move of the largest tablet from the largest group to the smallest group (by disk usage).
To make one group consist of five servers, set the replicas flag to 5 on the zeros, as I said above.
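The periodic tablet move described above can be sketched roughly like this. It is a simplified Python model of the rule as stated in this reply (largest tablet, from the largest group to the smallest group, by disk usage), not Dgraph's implementation; the function name rebalance_once, the tablet names, and the per-tablet sizes are made up so that they add up to 153GB.

```python
def rebalance_once(groups):
    """Perform one rebalance step in place.

    `groups` maps group_id -> {tablet_name: size_gb}. The largest tablet of
    the group with the most data moves to the group with the least data.
    Returns (tablet, source_group, destination_group), or None if there is
    nothing to move.
    """
    usage = {gid: sum(tablets.values()) for gid, tablets in groups.items()}
    src = max(usage, key=usage.get)
    dst = min(usage, key=usage.get)
    if src == dst or not groups[src]:
        return None
    tablet = max(groups[src], key=groups[src].get)
    groups[dst][tablet] = groups[src].pop(tablet)
    return tablet, src, dst


# Hypothetical layout: all 153GB in group 1, group 2 still empty.
groups = {1: {"name": 60.0, "friend": 50.0, "age": 43.0}, 2: {}}
print(rebalance_once(groups))  # one move per interval (8 minutes in the reply above)
print(groups)
```

Note that this moves whole tablets between groups; it never copies the same tablet into both groups, which is why a new group does not end up with a full replica of the data.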
Server 3 is also part of group 1; please refer to the screenshot below. I passed replicas 5 when creating the cluster and initially replicated the data to 3 Alphas. Once the data had been replicated and the cluster was set up successfully, I added a 4th Alpha. It became part of my group and is serving queries as well, but data is not being replicated to it: only 2.2GB of space is occupied out of 153GB.