Changing replication

What is the official procedure for scaling an existing Dgraph cluster up or down? As an exercise, I’m currently looking at scaling down a Dgraph deployment running on Kubernetes from 3 Zeros and 3 Alphas to 1 Zero and 1 Alpha.

I’ve done this in the past and found myself in a situation where the Alpha kept crashing because it could not connect to a leader. I didn’t resolve the issue at the time; I just scrapped the volumes and started fresh. Now I would like some advice on the best method.
Thanks

Hi @paulrostorp, you can use the /removeNode?id={id}&group={gid} endpoint available on Dgraph Zero. More can be found here.
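
For example, something along these lines (a minimal sketch; the Zero HTTP address, default port 6080, and the id/group values are placeholders you would replace with your own):

```python
import requests

# Hypothetical values: point ZERO_HTTP at your Zero's HTTP port and set
# the Raft id/group of the node you want to remove to match your cluster.
ZERO_HTTP = "http://localhost:6080"

resp = requests.get(f"{ZERO_HTTP}/removeNode", params={"id": 3, "group": 1})
resp.raise_for_status()
print(resp.text)
```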

Hi @aman-bansal, I tried this and now I’m getting "context deadline exceeded" errors on every query.

I think it might be because my Alpha is not a leader. Here is the /state output:

```
{
  "counter": "8981",
  "groups": {
    "1": {
      "members": {
        "1": {
          "id": "1",
          "groupId": 1,
          "addr": "dgraph-alpha-0.dgraph-alpha.starcards.svc.cluster.local:7080",
          "lastUpdate": "1601898387"
        }
      },
      "tablets": {
  [...]
   }
  },
  "zeros": {
    "1": {
      "id": "1",
      "addr": "dgraph-zero-0.dgraph-zero.starcards.svc.cluster.local:5080",
      "leader": true
    }
  },
  "maxLeaseId": "180000",
  "maxTxnTs": "2680000",
  "maxRaftId": "3",
  "removed": [
    {
      "id": "2",
      "addr": "dgraph-zero-1.dgraph-zero.starcards.svc.cluster.local:5080"
    },
    {
      "id": "3",
      "addr": "dgraph-zero-2.dgraph-zero.starcards.svc.cluster.local:5080"
    },
    {
      "id": "2",
      "groupId": 1,
      "addr": "dgraph-alpha-1.dgraph-alpha.starcards.svc.cluster.local:7080",
      "leader": true,
      "lastUpdate": "1602161263"
    },
    {
      "id": "3",
      "groupId": 1,
      "addr": "dgraph-alpha-2.dgraph-alpha.starcards.svc.cluster.local:7080",
      "lastUpdate": "1602160485"
    }
  ],
  "cid": "f8a3101b-3c25-4dc4-b76f-360437ad697f",
  "license": {
    "maxNodes": "18446744073709551615",
    "expiryTs": "1591861766"
  }
}
```

Hi @paulrostorp, you are right. The issue is that you removed the Alpha leader.
First of all, the removeNode endpoint is not specifically for downscaling a Dgraph cluster; it is meant to replace a faulty node. With a few tricks it can be used to downscale the cluster, but we have to make sure the cluster doesn’t go into an unstable state because of a loss of consensus.

So when you set up 2N+1 nodes in a Raft group, the cluster can handle N failures. Removing the first node (either the leader or a follower) has no adverse effect on the cluster, but removing a second node, which here was the leader, breaks the quorum and leaves the cluster unstable.
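
To make the arithmetic concrete, here is a small illustrative sketch of the Raft quorum rule (not Dgraph code, just the majority calculation):

```python
def majority(n: int) -> int:
    """Raft needs floor(n/2) + 1 votes to elect a leader or commit an entry."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """A group of n members stays available while at most this many are lost."""
    return n - majority(n)

# A 3-member group (2N+1 with N=1): majority is 2, so it tolerates 1 loss.
print(majority(3), tolerated_failures(3))  # -> 2 1
# After one member is removed, 2 remain: majority is still 2, tolerating 0 losses,
# so losing the leader as well leaves nothing able to win an election.
print(majority(2), tolerated_failures(2))  # -> 2 0
```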

@aman-bansal Is there a way to resolve the issue, or do I have to wipe the volumes and restore? Also, shouldn’t the removeNode endpoint trigger a re-election, or shouldn’t there be a way to trigger one manually?

As @aman-bansal said, /removeNode is meant to replace unhealthy nodes, not to remove the current leader of a group. Doing so can leave the remaining members stuck trying to connect to a leader when the leader is suddenly removed and there is no longer a majority.

The process is to call /removeNode only to remove followers (the leaders are presumably active and healthy), for both the Dgraph Zero and Dgraph Alpha groups.
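
A rough sketch of that process (assuming the Zero HTTP endpoint is at `localhost:6080`, that Zero members are addressed with group 0 in /removeNode, and that you verify every id against your own /state output before removing anything):

```python
import requests

ZERO_HTTP = "http://localhost:6080"  # placeholder; point this at your Zero's HTTP port

state = requests.get(f"{ZERO_HTTP}/state").json()

# Collect the non-leader members: Zeros live under "zeros" (group 0 for
# /removeNode), Alphas live under each entry of "groups".
followers = []
for zid, zero in state.get("zeros", {}).items():
    if not zero.get("leader"):
        followers.append((zid, 0))
for gid, group in state.get("groups", {}).items():
    for mid, member in group.get("members", {}).items():
        if not member.get("leader"):
            followers.append((mid, int(gid)))

# Remove only the followers; the leaders stay, so each group keeps its quorum.
for node_id, group_id in followers:
    print(f"removing node {node_id} from group {group_id}")
    r = requests.get(f"{ZERO_HTTP}/removeNode",
                     params={"id": node_id, "group": group_id})
    r.raise_for_status()
```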

Manual recovery

If your cluster is still stuck, you can wipe the volumes and restore. Otherwise, you can go through some manual recovery steps, keeping one of the Alpha p directories.

  1. Check /state for the maxLeaseId and maxTxnTs information (see docs about /state).
  2. Keep a p directory around and remove other volumes.
  3. Start the Zeros.
    • Call /assign?what=uids&num=N, where num is the maxLeaseId value from step 1. This sets the UID lease for blank-node UID assignment.
    • Call /assign?what=timestamps&num=N, where num is the maxTxnTs value from step 1. This sets the latest transaction timestamp (see the sketch after this list).
  4. Copy the p directory to the respective Alpha volumes.
  5. Start the Alphas.
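
For step 3, a minimal sketch of those /assign calls (assuming the new Zero's HTTP endpoint is `localhost:6080` and using the maxLeaseId/maxTxnTs values from the /state output posted above):

```python
import requests

ZERO_HTTP = "http://localhost:6080"  # placeholder for the new Zero's HTTP address

# Values read from /state in step 1 (here: the ones posted above).
max_lease_id = 180000  # "maxLeaseId"
max_txn_ts = 2680000   # "maxTxnTs"

# Re-lease UIDs so new blank-node assignments don't collide with existing data.
requests.get(f"{ZERO_HTTP}/assign",
             params={"what": "uids", "num": max_lease_id}).raise_for_status()

# Fast-forward the transaction timestamp past the old cluster's latest txn.
requests.get(f"{ZERO_HTTP}/assign",
             params={"what": "timestamps", "num": max_txn_ts}).raise_for_status()
```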

This is similar to the bulk loading workflow, where the bulk loader outputs a p directory that you then copy to the Alpha instances (step 4).

Thanks @dmai, this is very useful information.