Changing replication

What is the official procedure for scaling an existing Dgraph cluster up or down? As an exercise, I’m currently looking at scaling down a Dgraph deployment running on Kubernetes from 3 Zeros and 3 Alphas to 1 Zero and 1 Alpha.

I’ve done this in the past and found myself in a situation where the Alpha kept crashing because it could not connect to a leader. I didn’t resolve the issue at the time; I just scrapped the volumes and started fresh. Now I would like some advice on the best method.
Thanks

Hi @paulrostorp, you can use the /removeNode?id={id}&group={gid} endpoint available on Dgraph Zero. More can be found here.
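
For example, something along these lines (a minimal sketch; the Zero HTTP address, default port 6080, and the id/group values are placeholders you would replace with your own):

```python
import requests

# Hypothetical values: point ZERO_HTTP at your Zero's HTTP port and set
# the Raft id/group of the node you want to remove to match your cluster.
ZERO_HTTP = "http://localhost:6080"

resp = requests.get(f"{ZERO_HTTP}/removeNode", params={"id": 3, "group": 1})
resp.raise_for_status()
print(resp.text)
```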

Hi @aman-bansal, I tried this and now I’m getting "context deadline exceeded" errors on every query.

I think it might be because my Alpha is not a leader. Here is the /state output:

```
{
  "counter": "8981",
  "groups": {
    "1": {
      "members": {
        "1": {
          "id": "1",
          "groupId": 1,
          "addr": "dgraph-alpha-0.dgraph-alpha.starcards.svc.cluster.local:7080",
          "lastUpdate": "1601898387"
        }
      },
      "tablets": {
  [...]
   }
  },
  "zeros": {
    "1": {
      "id": "1",
      "addr": "dgraph-zero-0.dgraph-zero.starcards.svc.cluster.local:5080",
      "leader": true
    }
  },
  "maxLeaseId": "180000",
  "maxTxnTs": "2680000",
  "maxRaftId": "3",
  "removed": [
    {
      "id": "2",
      "addr": "dgraph-zero-1.dgraph-zero.starcards.svc.cluster.local:5080"
    },
    {
      "id": "3",
      "addr": "dgraph-zero-2.dgraph-zero.starcards.svc.cluster.local:5080"
    },
    {
      "id": "2",
      "groupId": 1,
      "addr": "dgraph-alpha-1.dgraph-alpha.starcards.svc.cluster.local:7080",
      "leader": true,
      "lastUpdate": "1602161263"
    },
    {
      "id": "3",
      "groupId": 1,
      "addr": "dgraph-alpha-2.dgraph-alpha.starcards.svc.cluster.local:7080",
      "lastUpdate": "1602160485"
    }
  ],
  "cid": "f8a3101b-3c25-4dc4-b76f-360437ad697f",
  "license": {
    "maxNodes": "18446744073709551615",
    "expiryTs": "1591861766"
  }
}
```

Hi @paulrostorp, you are right. The issue is that you removed the Alpha leader.
First of all, the removeNode endpoint is not specifically for downscaling a Dgraph cluster; it is meant to replace a faulty node. With a few tricks it can be used to downscale the cluster, but we have to make sure the cluster doesn’t go into an unstable state because of a loss of consensus.

So when you set up 2N+1 nodes in a Raft group, the cluster can handle N failures. Removing the first node (either the leader or a follower) has no adverse effect on the cluster, but removing a second node, which here was the leader, breaks the quorum and leaves the cluster unstable.
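
To make the arithmetic concrete, here is a small illustrative sketch of the Raft quorum rule (not Dgraph code, just the majority calculation):

```python
def majority(n: int) -> int:
    """Raft needs floor(n/2) + 1 votes to elect a leader or commit an entry."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """A group of n members stays available while at most this many are lost."""
    return n - majority(n)

# A 3-member group (2N+1 with N=1): majority is 2, so it tolerates 1 loss.
print(majority(3), tolerated_failures(3))  # -> 2 1
# After one member is removed, 2 remain: majority is still 2, tolerating 0 losses,
# so losing the leader as well leaves nothing able to win an election.
print(majority(2), tolerated_failures(2))  # -> 2 0
```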

@aman-bansal Is there a way to resolve the issue, or do I have to wipe the volumes and restore? Also, shouldn’t the removeNode endpoint trigger a re-election, or shouldn’t there be a way to trigger one manually?

As @aman-bansal said, /removeNode is meant to replace unhealthy nodes, not to remove the current leader of a group. Doing so can leave the remaining members stuck trying to connect to a leader when the leader is suddenly removed and there is no longer a majority.

The process is to call /removeNode only to remove followers (the leaders are presumably active and healthy), for both the Dgraph Zero and Dgraph Alpha groups.
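
A rough sketch of that process (assuming the Zero HTTP endpoint is at `localhost:6080`, that Zero members are addressed with group 0 in /removeNode, and that you verify every id against your own /state output before removing anything):

```python
import requests

ZERO_HTTP = "http://localhost:6080"  # placeholder; point this at your Zero's HTTP port

state = requests.get(f"{ZERO_HTTP}/state").json()

# Collect the non-leader members: Zeros live under "zeros" (group 0 for
# /removeNode), Alphas live under each entry of "groups".
followers = []
for zid, zero in state.get("zeros", {}).items():
    if not zero.get("leader"):
        followers.append((zid, 0))
for gid, group in state.get("groups", {}).items():
    for mid, member in group.get("members", {}).items():
        if not member.get("leader"):
            followers.append((mid, int(gid)))

# Remove only the followers; the leaders stay, so each group keeps its quorum.
for node_id, group_id in followers:
    print(f"removing node {node_id} from group {group_id}")
    r = requests.get(f"{ZERO_HTTP}/removeNode",
                     params={"id": node_id, "group": group_id})
    r.raise_for_status()
```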

Manual recovery

If your cluster is still stuck, you can wipe the volumes and restore. Otherwise, you can go through some manual recovery steps, keeping one of the Alpha p directories.

  1. Check /state for the maxLeaseId and maxTxnTs information (see docs about /state).
  2. Keep a p directory around and remove other volumes.
  3. Start the Zeros.
    • Call /assign?what=uids&num=N, where num is the maxLeaseId value from step 1. This sets the UID lease for blank-node UID assignment.
    • Call /assign?what=timestamps&num=N, where num is the maxTxnTs value from step 1. This sets the latest transaction timestamp (see the sketch after this list).
  4. Copy the p directory to the respective Alpha volumes.
  5. Start the Alphas.
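
For step 3, a minimal sketch of those /assign calls (assuming the new Zero's HTTP endpoint is `localhost:6080` and using the maxLeaseId/maxTxnTs values from the /state output posted above):

```python
import requests

ZERO_HTTP = "http://localhost:6080"  # placeholder for the new Zero's HTTP address

# Values read from /state in step 1 (here: the ones posted above).
max_lease_id = 180000  # "maxLeaseId"
max_txn_ts = 2680000   # "maxTxnTs"

# Re-lease UIDs so new blank-node assignments don't collide with existing data.
requests.get(f"{ZERO_HTTP}/assign",
             params={"what": "uids", "num": max_lease_id}).raise_for_status()

# Fast-forward the transaction timestamp past the old cluster's latest txn.
requests.get(f"{ZERO_HTTP}/assign",
             params={"what": "timestamps", "num": max_txn_ts}).raise_for_status()
```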

This is similar to the bulk loading workflow, where the bulk loader outputs a p directory that you then copy to the Alpha instances (step 4).

Thanks @dmai, this is very useful information.