Dgraph Zero - Adding/Removing Zero Instances Clarification

I’m testing Dgraph clustering on AWS EC2. The cluster currently consists of three Zero instances. Everything seems to be working, but I’m looking for clarification on a couple of behaviors I’ve noticed.

In my testing, I removed a Zero instance (Raft index=3) from the cluster. I verified the Zero node’s removed status via the zerohost:6080/state endpoint and then terminated the EC2 instance for Zero Raft index 3.
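For reference, this is roughly how I verified the removal (jq is just for filtering, and zerohost is a placeholder for my Zero’s address):

# list the node IDs that /state reports as removed
curl -s "http://zerohost:6080/state" | jq '.removed[].id'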

I then started up a new Zero instance (Raft index=4). When it starts up, the Zero instance successfully joins the cluster; however, I see in the Zero log that it attempts to connect to the removed instance (Raft index 3). It fails to connect (as expected), but 1) is it expected behavior for a Zero instance to attempt to connect to removed instances?

Also, when a new Zero instance starts up, 2) does the --peer flag need to be set to the leader node, or can it also point to a follower? When Zero attempts to connect to a non-leader node, I seem to frequently (if not always) receive a “context-exceeded” error and the Zero instance fails to join the cluster. When the Zero --peer flag is pointed at the leader node, it always seems to join without a problem.
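For context, this is roughly how each new Zero gets started (the addresses are placeholders for my instances, and if I remember right the Raft index goes through the --raft superflag in v21.03):

# new Zero, with --peer pointed at an existing member (leader or follower)
dgraph zero --my=<new-zero-ip>:5080 --peer=<existing-zero-ip>:5080 --raft "idx=4"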

Thanks in advance.

Dgraph metadata

dgraph version

Dgraph version : v21.03.1
Dgraph codename : rocket-1
Dgraph SHA-256 : a00b73d583a720aa787171e43b4cb4dbbf75b38e522f66c9943ab2f0263007fe
Commit SHA-1 : ea1cb5f35
Commit timestamp : 2021-06-17 20:38:11 +0530
Branch : HEAD
Go version : go1.16.2
jemalloc enabled : true

Once an instance is removed, the other members shouldn’t be attempting to connect to it anymore. If the membership info in /state shows the current three members (index 1, index 2, and index 4), then the setup sounds correct.
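Something like this (assuming you have jq handy) will show the current member IDs:

# IDs of the Zeros currently in the membership
curl -s "http://localhost:6080/state" | jq '.zeros | keys'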

It can point to any peer. Technically, it needs to point to any currently healthy member of the cluster so that it can connect and join the membership.

Thanks for the quick reply. I just went ahead and retested the removal again. I logged onto the leader Zero and removed Index 4 using the command:

curl "http://localhost:6080/removeNode?id=4&group=0"

This is from the Zero state endpoint:

{
  "counter": "4",
  "groups": {},
  "zeros": {
    "1": {
      "id": "1",
      "groupId": 0,
      "addr": "202.25.0.78:5080",
      "leader": true,
      "amDead": false,
      "lastUpdate": "0",
      "learner": false,
      "clusterInfoOnly": false,
      "forceGroupId": false
    },
    "2": {
      "id": "2",
      "groupId": 0,
      "addr": "202.25.0.53:5080",
      "leader": false,
      "amDead": false,
      "lastUpdate": "0",
      "learner": false,
      "clusterInfoOnly": false,
      "forceGroupId": false
    },
    "5": {
      "id": "5",
      "groupId": 0,
      "addr": "202.25.0.183:5080",
      "leader": false,
      "amDead": false,
      "lastUpdate": "0",
      "learner": false,
      "clusterInfoOnly": false,
      "forceGroupId": false
    }
  },
  "maxUID": "0",
  "maxTxnTs": "0",
  "maxNsID": "0",
  "maxRaftId": "0",
  "removed": [
    {
      "id": "3",
      "groupId": 0,
      "addr": "202.25.0.161:5080",
      "leader": false,
      "amDead": false,
      "lastUpdate": "0",
      "learner": false,
      "clusterInfoOnly": false,
      "forceGroupId": false
    },
    {
      "id": "4",
      "groupId": 0,
      "addr": "202.25.0.157:5080",
      "leader": false,
      "amDead": false,
      "lastUpdate": "0",
      "learner": false,
      "clusterInfoOnly": false,
      "forceGroupId": false
    }
  ],
  "cid": "7fb1a305-00b7-4b15-9639-54d1c7e070b3",
  "license": {
    "user": "",
    "maxNodes": "18446744073709551615",
    "expiryTs": "1639068806",
    "enabled": true
  }
}

Zero indexes 1, 2, and 5 are available, as expected. 3 and 4 have been removed.

This is what I see in the Zero startup log for the new Zero index 5.

I1109 21:18:28.005305    9340 pool.go:162] CONNECTING to 202.25.0.78:5080
I1109 21:18:28.010034    9340 raft.go:659] [0x5] Starting node
I1109 21:18:28.010083    9340 log.go:34] 5 became follower at term 0
I1109 21:18:28.010100    9340 log.go:34] newRaft 5 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
I1109 21:18:28.010108    9340 log.go:34] 5 became follower at term 1
I1109 21:18:28.010367    9340 zero.go:114] Starting telemetry data collection for zero...
I1109 21:18:28.010408    9340 run.go:388] Running Dgraph Zero...
I1109 21:18:29.063365    9340 log.go:34] 5 [term: 1] received a MsgHeartbeat message with higher term from 1 [term: 2]
I1109 21:18:29.063396    9340 log.go:34] 5 became follower at term 2
I1109 21:18:29.063409    9340 log.go:34] raft.node: 5 elected leader 1 at term 2
I1109 21:18:30.065995    9340 node.go:189] Setting conf state to nodes:1 
I1109 21:18:30.066191    9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066249    9340 node.go:189] Setting conf state to nodes:1 nodes:2 
I1109 21:18:30.066272    9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066292    9340 node.go:189] Setting conf state to nodes:1 nodes:2 nodes:3 
I1109 21:18:30.066312    9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066324    9340 node.go:189] Setting conf state to nodes:1 nodes:2 
I1109 21:18:30.066341    9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066346    9340 pool.go:162] CONNECTING to 202.25.0.53:5080
I1109 21:18:30.066375    9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066393    9340 node.go:189] Setting conf state to nodes:1 nodes:2 nodes:4 
I1109 21:18:30.066412    9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066422    9340 node.go:189] Setting conf state to nodes:1 nodes:2 
I1109 21:18:30.066439    9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066458    9340 node.go:189] Setting conf state to nodes:1 nodes:2 nodes:5 
I1109 21:18:30.066476    9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066519    9340 pool.go:162] CONNECTING to 202.25.0.181:5080
I1109 21:18:30.066537    9340 pool.go:162] CONNECTING to 202.25.0.157:5080
I1109 21:18:30.066583    9340 pool.go:162] CONNECTING to 202.25.0.161:5080
W1109 21:18:33.071875    9340 pool.go:267] Connection lost with 202.25.0.181:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 202.25.0.181:5080: connect: no route to host"
W1109 21:18:33.071875    9340 pool.go:267] Connection lost with 202.25.0.157:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 202.25.0.157:5080: connect: no route to host"
W1109 21:18:33.073796    9340 pool.go:267] Connection lost with 202.25.0.161:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 202.25.0.161:5080: connect: no route to host"

It successfully joins the cluster, but the last three lines of the log make it look like Zero is attempting to connect to the removed IP addresses. I’m not even sure where it gets the 181 address from. I think it may have been a previous Zero instance that was terminated but didn’t fully join the cluster. I noticed that my counter property is set to 4, even though I only have 3 nodes.

Regarding pointing a new Zero to non-leader nodes, this is what I see in the follower node logs - it doesn’t seem like it will route to the leader:

I1109 19:52:04.436885    9335 pool.go:162] CONNECTING to 202.25.0.53:5080
I1109 19:55:41.980440    9335 log.go:34] 2 not forwarding to leader 1 at term 2; dropping proposal

But if I point the new Zero instance directly to the leader, it will join the cluster without a problem.

The instances are launched from an auto-scaling group, so they are all configured identically.

Also, what would it take for the “amDead” property to get set to true? When I kill a dgraph zero process, it doesn’t seem to force that flag to change.

Thanks!

Thanks for sharing the logs. It looks like what’s happening here is that this new Zero came up and, as expected, is replaying the write-ahead log and applying the updates as seen by its peers. This includes the conf changes (the Setting conf state to ... log lines) for the previous membership states up to the latest one. This triggers the new instance to attempt to connect to past members during the WAL replay.

The /state output you shared looks right (3 members with 2 removed). So, all in all, things look as they should be.

counter has nothing to do with the number of members in the cluster. It’s a bookkeeping value that tracks the Raft index of the latest updates.
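For example, you can read it directly from /state (jq assumed):

# counter is the Raft index of the latest applied update, not a member count
curl -s "http://localhost:6080/state" | jq -r '.counter'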

I see what you mean. amDead is a field used in the proposal when removing a node. I don’t see it reflected in the /state info currently.

Ah, I see what you mean. Looks like you’re right about this. You’ll need to point the --peer config to the leader node.
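If it helps, something like this (based on the /state output you pasted, jq assumed) should pull the current leader’s address to use for --peer:

# address of the Zero currently marked as leader
curl -s "http://localhost:6080/state" | jq -r '.zeros[] | select(.leader) | .addr'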

Thanks @dmai !
