I’m testing Dgraph clustering on AWS EC2. The cluster currently consists of three Zero instances. Everything seems to be working, but I’m looking for clarification on a few behaviors I’ve noticed.
In my testing, I removed a Zero instance (Raft index=3) from the cluster. I verified the Zero node’s removed status via the zerohost:6080/state endpoint, and then terminated the EC2 instance for Zero Raft index 3.
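For reference, this is roughly the removal flow I followed (zerohost here is a placeholder for one of my live Zero addresses, and I’m assuming jq just for readability):

# Remove the Zero with Raft index 3; Zero members belong to group 0
curl "http://zerohost:6080/removeNode?id=3&group=0"

# Confirm it now shows up under the removed list in the membership state
curl -s "http://zerohost:6080/state" | jq '.removed'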
I then started up a new Zero instance (Raft index=4). When it starts up, the Zero instance successfully joins the cluster; however, I see in the Zero log that it attempts to connect to the removed instance (Raft index 3). It fails to connect (as expected), but 1) is it expected behavior for a Zero instance to attempt to connect to removed instances?
Also, when a new Zero instance starts up, 2) does the --peer flag need to be set to the leader node, or can it also point to a follower? When Zero attempts to connect to a non-leader node, I seem to frequently (if not always) receive a “context deadline exceeded” error and the Zero instance fails to join the cluster. When the --peer flag points to the leader node, it always seems to join without a problem.
Thanks in advance.
Dgraph metadata
dgraph version
Dgraph version : v21.03.1
Dgraph codename : rocket-1
Dgraph SHA-256 : a00b73d583a720aa787171e43b4cb4dbbf75b38e522f66c9943ab2f0263007fe
Commit SHA-1 : ea1cb5f35
Commit timestamp : 2021-06-17 20:38:11 +0530
Branch : HEAD
Go version : go1.16.2
jemalloc enabled : true
Once an instance is removed, the other members shouldn’t be attempting to connect to it anymore. If the membership info in /state shows the current three members (index 1, index 2, and index 4), then the setup sounds correct.
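As a quick sanity check (hypothetical hostname, assuming jq is available), the live membership can be read straight from the /state endpoint; it should show exactly the current Zeros, keyed by their Raft ids:

# Current Zero members as reported by the cluster
curl -s "http://zerohost:6080/state" | jq '.zeros'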
It can point to any peer. Technically, it needs to point to any currently healthy member of the cluster so that it can connect and join the membership.
Zero indexes 1, 2, and 5 are available, as expected. 3 and 4 have been removed.
This is what I see in the Zero startup log for the new Zero index 5.
I1109 21:18:28.005305 9340 pool.go:162] CONNECTING to 202.25.0.78:5080
I1109 21:18:28.010034 9340 raft.go:659] [0x5] Starting node
I1109 21:18:28.010083 9340 log.go:34] 5 became follower at term 0
I1109 21:18:28.010100 9340 log.go:34] newRaft 5 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
I1109 21:18:28.010108 9340 log.go:34] 5 became follower at term 1
I1109 21:18:28.010367 9340 zero.go:114] Starting telemetry data collection for zero...
I1109 21:18:28.010408 9340 run.go:388] Running Dgraph Zero...
I1109 21:18:29.063365 9340 log.go:34] 5 [term: 1] received a MsgHeartbeat message with higher term from 1 [term: 2]
I1109 21:18:29.063396 9340 log.go:34] 5 became follower at term 2
I1109 21:18:29.063409 9340 log.go:34] raft.node: 5 elected leader 1 at term 2
I1109 21:18:30.065995 9340 node.go:189] Setting conf state to nodes:1
I1109 21:18:30.066191 9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066249 9340 node.go:189] Setting conf state to nodes:1 nodes:2
I1109 21:18:30.066272 9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066292 9340 node.go:189] Setting conf state to nodes:1 nodes:2 nodes:3
I1109 21:18:30.066312 9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066324 9340 node.go:189] Setting conf state to nodes:1 nodes:2
I1109 21:18:30.066341 9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066346 9340 pool.go:162] CONNECTING to 202.25.0.53:5080
I1109 21:18:30.066375 9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066393 9340 node.go:189] Setting conf state to nodes:1 nodes:2 nodes:4
I1109 21:18:30.066412 9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066422 9340 node.go:189] Setting conf state to nodes:1 nodes:2
I1109 21:18:30.066439 9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066458 9340 node.go:189] Setting conf state to nodes:1 nodes:2 nodes:5
I1109 21:18:30.066476 9340 raft.go:966] Done applying conf change at 0x5
I1109 21:18:30.066519 9340 pool.go:162] CONNECTING to 202.25.0.181:5080
I1109 21:18:30.066537 9340 pool.go:162] CONNECTING to 202.25.0.157:5080
I1109 21:18:30.066583 9340 pool.go:162] CONNECTING to 202.25.0.161:5080
W1109 21:18:33.071875 9340 pool.go:267] Connection lost with 202.25.0.181:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 202.25.0.181:5080: connect: no route to host"
W1109 21:18:33.071875 9340 pool.go:267] Connection lost with 202.25.0.157:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 202.25.0.157:5080: connect: no route to host"
W1109 21:18:33.073796 9340 pool.go:267] Connection lost with 202.25.0.161:5080. Error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 202.25.0.161:5080: connect: no route to host"
It successfully joins the cluster, but the last three lines of the log make it look like Zero is attempting to connect to the removed IP addresses. I’m not even sure where it gets the 181 address from. I think it may have been a previous Zero instance that was terminated but didn’t fully join the cluster. I noticed that my counter property is set to 4, even though I only have 3 nodes.
Regarding pointing a new Zero to non-leader nodes, this is what I see in the follower node logs - it doesn’t seem to forward the proposal to the leader:
I1109 19:52:04.436885 9335 pool.go:162] CONNECTING to 202.25.0.53:5080
I1109 19:55:41.980440 9335 log.go:34] 2 not forwarding to leader 1 at term 2; dropping proposal
But if I point the new Zero instance directly to the leader, it will join the cluster without a problem.
The instances are launched from an auto-scaling group, so they are all configured identically.
Also, what would it take for the “amDead” property to get set to true? When I kill a dgraph zero process, it doesn’t seem to force that flag to change.
Thanks for sharing the logs. It looks like what’s happening here is that this new Zero came up and, as expected, is replaying the write-ahead log and applying the updates as seen by its peers. This includes the conf changes (aka Setting conf state to ... logs) for the previous membership states up to the latest one. This triggers the new instance to attempt to connect to past members during the WAL replay.
The /state output you shared looks right (3 members with 2 removed). So, all in all, things look as they should be.
counter has nothing to do with the number of members in the cluster. It’s a book-keeping value that tracks the Raft index of the latest updates.
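If you want to see that for yourself (placeholder hostname, assuming jq), you can read it straight from /state and watch it advance with applied updates rather than with the member count:

# counter reflects the Raft index of the latest applied membership updates
curl -s "http://zerohost:6080/state" | jq '.counter'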
I see what you mean. amDead is a field used in the proposal when removing a node. I don’t see it reflected in the /state info currently.
Ah, I see what you mean. Looks like you’re right about this. You’ll need to point the --peer config to the leader node.
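One way to do that (hypothetical hostname; this assumes the leader: true flag that Zero members carry in /state) is to look up the current leader’s address and pass it to --peer when starting the new Zero. The addresses and the idx value below are placeholders; idx must be a fresh Raft index that hasn’t been used before:

# Find the address of the current Zero leader from any live Zero
curl -s "http://zerohost:6080/state" | jq -r '.zeros[] | select(.leader == true) | .addr'

# Start the new Zero pointed at that address
dgraph zero --my=NEW_ZERO_IP:5080 --peer=LEADER_IP:5080 --raft "idx=6"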