Alphas cannot elect a leader

I noticed the following entries in the logs:

I0115 10:07:02.822532      17 log.go:34] 1 is starting a new election at term 7
I0115 10:07:02.822568      17 log.go:34] 1 became pre-candidate at term 7
I0115 10:07:02.822574      17 log.go:34] 1 received MsgPreVoteResp from 1 at term 7
I0115 10:07:02.822586      17 log.go:34] 1 [logterm: 7, index: 40885] sent MsgPreVote request to 3 at term 7
I0115 10:07:02.823123      20 log.go:34] 2 [logterm: 7, index: 40918, vote: 3] rejected MsgPreVote from 1 [logterm: 7, index: 40885] at term 7
I0115 10:07:02.823155      17 log.go:34] 3 [logterm: 7, index: 40918, vote: 3] rejected MsgPreVote from 1 [logterm: 7, index: 40885] at term 7
I0115 10:07:02.822594      17 log.go:34] 1 [logterm: 7, index: 40885] sent MsgPreVote request to 2 at term 7
I0115 10:07:04.922394      17 log.go:34] 1 is starting a new election at term 7
I0115 10:07:04.922434      17 log.go:34] 1 became pre-candidate at term 7
I0115 10:07:04.922440      17 log.go:34] 1 received MsgPreVoteResp from 1 at term 7
I0115 10:07:04.922453      17 log.go:34] 1 [logterm: 7, index: 40885] sent MsgPreVote request to 3 at term 7
I0115 10:07:04.922461      17 log.go:34] 1 [logterm: 7, index: 40885] sent MsgPreVote request to 2 at term 7
I0115 10:07:04.922917      17 log.go:34] 3 [logterm: 7, index: 40918, vote: 3] rejected MsgPreVote from 1 [logterm: 7, index: 40885] at term 7
I0115 10:07:04.922934      20 log.go:34] 2 [logterm: 7, index: 40918, vote: 3] rejected MsgPreVote from 1 [logterm: 7, index: 40885] at term 7

These entries repeat over and over; it looks like the Alphas cannot elect a new leader. Query times are much longer than usual. Is it possible to fix that somehow?

I fixed the problem by removing one of the Alphas (picked at random).


Here is a more detailed explanation (as far as I understand it):

  • Alpha 0 was the leader
  • Alpha 0 had some pending mutations when its pod was terminated
  • the other Alphas elected a new leader while Alpha 0 was down
  • when Alpha 0 came back, it tried to finish those pending mutations, but it was no longer the leader, so it started an election
  • Alpha 0's vote requests were rejected because the other two Alphas had already elected a new leader (see the sketch after this list)
  • so the Alphas ended up in an infinite loop of trying to elect a new leader
  • terminating the current leader allowed Alpha 0 to become the new leader and finish the mutations
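
The rejections in the logs are consistent with Raft's vote rule: a voter only grants a vote or pre-vote if the candidate's log is at least as up-to-date as its own, and here Alpha 0 asks with index 40885 while the others are already at 40918. Below is a minimal Go sketch of that rule, assuming the standard Raft check rather than Dgraph's or etcd/raft's actual code:

package main

import "fmt"

// upToDate reports whether a candidate's log (candTerm, candIndex) is at
// least as up-to-date as a voter's log. This is the condition a Raft voter
// checks before granting a vote or pre-vote; it is a sketch of the rule,
// not the actual Dgraph or etcd/raft code.
func upToDate(candTerm, candIndex, voterTerm, voterIndex uint64) bool {
	return candTerm > voterTerm ||
		(candTerm == voterTerm && candIndex >= voterIndex)
}

func main() {
	// Values taken from the log excerpt above: node 1 campaigns with
	// [logterm: 7, index: 40885], while voters 2 and 3 are at
	// [logterm: 7, index: 40918] and have already voted for node 3.
	fmt.Println(upToDate(7, 40885, 7, 40918)) // false -> MsgPreVote rejected
}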

Shouldn't this be resolved in some other way? For example, if an Alpha notices pending mutations, couldn't it delegate them to the leader?

While you found the solution, I was wondering if @ibrahim might be interested in working on the root cause?

I tried to follow the steps that @lukaszlenart mentioned but I don’t see the alpha getting stuck in a loop.

  • so the Alphas ended up in an infinite loop of trying to elect a new leader

This shouldn't happen. When an old node realizes that it is receiving heartbeats with a higher term, it becomes a follower and should stop trying to become the leader.
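
A minimal sketch of that step-down rule, assuming plain Raft semantics rather than the actual etcd/raft or Dgraph code:

package raftsketch

// Sketch of the step-down rule described above: any message carrying a
// higher term makes the receiver adopt that term and revert to follower,
// which ends any election it was running. (A candidate likewise steps down
// on an equal-term heartbeat from the current leader.)

type state int

const (
	follower state = iota
	preCandidate
	candidate
	leader
)

type node struct {
	id    uint64
	term  uint64
	state state
	lead  uint64 // id of the node we currently believe is the leader
}

// stepHeartbeat handles an incoming heartbeat from peer `from` at `term`.
func (n *node) stepHeartbeat(from, term uint64) {
	if term > n.term {
		n.term = term
		n.state = follower
		n.lead = from
	}
}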

@lukaszlenart are you able to reliably reproduce this issue?

This shouldn't happen. When an old node realizes that it is receiving heartbeats with a higher term, it becomes a follower and should stop trying to become the leader.

This is my understanding of the logs I posted at the start of this topic - I left the cluster alone for the whole weekend and all I saw were exactly the same became pre-candidate at term 7 entries, and the cluster was terribly slow.
Terminating the leader allowed Alpha 0 to become the leader and apply the pending mutations. After that, the cluster became operational again.

@lukaszlenart are you able to reliably reproduce this issue?

Somewhat, yes. When I use our UI and perform a few delete operations on some nodes, I see that a transaction starts but is never committed or rolled back, and the Dgraph client starts throwing Caused by: io.grpc.StatusRuntimeException: UNKNOWN: : context deadline exceeded
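
Our client is the Java one (dgraph4j), but the flow is roughly equivalent to this Go sketch with dgo v2 (the endpoint and the mutation are made up for illustration); presumably the deadline on the transaction's context is what surfaces as the context deadline exceeded error when Commit never goes through:

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

func main() {
	// Hypothetical endpoint, for illustration only.
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	// Give the whole transaction a deadline; when the cluster has no
	// stable leader, Commit never finishes and a deadline like this is
	// what the client ends up reporting as "context deadline exceeded".
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	txn := dg.NewTxn()
	defer txn.Discard(ctx) // best-effort rollback if Commit never happened

	// Hypothetical delete, standing in for the UI's delete operations.
	mu := &api.Mutation{DelNquads: []byte("<0x123> * * .")}
	if _, err := txn.Mutate(ctx, mu); err != nil {
		fmt.Println("mutate failed:", err)
		return
	}
	if err := txn.Commit(ctx); err != nil {
		fmt.Println("commit failed:", err)
	}
}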

In the Dgraph logs I see this:

I0119 12:13:40.355373      17 draft.go:1611] Skipping snapshot at index: 50857. Insufficient discard entries: 1. MinPendingStartTs: 58828
I0119 12:13:40.379684      17 draft.go:1448] Found 1 old transactions. Acting to abort them.
I0119 12:13:40.380960      20 draft.go:602] Skipping snapshot at 50857, because found one at 50857
I0119 12:13:40.380871      18 draft.go:602] Skipping snapshot at 50857, because found one at 50857
I0119 12:13:40.381519      17 draft.go:1409] TryAbort 1 txns with start ts. Error: <nil>
I0119 12:13:40.381536      17 draft.go:1432] TryAbort selectively proposing only aborted txns: txns:<start_ts:58828 >

So then I start terminating pods; to terminate the leader I have to use --force:

kubectl delete pod intouch-dgraph-alpha-2 --force --grace-period=0

and then I see this in the logs:

I0119 12:16:42.991090      18 log.go:34] 3 became pre-candidate at term 13
I0119 12:16:42.991096      18 log.go:34] 3 received MsgPreVoteResp from 3 at term 13
I0119 12:16:42.991651      18 log.go:34] 1 [logterm: 13, index: 50893, vote: 2] rejected MsgPreVote from 3 [logterm: 13, index: 50891] at term 13
I0119 12:16:42.991108      18 log.go:34] 3 [logterm: 13, index: 50891] sent MsgPreVote request to 1 at term 13
I0119 12:16:42.991115      18 log.go:34] 3 [logterm: 13, index: 50891] sent MsgPreVote request to 2 at term 13
I0119 12:16:45.591182      18 log.go:34] 3 is starting a new election at term 13
I0119 12:16:45.591788      20 log.go:34] 2 [logterm: 13, index: 50893, vote: 2] rejected MsgPreVote from 3 [logterm: 13, index: 50891] at term 13
I0119 12:16:45.591207      18 log.go:34] 3 became pre-candidate at term 13
I0119 12:16:45.591213      18 log.go:34] 3 received MsgPreVoteResp from 3 at term 13
I0119 12:16:45.591230      18 log.go:34] 3 [logterm: 13, index: 50891] sent MsgPreVote request to 1 at term 13
I0119 12:16:45.591238      18 log.go:34] 3 [logterm: 13, index: 50891] sent MsgPreVote request to 2 at term 13

so then I gracefully terminate the new leader:

kubectl delete pod intouch-dgraph-alpha-1

and hopefully that resolves the problem; if not, I need to terminate the new leader again.

I was able to reproduce the problem again and I think it's related to transaction size - the transaction contains 6 queries and 6 mutations, and such a transaction hangs on the Dgraph side; Dgraph isn't able to abort or drop it:

I0119 14:23:31.249922      16 draft.go:1448] Found 1 old transactions. Acting to abort them.
I0119 14:23:31.251924      16 draft.go:1409] TryAbort 1 txns with start ts. Error: <nil>
I0119 14:23:31.251947      16 draft.go:1432] TryAbort selectively proposing only aborted txns: txns:<start_ts:60011 >
I0119 14:21:48.098522      21 draft.go:1611] Skipping snapshot at index: 52106. Insufficient discard entries: 1. MinPendingStartTs: 60011
I0119 14:24:31.249745      16 draft.go:1611] Skipping snapshot at index: 52106. Insufficient discard entries: 1. MinPendingStartTs: 60011
I0119 14:24:31.249804      16 draft.go:1448] Found 1 old transactions. Acting to abort them.
I0119 14:24:31.259791      16 draft.go:1409] TryAbort 1 txns with start ts. Error: <nil>
I0119 14:24:31.259814      16 draft.go:1432] TryAbort selectively proposing only aborted txns: txns:<start_ts:60011 >
I0119 14:25:31.249798      16 draft.go:1611] Skipping snapshot at index: 52106. Insufficient discard entries: 1. MinPendingStartTs: 60011

So now killing the leader breaks the cluster, as I described above.

Just to clarify: when this transaction hangs, the cluster slows down - that's why we need to restart the pods.
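
For reference, the shape of the transaction that gets stuck looks roughly like this Go/dgo sketch (placeholder queries and data; our real code is Java). Everything shares one transaction and only the final Commit resolves it; while that start_ts stays pending, the server keeps skipping snapshots, which seems to be what the MinPendingStartTs lines above show:

package example

import (
	"context"
	"fmt"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
)

// runBigTxn sketches a transaction with 6 queries and 6 mutations, like the
// one described above. Nothing is visible to other clients until the final
// Commit; if that Commit never completes, the txn's start_ts stays pending
// on the server. Queries and data here are placeholders, not the real ones.
func runBigTxn(ctx context.Context, dg *dgo.Dgraph) error {
	txn := dg.NewTxn()
	defer txn.Discard(ctx) // roll back if we bail out before Commit

	for i := 0; i < 6; i++ {
		// Placeholder query standing in for one of the 6 real queries.
		if _, err := txn.Query(ctx, `{ q(func: has(name), first: 1) { uid } }`); err != nil {
			return err
		}
		// Placeholder mutation standing in for one of the 6 real mutations.
		mu := &api.Mutation{
			SetNquads: []byte(fmt.Sprintf("_:n%d <name> \"node %d\" .", i, i)),
		}
		if _, err := txn.Mutate(ctx, mu); err != nil {
			return err
		}
	}
	return txn.Commit(ctx)
}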

I'm having the same problem!
After randomly removing nodes, the remaining nodes cannot elect a leader.

I think we found the main cause of these problems - it's still under investigation, but this is the best hunch we have for now.

Our Dgraph cluster had problems once again yesterday. One of the Zero Pods was terminated unexpectedly - no log entries, no OOM, nothing. It happened during heavy load on the cluster, with a lot of transactions going on. This put the cluster in an unstable state.

After digging through the logs and Grafana, we found that one of our EC2 boxes was terminated at the same time. It is a Spot EC2 instance which is restarted every 24h - it should mainly be used by Jenkins, but it looks like there was a bug in a scheduling filter, so regular Pods (like the Dgraph Pods) were also scheduled on those Spot instances. There is also a monitor which tries to re-schedule Pods running on a Spot instance onto another EC2 box, and in most cases it worked well; yet under heavy load on Dgraph this could fail from time to time.

We are going to fix the filter or temporarily remove the Spot instances - hopefully that will make this issue go away :smiley: