Hi. I run Dgraph in a GKE cluster, deployed via Helm. All 3 Zero nodes crashed, and their logs show the output below. I don't know what triggered these errors; all I was doing was some admin operations, such as updating the schema, enabling/disabling logging, and querying the health of the database:
I0125 16:06:02.427026 18 run.go:185] Setting Config to: {bindall:true portOffset:0 nodeId:1 numReplicas:5 peer: w:zw rebalanceInterval:480000000000 tlsClientConfig:<nil>}
I0125 16:06:02.427081 18 run.go:98] Setting up grpc listener at: 0.0.0.0:5080
I0125 16:06:02.428113 18 run.go:98] Setting up http listener at: 0.0.0.0:6080
I0125 16:06:02.429155 18 log.go:295] Found file: 1 First Index: 1
I0125 16:06:02.429492 18 storage.go:132] Init Raft Storage with snap: 166, first: 167, last: 173
I0125 16:06:02.469765 18 node.go:152] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc000606320 Applied:166 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x2e00cb8 DisableProposalForwarding:false}
[Sentry] 2021/01/25 16:06:02 Sending fatal event [1ac1142b7d8f44b8b8026cb497d8859f] to o318308.ingest.sentry.io project: 5208688
I0125 16:06:02.474900 18 node.go:310] Found Snapshot.Metadata: {ConfState:{Nodes:[1 2 3] Learners:[] XXX_unrecognized:[]} Index:166 Term:2 XXX_unrecognized:[]}
I0125 16:06:02.475149 18 node.go:321] Found hardstate: {Term:4 Vote:1 Commit:173 XXX_unrecognized:[]}
I0125 16:06:02.475345 18 node.go:326] Group 0 found 173 entries
I0125 16:06:02.475356 18 raft.go:542] Restarting node for dgraphzero
I0125 16:06:02.475373 18 node.go:189] Setting conf state to nodes:1 nodes:2 nodes:3
2021/01/25 16:06:02 proto: wrong wireType = 0 for field Groups
github.com/dgraph-io/dgraph/x.Check
/ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:42
github.com/dgraph-io/dgraph/dgraph/cmd/zero.(*node).initAndStartNode
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/zero/raft.go:550
github.com/dgraph-io/dgraph/dgraph/cmd/zero.run
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/zero/run.go:254
github.com/dgraph-io/dgraph/dgraph/cmd/zero.init.0.func1
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/zero/run.go:75
github.com/spf13/cobra.(*Command).execute
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
github.com/spf13/cobra.(*Command).ExecuteC
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
github.com/spf13/cobra.(*Command).Execute
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:71
main.main
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:102
runtime.main
/usr/local/go/src/runtime/proc.go:204
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1374
Since all 3 Zeros crashed, every Alpha node prints the following logs in an endless loop while it keeps trying to reconnect:
I0125 08:12:43.888397 17 log.go:34] 1 received MsgPreVoteResp from 1 at term 3
I0125 08:12:43.888487 17 log.go:34] 1 [logterm: 3, index: 229] sent MsgPreVote request to 2 at term 3
I0125 08:12:43.888537 17 log.go:34] 1 [logterm: 3, index: 229] sent MsgPreVote request to 3 at term 3
I0125 08:12:47.688260 17 log.go:34] 1 is starting a new election at term 3
I0125 08:12:47.688300 17 log.go:34] 1 became pre-candidate at term 3
I0125 08:12:47.688309 17 log.go:34] 1 received MsgPreVoteResp from 1 at term 3
I0125 08:12:47.688342 17 log.go:34] 1 [logterm: 3, index: 229] sent MsgPreVote request to 2 at term 3
I0125 08:12:47.688354 17 log.go:34] 1 [logterm: 3, index: 229] sent MsgPreVote request to 3 at term 3
W0125 08:12:48.688573 17 node.go:420] Unable to send message to peer: 0x2. Error: Unhealthy connection
I0125 08:12:51.488153 17 log.go:34] 1 is starting a new election at term 3
I0125 08:12:51.488206 17 log.go:34] 1 became pre-candidate at term 3
I0125 08:12:51.488217 17 log.go:34] 1 received MsgPreVoteResp from 1 at term 3
I0125 08:12:51.488235 17 log.go:34] 1 [logterm: 3, index: 229] sent MsgPreVote request to 2 at term 3
I0125 08:12:51.488247 17 log.go:34] 1 [logterm: 3, index: 229] sent MsgPreVote request to 3 at term 3
W0125 08:12:52.488776 17 node.go:420] Unable to send message to peer: 0x3. Error: Unhealthy connection
I0125 08:12:55.288135 17 log.go:34] 1 is starting a new election at term 3
I0125 08:12:55.288209 17 log.go:34] 1 became pre-candidate at term 3
I0125 08:12:55.288220 17 log.go:34] 1 received MsgPreVoteResp from 1 at term 3
I0125 08:12:55.288236 17 log.go:34] 1 [logterm: 3, index: 229] sent MsgPreVote request to 2 at term 3
I0125 08:12:55.288248 17 log.go:34] 1 [logterm: 3, index: 229] sent MsgPreVote request to 3 at term 3
I0125 08:12:59.088173 17 log.go:34] 1 is starting a new election at term 3
I0125 08:12:59.088277 17 log.go:34] 1 became pre-candidate at term 3
I0125 08:12:59.088286 17 log.go:34] 1 received MsgPreVoteResp from 1 at term 3
I0125 08:12:59.088302 17 log.go:34] 1 [logterm: 3, index: 229] sent MsgPreVote request to 2 at term 3
I0125 08:12:59.088314 17 log.go:34] 1 [logterm: 3, index: 229] sent MsgPreVote request to 3 at term 3
W0125 08:13:00.088562 17 node.go:420] Unable to send message to peer: 0x2. Error: Unhealthy connection
No Dgraph operation works after this error. As a temporary workaround, I am currently destroying all the volumes and recreating the Dgraph instances.
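
For context on the `proto: wrong wireType = 0 for field Groups` line in the Zero logs, here is a minimal sketch of how I understand a protobuf field key to be split into a field number and a wire type, based on the standard protobuf wire encoding. It does not use any Dgraph code. The field number 2 is only an assumption for illustration, and `splitTag` and `wireTypeNames` are hypothetical names of mine. As far as I can tell, a message or map field is expected to arrive as wire type 2 (length-delimited), so seeing wire type 0 (varint) would suggest that the bytes being decoded are not what the decoder expects.

```go
package main

import "fmt"

// wireTypeNames lists the wire types defined by the protobuf encoding.
var wireTypeNames = map[uint64]string{
	0: "varint",
	1: "64-bit",
	2: "length-delimited",
	3: "start group",
	4: "end group",
	5: "32-bit",
}

// splitTag splits an already-decoded protobuf field key into its field
// number (upper bits) and wire type (lowest 3 bits).
func splitTag(key uint64) (fieldNum uint64, wireType uint64) {
	return key >> 3, key & 0x7
}

func main() {
	// 0x12: field 2 with wire type 2, the shape expected for a message/map field.
	// 0x10: field 2 with wire type 0, the shape that the error complains about.
	// Field number 2 is an illustrative assumption, not taken from the logs.
	for _, key := range []uint64{0x12, 0x10} {
		f, w := splitTag(key)
		fmt.Printf("key 0x%02x -> field %d, wire type %d (%s)\n", key, f, w, wireTypeNames[w])
	}
}
```

This only illustrates what the error message refers to. It does not explain how the stored state ended up in that shape.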