Zero restarts on SIGINT


Report a Dgraph Bug

What version of Dgraph are you using?

https://github.com/dgraph-io/dgraph/commit/a5c469d56741a4a6e2c668b5053dba17bbfa5860

Have you tried reproducing the issue with the latest release?

YES

Steps to reproduce the issue (command/config used to run Dgraph).

  1. Run dgraph alpha
  2. Run dgraph zero wait for 2 seconds and press CTRL+C multiple times. You’ll notice that zero restarts when we were trying to stop it.
[Decoder]: Using assembly version of decoder
[Sentry] 2020/08/11 19:48:06 Integration installed: ContextifyFrames
[Sentry] 2020/08/11 19:48:06 Integration installed: Environment
[Sentry] 2020/08/11 19:48:06 Integration installed: Modules
[Sentry] 2020/08/11 19:48:06 Integration installed: IgnoreErrors
[Decoder]: Using assembly version of decoder
[Sentry] 2020/08/11 19:48:06 Integration installed: ContextifyFrames
[Sentry] 2020/08/11 19:48:06 Integration installed: Environment
[Sentry] 2020/08/11 19:48:06 Integration installed: Modules
[Sentry] 2020/08/11 19:48:06 Integration installed: IgnoreErrors
I0811 19:48:07.175307   23162 sentry_integration.go:48] This instance of Dgraph will send anonymous reports of panics back to Dgraph Labs via Sentry. No confidential information is sent. These reports help improve Dgraph. To opt-out, restart your instance with the --enable_sentry=false flag. For more info, see https://dgraph.io/docs/howto/#data-handling.
I0811 19:48:07.352158   23162 init.go:98] 

Dgraph version   : v2.0.0-rc1-554-ga5c469d56
Dgraph SHA-256   : 76aed969a261a9f3124424b12314696bbfc144a893996fc70a9975ba43f7ef39
Commit SHA-1     : a5c469d56
Commit timestamp : 2020-07-28 12:35:01 +0530
Branch           : master
Go version       : go1.14.6

For Dgraph official documentation, visit https://dgraph.io/docs/.
For discussions about Dgraph     , visit https://discuss.dgraph.io.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.


I0811 19:48:07.352197   23162 run.go:187] Setting Config to: {bindall:true myAddr: portOffset:0 nodeId:1 numReplicas:1 peer: w:zw rebalanceInterval:480000000000 LudicrousMode:false}
I0811 19:48:07.352225   23162 run.go:108] Setting up grpc listener at: 0.0.0.0:5080
I0811 19:48:07.352354   23162 run.go:108] Setting up http listener at: 0.0.0.0:6080
badger 2020/08/11 19:48:07 INFO: All 1 tables opened in 0s
badger 2020/08/11 19:48:07 INFO: Replaying file id: 0 at offset: 4193
badger 2020/08/11 19:48:07 INFO: Replay took: 42.258µs
I0811 19:48:07.357723   23162 node.go:148] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc000302d80 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x2be7358 DisableProposalForwarding:false}
I0811 19:48:07.357870   23162 node.go:317] Found hardstate: {Term:17 Vote:1 Commit:26 XXX_unrecognized:[]}
I0811 19:48:07.357905   23162 node.go:326] Group 0 found 26 entries
I0811 19:48:07.357921   23162 raft.go:515] Restarting node for dgraphzero
I0811 19:48:07.357963   23162 log.go:34] 1 became follower at term 17
I0811 19:48:07.357991   23162 log.go:34] newRaft 1 [peers: [], term: 17, commit: 26, applied: 0, lastindex: 26, lastterm: 17]
I0811 19:48:07.358091   23162 run.go:312] Running Dgraph Zero...
I0811 19:48:07.359077   23162 node.go:185] Setting conf state to nodes:1 
I0811 19:48:07.359114   23162 raft.go:757] Done applying conf change at 0x1
I0811 19:48:07.359213   23162 pool.go:160] CONNECTING to localhost:7080
W0811 19:48:07.359223   23162 pool.go:191] Shutting down extra connection to localhost:7080
W0811 19:48:07.359926   23162 pool.go:254] Connection lost with localhost:7080. Error: rpc error: code = Canceled desc = context canceled
I0811 19:48:07.877649   23162 zero.go:422] Got connection request: cluster_info_only:true 
I0811 19:48:07.878053   23162 log.go:34] 1 no leader at term 17; dropping index reading msg
^CI0811 19:48:09.629724   23162 run.go:275] --- Received interrupt signal
I0811 19:48:09.630137   23162 run.go:275] --- Received interrupt signal
I0811 19:48:09.630200   23162 run.go:284] --- Ignoring interrupt signal.
I0811 19:48:09.630290   23162 run.go:294] Shutting down...
E0811 19:48:09.630476   23162 http.go:252] Stopped taking more http(s) requests. Err: accept tcp [::]:6080: use of closed network connection
I0811 19:48:09.631197   23162 http.go:256] All http(s) requests finished.
E0811 19:48:09.630645   23162 node.go:712] [0x1] While trying to do lin read index: Closer has been called
E0811 19:48:09.631291   23162 node.go:659] Error while trying to call ReadIndex: raft: stopped
E0811 19:48:09.631336   23162 node.go:712] [0x1] While trying to do lin read index: raft: stopped
I0811 19:48:09.631334   23162 zero.go:431] Connected: cluster_info_only:true 
I0811 19:48:09.631445   23162 raft.go:686] Zero Node.Run finished.
I0811 19:48:09.631589   23162 run.go:146] gRPC server stopped : accept tcp [::]:5080: use of closed network connection
I0811 19:48:09.631620   23162 zero.go:422] Got connection request: cluster_info_only:true 
^CI0811 19:48:10.454118   23151 sentry_integration.go:48] This instance of Dgraph will send anonymous reports of panics back to Dgraph Labs via Sentry. No confidential information is sent. These reports help improve Dgraph. To opt-out, restart your instance with the --enable_sentry=false flag. For more info, see https://dgraph.io/docs/howto/#data-handling.
I0811 19:48:10.643122   23151 init.go:98] 

Dgraph version   : v2.0.0-rc1-554-ga5c469d56
Dgraph SHA-256   : 76aed969a261a9f3124424b12314696bbfc144a893996fc70a9975ba43f7ef39
Commit SHA-1     : a5c469d56
Commit timestamp : 2020-07-28 12:35:01 +0530
Branch           : master
Go version       : go1.14.6

For Dgraph official documentation, visit https://dgraph.io/docs/.
For discussions about Dgraph     , visit https://discuss.dgraph.io.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.


I0811 19:48:10.643178   23151 run.go:187] Setting Config to: {bindall:true myAddr: portOffset:0 nodeId:1 numReplicas:1 peer: w:zw rebalanceInterval:480000000000 LudicrousMode:false}
I0811 19:48:10.643202   23151 run.go:108] Setting up grpc listener at: 0.0.0.0:5080
I0811 19:48:10.643299   23151 run.go:108] Setting up http listener at: 0.0.0.0:6080
badger 2020/08/11 19:48:10 INFO: All 1 tables opened in 1ms
badger 2020/08/11 19:48:10 INFO: Replaying file id: 0 at offset: 4193
badger 2020/08/11 19:48:10 INFO: Replay took: 65.073µs
I0811 19:48:10.649352   23151 node.go:148] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc00004e5c0 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x2be7358 DisableProposalForwarding:false}
I0811 19:48:10.649471   23151 node.go:317] Found hardstate: {Term:17 Vote:1 Commit:26 XXX_unrecognized:[]}
I0811 19:48:10.649514   23151 node.go:326] Group 0 found 26 entries
I0811 19:48:10.649524   23151 raft.go:515] Restarting node for dgraphzero
I0811 19:48:10.649565   23151 log.go:34] 1 became follower at term 17
I0811 19:48:10.649589   23151 log.go:34] newRaft 1 [peers: [], term: 17, commit: 26, applied: 0, lastindex: 26, lastterm: 17]
I0811 19:48:10.649672   23151 run.go:312] Running Dgraph Zero...
I0811 19:48:10.650617   23151 node.go:185] Setting conf state to nodes:1 
I0811 19:48:10.650653   23151 raft.go:757] Done applying conf change at 0x1
I0811 19:48:10.650763   23151 pool.go:160] CONNECTING to localhost:7080
W0811 19:48:10.650774   23151 pool.go:191] Shutting down extra connection to localhost:7080
W0811 19:48:10.651420   23151 pool.go:254] Connection lost with localhost:7080. Error: rpc error: code = Canceled desc = context canceled
I0811 19:48:11.650029   23151 log.go:34] 1 no leader at term 17; dropping index reading msg
I0811 19:48:12.055526   23151 zero.go:422] Got connection request: cluster_info_only:true 
I0811 19:48:13.350128   23151 log.go:34] 1 is starting a new election at term 17
I0811 19:48:13.350191   23151 log.go:34] 1 became pre-candidate at term 17
I0811 19:48:13.350207   23151 log.go:34] 1 received MsgPreVoteResp from 1 at term 17
I0811 19:48:13.350410   23151 log.go:34] 1 became candidate at term 18
I0811 19:48:13.350465   23151 log.go:34] 1 received MsgVoteResp from 1 at term 18
I0811 19:48:13.350607   23151 log.go:34] 1 became leader at term 18
I0811 19:48:13.350677   23151 log.go:34] raft.node: 1 elected leader 1 at term 18
I0811 19:48:13.350771   23151 raft.go:722] I've become the leader, updating leases.
I0811 19:48:13.350794   23151 assign.go:42] Updated Lease id: 1. Txn Ts: 10001
W0811 19:48:13.649985   23151 node.go:674] [0x1] Read index context timed out
I0811 19:48:13.650581   23151 zero.go:440] Connected: cluster_info_only:true 
I0811 19:48:13.651018   23151 zero.go:422] Got connection request: cluster_info_only:true 
I0811 19:48:13.651438   23151 zero.go:440] Connected: cluster_info_only:true 
I0811 19:48:13.651758   23151 zero.go:422] Got connection request: cluster_info_only:true 
I0811 19:48:13.652156   23151 zero.go:440] Connected: cluster_info_only:true 
I0811 19:48:13.652557   23151 zero.go:422] Got connection request: cluster_info_only:true 
I0811 19:48:13.653054   23151 zero.go:440] Connected: cluster_info_only:true 
^CI0811 19:48:14.629619   23151 run.go:275] --- Received interrupt signal
I0811 19:48:14.629902   23151 run.go:294] Shutting down...
E0811 19:48:14.630030   23151 http.go:252] Stopped taking more http(s) requests. Err: accept tcp [::]:6080: use of closed network connection
I0811 19:48:14.630846   23151 http.go:256] All http(s) requests finished.
I0811 19:48:14.630181   23151 raft.go:686] Zero Node.Run finished.
I0811 19:48:14.630346   23151 run.go:146] gRPC server stopped : accept tcp [::]:5080: use of closed network connection
^C

Expected behaviour and actual result.

I expected Zero to stop on ctrl C .

Other details

I looked into this and it looks like the check here

is never successful. The exit status is always -1 in case of zero. The Panicwrap library returns the exit status of the child process to the parent process and the status.ExitStatus() returns -1 if the process is still running (which is true when we press multiple CRTL+C at the same time).

I am not sure if this should be fixed on dgraph or in the panic wrap libary.

@Paras. I believe @ashishgoswami saw something similar on alpha while running his tests.

1 Like

Are you reliably able to reproduce this? One test you could do is to see --enable_sentry false flag fixes it. Then we know it is sentry for sure.

Yes, I can reliable reproduce it. It’s not sentry, it is the panicWrap library.

Ya. panicwrap is used only by sentry in dgraph.