Dgraph doesn't shut down on ctrl+c

Moved from GitHub dgraph/5627

Posted by jarifibrahim:

What version of Dgraph are you using?

Have you tried reproducing the issue with the latest release?

No

Steps to reproduce the issue (command/config used to run Dgraph).

  1. Start alpha with dgraph alpha command
  2. Press ctrl+c

Expected behaviour and actual result.

I expected the server to shut down, but it doesn’t.

Instead, the server keeps retrying to read the GraphQL schema:

E0610 19:28:22.269326    3774 run.go:396] GRPC listener canceled: accept tcp [::]:9080: use of closed network connection
E0610 19:28:22.269426    3774 run.go:415] Stopped taking more http(s) requests. Err: accept tcp [::]:8080: use of closed network connection
I0610 19:28:22.269539    3774 run.go:549] tada
I0610 19:28:22.269546    3774 run.go:707] GRPC and HTTP stopped.
I0610 19:28:23.738962    3774 admin.go:652] Error reading GraphQL schema: Dgraph execution failed because Please retry again, server is not ready to accept requests.
I0610 19:28:28.739161    3774 admin.go:652] Error reading GraphQL schema: Dgraph execution failed because Please retry again, server is not ready to accept requests.
I0610 19:28:33.739416    3774 admin.go:652] Error reading GraphQL schema: Dgraph execution failed because Please retry again, server is not ready to accept requests.
diff --git a/dgraph/cmd/alpha/run.go b/dgraph/cmd/alpha/run.go
index 2fb31a4a3..2aedfe3d7 100644
--- a/dgraph/cmd/alpha/run.go
+++ b/dgraph/cmd/alpha/run.go
@@ -546,6 +546,7 @@ func setupServer(closer *y.Closer) {
 	glog.Infoln("gRPC server started.  Listening on port", grpcPort())
 	glog.Infoln("HTTP server started.  Listening on port", httpPort())
 	wg.Wait()
+	glog.Info("tada")
 }
 
 func run() {

The "tada" line, added after wg.Wait() in the patch above, was printed before the server completed the shutdown.
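One plausible reading of the log above, not confirmed anywhere in this thread, is that a background retry loop keeps running because it never observes the shutdown signal. The sketch below shows the general pattern of a retry loop that stops when a context is cancelled. Every name in it (retryUntilReady, errNotReady, demoShutdown) is hypothetical and is not taken from the Dgraph code base.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNotReady is a hypothetical stand-in for the
// "server is not ready to accept requests" error seen in the log.
var errNotReady = errors.New("server is not ready to accept requests")

// retryUntilReady calls load every interval until it succeeds or ctx is
// cancelled. It returns the number of attempts and the final error
// (nil on success, ctx.Err() when shutdown was requested).
func retryUntilReady(ctx context.Context, interval time.Duration, load func() error) (int, error) {
	attempts := 0
	for {
		attempts++
		if err := load(); err == nil {
			return attempts, nil
		}
		select {
		case <-ctx.Done():
			// Shutdown was requested: stop retrying instead of looping forever.
			return attempts, ctx.Err()
		case <-time.After(interval):
			// Wait before the next attempt.
		}
	}
}

// demoShutdown simulates a loader that never succeeds and a shutdown signal
// that arrives shortly after start. It reports whether the loop exited
// because of the cancellation.
func demoShutdown() bool {
	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		time.Sleep(30 * time.Millisecond)
		cancel() // stands in for the signal handler reacting to ctrl+c
	}()
	_, err := retryUntilReady(ctx, 10*time.Millisecond, func() error { return errNotReady })
	return errors.Is(err, context.Canceled)
}

func main() {
	fmt.Println("stopped on shutdown:", demoShutdown())
}
```

A loop that only sleeps between attempts, with no case for the cancellation channel, would keep logging errors every interval after the listeners are closed, which is the symptom shown in the log.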

iluminae commented :

I would like to second this. My alpha/zero pods have not been stopping fully. I did not open a ticket because I was not sure the problem was unrelated to running in Kubernetes, which I recognize not everyone does.

Maybe 60% of the time my alpha pods hang on Terminating. Looking at the logs, they shut down all their HTTP handlers and then just wait on nothing forever. My hypothesis was that all my pods stopping at once caused a Raft race that somehow resulted in an infinite wait, though that is just conjecture.

tharun208 commented :

I would like to work on this.

jarifibrahim commented :

Assigned to @tharun208. Let me know if you need any help.

tharun208 commented :

@jarifibrahim I went through the code, and zero has a closer in its server-level struct. Can we do the same for alpha? Right now we create separate closers for the admin server and ACL.

jarifibrahim commented :

@tharun208 Two different closers allow us to wait for different events. I believe we have two closers because we want ACLs to stop before admin endpoint stops responding. You should try to find out which closer doesn’t stop when we press CTRL-C for the first time.

The closer that doesn’t stop when pressing CTRL-C is our bug.
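As a rough illustration of the debugging approach described above, the sketch below stops two closers in a fixed order and reports any that fail to finish within a timeout. The closer type here is a minimal, hypothetical stand-in written with the standard library. It is not the closer type Dgraph actually uses, and which worker hangs in the demo is chosen arbitrarily.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// closer is a minimal, hypothetical shutdown helper: Signal asks workers
// to stop, and each worker calls Done once it has finished.
type closer struct {
	name    string
	stop    chan struct{}
	once    sync.Once
	running sync.WaitGroup
}

func newCloser(name string, workers int) *closer {
	c := &closer{name: name, stop: make(chan struct{})}
	c.running.Add(workers)
	return c
}

func (c *closer) Signal() { c.once.Do(func() { close(c.stop) }) }
func (c *closer) Done()   { c.running.Done() }

// waitTimeout signals the closer and waits for its workers, giving up
// after d. It returns false when the workers did not finish in time.
func (c *closer) waitTimeout(d time.Duration) bool {
	c.Signal()
	finished := make(chan struct{})
	go func() {
		c.running.Wait()
		close(finished)
	}()
	select {
	case <-finished:
		return true
	case <-time.After(d):
		return false
	}
}

// shutdownInOrder stops the closers one after another, in the order given,
// and returns the names of those that did not stop within d.
func shutdownInOrder(d time.Duration, closers ...*closer) []string {
	var stuck []string
	for _, c := range closers {
		if !c.waitTimeout(d) {
			stuck = append(stuck, c.name)
		}
	}
	return stuck
}

// demo starts one worker that ignores its stop signal and one that
// behaves, then reports which closers hung.
func demo() []string {
	acl := newCloser("acl", 1)
	admin := newCloser("admin", 1)

	// This worker never checks acl.stop, so it never calls acl.Done.
	go func() { select {} }()

	// This worker exits as soon as it is signalled.
	go func() {
		defer admin.Done()
		<-admin.stop
	}()

	return shutdownInOrder(50*time.Millisecond, acl, admin)
}

func main() {
	fmt.Println("closers that did not stop:", demo())
}
```

Keeping separate closers makes the stop order explicit (here ACL first, then admin), and the timeout turns a silent hang into a named suspect.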

jarifibrahim commented :

I am no longer able to reproduce the issue. Alpha seems to shut down on the first CTRL-C. I tried on an empty alpha/zero setup. Maybe multiple nodes or data affect shutdown behavior.

tharun208 commented :

I am also running an empty alpha setup, without starting zero. aclCloser is not shutting down properly.

jarifibrahim commented :

The --ludicrous mode flag also seems to affect it. Alpha doesn’t stop on ctrl-c if it is running in ludicrous mode.

Fixed by: https://github.com/dgraph-io/dgraph/pull/6359