Custom Cluster Tests Exit Abnormally and Leave Dangling Containers

Moved from GitHub dgraph/5897

Posted by darkn3rd:

When running custom cluster tests, occasionally the script exits abnormally and doesn’t clean up.

What version of Dgraph are you using?

v20.03.3

Have you tried reproducing the issue with the latest release?

yes

What is the hardware spec (RAM, OS)?

Ubuntu Bionic Beaver

Steps to reproduce the issue (command/config used to run Dgraph).

cd $GOPATH/src/github.com/dgraph-io/
git clone https://github.com/dgraph-io/dgraph.git
cd dgraph
git checkout release/v20.03
export GO111MODULE=on
./test.sh -C 2>&1 | tee /tmp/v20.03.custom_cluster.$(date +%Y%m%d_%H%M).txt

Expected behaviour and actual result.

Expected

The expected behaviour is a message stating whether the tests passed or failed, with no leftover containers still running.

INFO: Running tests in directory graphql/e2e/schema_subscribe
  ...
  ...
PASS
ok  	github.com/dgraph-io/dgraph/graphql/e2e/schema_subscribe	0.327s
INFO: Stopping cluster
Stopped alpha3
Stopped alpha2
Stopped zero1
Stopped alpha1
Removed alpha3
Removed alpha2
Removed zero1
Removed alpha1
INFO: Tests completed in 10m 28s
INFO: All tests passed!

Actual

It looks like the script or a subshell exited abnormally and left containers behind.

INFO: Running tests in directory graphql/e2e/schema_subscribe
Rebuilding dgraph ...
Commit SHA256: 54940e2aea1acf788d4179495bfd995e3eef1706
Old SHA256: 1b618157b1495e48a175073b8afd53cab0954ebac72a4378b9e4880a09b585a9
New SHA256: 1b618157b1495e48a175073b8afd53cab0954ebac72a4378b9e4880a09b585a9
zero1
alpha1
Removing orphan container "zeroAdmin"
Removing orphan container "alphaAdmin"
Creating alpha1 ... done 
Creating zero1  ... done 
Creating alpha2 ... done 
Creating alpha3 ... done
wait-for-it.sh: waiting 60 seconds for localhost:6180
wait-for-it.sh: localhost:6180 is available after 0 seconds
wait-for-it.sh: waiting 60 seconds for localhost:9180
wait-for-it.sh: localhost:9180 is available after 0 seconds
...[Decoder]: Using assembly version of decoder

INFO: Running tests in directory graphql/e2e/schema_subscribe
ok  	github.com/dgraph-io/dgraph/graphql/e2e/schema_subscribe	0.549s
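The log above ends without the "INFO: Stopping cluster" teardown lines seen in the expected output. One common way to guarantee teardown even when a test step exits early is an EXIT trap inside the subshell that runs the tests. This is only a minimal Bash sketch of that pattern, not how test.sh is actually written: stop_cluster and run_tests_with_cleanup are hypothetical names, and the echo stands in for the real teardown command.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the real teardown step
# (for example, stopping and removing the compose cluster).
stop_cluster() {
  echo "INFO: Stopping cluster"
}

# Run the given command in a subshell whose EXIT trap always calls
# stop_cluster, then pass the command's exit status back to the caller.
run_tests_with_cleanup() {
  (
    trap stop_cluster EXIT   # fires when this subshell exits, pass or fail
    "$@"
    exit $?
  )
}
```

Called as `run_tests_with_cleanup go test ./graphql/e2e/schema_subscribe/...`, the teardown step would run whether the test command succeeds or fails, and the caller still sees the test's exit status.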

Running docker ps | awk '{ print $NF }' shows 3 alpha containers and 1 zero container left behind.
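To narrow that listing down to just the leftover cluster containers, something like the following could be used. It is a sketch under assumptions: leftover_names is a hypothetical helper, and the name pattern (zero or alpha followed by digits) is inferred from the container names in the log above, so it may need adjusting for other clusters.

```shell
#!/usr/bin/env bash

# Read `docker ps` output on stdin, skip the header row, and print only
# container names that look like zeroN or alphaN (assumed naming scheme).
leftover_names() {
  awk 'NR > 1 { print $NF }' | grep -E '^(zero|alpha)[0-9]+$'
}

# Example usage (removes the matching containers; xargs -r is a GNU option):
#   docker ps | leftover_names | xargs -r docker rm -f
```

Reading from stdin keeps the filter separate from Docker itself, so the pattern can be checked against saved `docker ps` output before anything is removed.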

Related PR: fix(GraphQL): fix dangling containers in Custom Cluster tests related to GraphQL, by minhaj-shakeel (Pull Request #6595, dgraph-io/dgraph)