I upgraded a Kubernetes cluster by changing the image version from 1.0.2 to 1.0.3, which appears to have brought the whole thing down.
I noticed that the zero container now listens on ports 3080 and 4080. I'm not sure why, because the docs say it should be 5080 and 6080.
Setting up grpc listener at: 0.0.0.0:3080
Setting up http listener at: 0.0.0.0:4080
Zero is started using the command:
dgraph zero -o -2000 --replicas 3 --my=$(hostname -f):5080 --idx $idx
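One possible reading of the port numbers, as a rough sketch: if `-o` is a port offset that is added to zero's base ports, and the base ports are the documented 5080 and 6080, then `-o -2000` gives exactly the 3080 and 4080 seen in the log. That would also mean zero advertises 5080 through `--my` while actually listening on 3080. Whether 1.0.3 really uses 5080/6080 as its base ports is an assumption here, not something confirmed in this post. The `effective_port` helper is hypothetical and only does the arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply a port offset (the value passed to -o) to a base port.
effective_port() {
  local base=$1 offset=$2
  echo $(( base + offset ))
}

offset=-2000      # from the zero command line: -o -2000
advertised=5080   # from --my=$(hostname -f):5080

# Assumption: 5080 (gRPC) and 6080 (HTTP) are the base ports, as the docs state.
grpc=$(effective_port 5080 "$offset")
http=$(effective_port 6080 "$offset")
echo "grpc=$grpc http=$http"

# Flag the case where the advertised port differs from the computed listener port.
if [ "$grpc" -ne "$advertised" ]; then
  echo "mismatch: zero advertises :$advertised but would listen on :$grpc"
fi
```

If this reading is right, dropping the offset (or adjusting it so the result is 5080) would bring the listener back in line with the address the servers are given, but that needs to be checked against the 1.0.3 release notes.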
And the server is started using:
dgraph server --my=$(hostname -f):7080 --memory_mb 8192 --zero dgraph-0.dgraph.default.svc.cluster.local:5080
It reports that it is using these ports:
2018/02/13 18:26:39 gRPC server started. Listening on port 9080
2018/02/13 18:26:39 HTTP server started. Listening on port 8080
Zero is complaining like this:
Echo error from dgraph-2.dgraph.default.svc.cluster.local:7080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
And the server fails with:
Unable to join cluster via dgraphzero
github.com/dgraph-io/dgraph/x.Fatalf
/home/travis/gopath/src/github.com/dgraph-io/dgraph/x/error.go:103
github.com/dgraph-io/dgraph/worker.StartRaftNodes
/home/travis/gopath/src/github.com/dgraph-io/dgraph/worker/groups.go:107
runtime.goexit
/home/travis/.gimme/versions/go1.9.2.linux.amd64/src/runtime/asm_amd64.s:2337
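Both errors above look like the nodes failing to reach each other, so a quick way to narrow it down is to test which zero port actually accepts connections. The following is a minimal sketch, assuming it is run from inside one of the server pods and that the image has `bash` and `timeout` available. The `can_connect` helper is hypothetical; the host name and the two candidate ports are taken from the command and log lines in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if a TCP connection to host:port opens within 2 seconds.
# Relies on bash's /dev/tcp redirection, so no extra client tools are needed.
can_connect() {
  local host=$1 port=$2
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Host from the --zero flag; 5080 is the port the server was told to use,
# 3080 is the port zero logged for its gRPC listener.
zero_host=dgraph-0.dgraph.default.svc.cluster.local
for port in 5080 3080; do
  if can_connect "$zero_host" "$port"; then
    echo "$zero_host:$port reachable"
  else
    echo "$zero_host:$port NOT reachable"
  fi
done
```

If 3080 is reachable and 5080 is not, the servers are simply pointed at a port nothing is listening on.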
I'm going to try to figure out how to fix this, but take this as a warning if you are planning the same upgrade.