Unable to join cluster via dgraphzero after upgrading to 1.0.3

I upgraded a Kubernetes cluster by changing the image version from 1.0.2 to 1.0.3, which appears to have brought the whole thing down.

I noticed that the Zero container now sets up its listeners on ports 3080 and 4080. Not sure why, because the docs say they should be 5080 and 6080.

Setting up grpc listener at: 0.0.0.0:3080
Setting up http listener at: 0.0.0.0:4080

Zero is started using the command
dgraph zero -o -2000 --replicas 3 --my=$(hostname -f):5080 --idx $idx

And the server is started using
dgraph server --my=$(hostname -f):7080 --memory_mb 8192 --zero dgraph-0.dgraph.default.svc.cluster.local:5080

And it says it’s using these ports

2018/02/13 18:26:39 gRPC server started.  Listening on port 9080
2018/02/13 18:26:39 HTTP server started.  Listening on port 8080

Zero is complaining like this

Echo error from dgraph-2.dgraph.default.svc.cluster.local:7080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure

And the server

Unable to join cluster via dgraphzero
github.com/dgraph-io/dgraph/x.Fatalf
/home/travis/gopath/src/github.com/dgraph-io/dgraph/x/error.go:103
github.com/dgraph-io/dgraph/worker.StartRaftNodes
/home/travis/gopath/src/github.com/dgraph-io/dgraph/worker/groups.go:107
runtime.goexit
/home/travis/.gimme/versions/go1.9.2.linux.amd64/src/runtime/asm_amd64.s:2337
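
Comparing the two sets of logs suggests a likely cause: Zero’s gRPC listener is on 3080, while the server’s --zero flag dials port 5080, so the server presumably never reaches Zero. Below is a minimal shell sketch of that check, using the log line and address quoted above. The helper name zero_grpc_port_from_log is hypothetical, and it assumes the "Setting up grpc listener at: HOST:PORT" line format shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: read Zero log output on stdin and print the port from
# the "Setting up grpc listener at: HOST:PORT" line (format assumed from the logs above).
zero_grpc_port_from_log() {
  sed -n 's/.*Setting up grpc listener at: .*:\([0-9][0-9]*\)$/\1/p'
}

# Values copied from this thread.
log_line='Setting up grpc listener at: 0.0.0.0:3080'
zero_addr='dgraph-0.dgraph.default.svc.cluster.local:5080'

listen_port=$(printf '%s\n' "$log_line" | zero_grpc_port_from_log)
# Strip everything up to the last colon to get the port the server dials.
want_port=${zero_addr##*:}

if [ "$listen_port" != "$want_port" ]; then
  echo "mismatch: Zero listens on $listen_port, server dials $want_port"
fi
```

With the values above it prints a mismatch between 3080 and 5080, which would be consistent with the "Unable to join cluster via dgraphzero" error.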

I’m going to try and figure out how to fix this. But this is a warning to anyone else who tries to do this upgrade.

Starting from 1.0.3, the default ports for Dgraph Zero have changed to 5080 and 6080, so you don’t have to provide -o -2000 anymore. Keeping -o -2000 now shifts Zero down to 3080 and 4080, which is what your logs show.

I would say change the zero start command to:

dgraph zero --replicas 3 --my=$(hostname -f):5080 --idx $idx
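
The 3080/4080 ports in the first post follow from adding the offset to the new defaults. Here is a small shell sketch of that arithmetic. zero_ports is a hypothetical helper, and the pre-1.0.3 defaults (7080/8080) are inferred from the fact that -o -2000 used to yield 5080/6080, not taken from the release notes.

```shell
#!/bin/sh
# Hypothetical helper: print the gRPC and HTTP ports Zero would bind,
# given its default gRPC port, default HTTP port, and a port offset.
zero_ports() {
  grpc_default=$1
  http_default=$2
  offset=$3
  echo "grpc=$((grpc_default + offset)) http=$((http_default + offset))"
}

# Assumed pre-1.0.3 defaults (7080/8080) with -o -2000:
zero_ports 7080 8080 -2000   # grpc=5080 http=6080
# 1.0.3 defaults (5080/6080) with the leftover -o -2000:
zero_ports 5080 6080 -2000   # grpc=3080 http=4080
# 1.0.3 defaults with no offset, as in the command above:
zero_ports 5080 6080 0       # grpc=5080 http=6080
```

The point is that an offset is relative to whatever the defaults are, so the same flag lands on different ports once the defaults move.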

Yeah, to resolve it I removed the -o option as you mentioned. I basically compared my deployment against https://github.com/dgraph-io/dgraph/blob/master/contrib/config/kubernetes/dgraph-ha.yaml to see what had changed.

For some reason the change was not being applied, so I used kubectl delete pod dgraph-X to delete all the pods. The change then rolled out and now it’s OK.
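
If you would rather not delete the pods one command at a time, a loop does the same thing. This is a sketch assuming three replicas named dgraph-0 to dgraph-2 (the names seen in the hostnames above) in the current namespace; restart_dgraph_pods and the DRY_RUN switch are hypothetical. One possible reason the change did not apply on its own, not confirmed in this thread, is a StatefulSet whose updateStrategy is OnDelete, where an updated pod template only takes effect for pods that are deleted and recreated.

```shell
#!/bin/sh
# Hypothetical helper: delete each Dgraph pod so the StatefulSet controller
# recreates it from the updated spec. Assumes pods dgraph-0..dgraph-2 in the
# current kubectl namespace. With DRY_RUN=1 it only prints the commands.
restart_dgraph_pods() {
  for i in 0 1 2; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "kubectl delete pod dgraph-$i"
    else
      kubectl delete pod "dgraph-$i"
    fi
  done
}

# Preview first, then run for real:
# DRY_RUN=1 restart_dgraph_pods
# restart_dgraph_pods
```

Deleting all three pods in one go restarts the whole cluster at once, as happened here; deleting them one at a time and waiting for each to come back is the gentler option.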

The port change was in the release notes, so not blaming the dgraph team for anything here.


Were you able to get this to work? I’m experiencing the same issue right now. The Dgraph server complains and shuts down.

Yes, it’s working just fine for me now. Did you see the comment about removing -o -2000 from the statefulset?
Also, in my case I had to use kubectl to delete the existing pods for the changes to take effect.

Right, so I was able to get it working after removing the port option. Sorry, I forgot to mention that I’m trying to run this locally on a Mac.

Do we know why we can’t specify the ports manually, then?
