Unable to join cluster via dgraphzero after upgrading to 1.0.3

artooro · February 13, 2018, 6:33pm

I upgraded a Kubernetes cluster by changing the image version from 1.0.2 to 1.0.3, which appears to have brought the whole thing down.

I noticed that the Zero container now listens on ports 3080 and 4080. I'm not sure why, because the docs say it should be 5080 and 6080.

Setting up grpc listener at: 0.0.0.0:3080
Setting up http listener at: 0.0.0.0:4080

Zero is started using the command
dgraph zero -o -2000 --replicas 3 --my=$(hostname -f):5080 --idx $idx
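The listener ports in the log above line up with this command if `-o` is read as a plain offset added to each of Zero's default ports. A minimal sketch of that arithmetic, assuming 1.0.3 defaults of 5080 (gRPC) and 6080 (HTTP) as the docs state, and assuming pre-1.0.3 defaults of 7080 and 8080 (inferred from `-o -2000` previously landing on 5080). `zero_ports` is a hypothetical helper, not part of Dgraph.

```python
from typing import Dict, Tuple

# Hypothetical model of how Zero's -o (port offset) flag shifts its default
# listener ports. Both rows of defaults are assumptions drawn from this
# thread, not read from Dgraph's source code.
DEFAULT_ZERO_PORTS: Dict[str, Tuple[int, int]] = {
    "1.0.2": (7080, 8080),  # assumed pre-1.0.3 defaults (gRPC, HTTP)
    "1.0.3": (5080, 6080),  # 1.0.3 defaults per the docs (gRPC, HTTP)
}


def zero_ports(version: str, offset: int = 0) -> Tuple[int, int]:
    """Return the (gRPC, HTTP) ports Zero listens on for a given -o offset."""
    grpc, http = DEFAULT_ZERO_PORTS[version]
    return grpc + offset, http + offset


if __name__ == "__main__":
    # The same "-o -2000" flag applied to both sets of defaults.
    for version in ("1.0.2", "1.0.3"):
        grpc, http = zero_ports(version, -2000)
        print(f"{version}: gRPC {grpc}, HTTP {http}")
```

With `-o -2000`, the 1.0.3 row gives 3080 and 4080, matching the log, while `--my` and the servers' `--zero` flag still point at 5080.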

And the server is started using
dgraph server --my=$(hostname -f):7080 --memory_mb 8192 --zero dgraph-0.dgraph.default.svc.cluster.local:5080

And it says it's using these ports

2018/02/13 18:26:39 gRPC server started.  Listening on port 9080
2018/02/13 18:26:39 HTTP server started.  Listening on port 8080

Zero is complaining like this

Echo error from dgraph-2.dgraph.default.svc.cluster.local:7080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure

And the server

Unable to join cluster via dgraphzero
github.com/dgraph-io/dgraph/x.Fatalf
/home/travis/gopath/src/github.com/dgraph-io/dgraph/x/error.go:103
github.com/dgraph-io/dgraph/worker.StartRaftNodes
/home/travis/gopath/src/github.com/dgraph-io/dgraph/worker/groups.go:107
runtime.goexit
/home/travis/.gimme/versions/go1.9.2.linux.amd64/src/runtime/asm_amd64.s:2337
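When a server reports `Unable to join cluster via dgraphzero`, one quick check is whether anything accepts TCP connections at the address passed to `--zero`. A minimal sketch using Python's standard `socket` module; `can_connect` is a hypothetical helper, and a successful connect only shows that the port is open, not that it is Zero's gRPC endpoint.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

For example, running `can_connect("dgraph-0.dgraph.default.svc.cluster.local", 5080)` from inside the cluster would likely return False in the broken setup above, since Zero was listening on 3080 instead.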

I’m going to try and figure out how to fix this. But this is a warning to anyone else who tries to do this upgrade.

sboorlagadda · February 13, 2018, 6:45pm

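Starting from 1.0.3, the default ports for Dgraph Zero have changed (to 5080 for gRPC and 6080 for HTTP), so you don't have to provide -o -2000 anymore. If you keep it, the offset is applied to the new defaults, which is why Zero ended up on 3080 and 4080.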

I would say change the zero start command to:

dgraph zero --replicas 3 --my=$(hostname -f):5080 --idx $idx
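The failure in this thread comes down to `--my` advertising port 5080 while Zero, with the leftover offset, listened on 3080. A minimal sketch that flags this kind of mismatch before deploying, assuming the 1.0.3 default gRPC port of 5080 and treating `-o` as a plain additive offset. `check_zero_args` is hypothetical, parses only the flags used in this thread, and expects `$(hostname -f)` to be expanded first.

```python
import shlex
from typing import List, Optional

ZERO_DEFAULT_GRPC = 5080  # assumed Zero gRPC default as of 1.0.3


def check_zero_args(command: str) -> List[str]:
    """Warn if Zero's advertised --my port differs from its bound gRPC port."""
    args = shlex.split(command)
    offset = 0
    my_port: Optional[int] = None
    for i, arg in enumerate(args):
        if arg == "-o" and i + 1 < len(args):
            offset = int(args[i + 1])
        elif arg.startswith("--my="):
            # --my=host:port; take the part after the last colon.
            my_port = int(arg.rsplit(":", 1)[1])
    bound = ZERO_DEFAULT_GRPC + offset
    if my_port is not None and my_port != bound:
        return [f"--my advertises port {my_port}, but Zero will listen on {bound}"]
    return []
```

Run against the original command (with the hostname filled in) this returns one warning about 3080; against the corrected command it returns an empty list.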

artooro · February 13, 2018, 8:06pm

Yeah, to resolve it I removed the -o option as you mentioned. I basically compared my deployment against https://github.com/dgraph-io/dgraph/blob/master/contrib/config/kubernetes/dgraph-ha.yaml to see what had changed.

For some reason the change was not being applied, so I used kubectl delete pod dgraph-X to delete all the pods. The change was then rolled out, and now it's OK.
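Deleting the pods makes the StatefulSet controller recreate them from the updated spec. One possible reason the edit did not roll out on its own (an assumption, not confirmed in this thread) is a StatefulSet `updateStrategy` of `OnDelete`, which only replaces pods after they are deleted. A minimal sketch that builds the delete commands for a StatefulSet named `dgraph`, matching the `dgraph-0` and `dgraph-2` hostnames above; the replica count of 3 and the highest-ordinal-first order are assumptions, and `delete_pod_commands` is hypothetical.

```python
from typing import List


def delete_pod_commands(statefulset: str, replicas: int) -> List[str]:
    """Build one `kubectl delete pod` command per StatefulSet pod.

    StatefulSet pods are named <statefulset>-<ordinal>; this orders them
    from the highest ordinal down, the same order a rolling update uses.
    """
    return [
        " ".join(["kubectl", "delete", "pod", f"{statefulset}-{ordinal}"])
        for ordinal in range(replicas - 1, -1, -1)
    ]


if __name__ == "__main__":
    for cmd in delete_pod_commands("dgraph", 3):
        print(cmd)
```

Running the commands one at a time and waiting for each pod to become ready again would likely keep the rest of the cluster serving during the restart.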

The port change was in the release notes, so I'm not blaming the Dgraph team for anything here.

dtpg · March 8, 2018, 6:46pm

Were you able to get this to work? I'm experiencing the same issue right now: the Dgraph server logs the error and shuts down.

artooro · March 8, 2018, 7:07pm

Yes, it's working just fine for me now. Did you see the comment about removing -o -2000 from the StatefulSet?
Also, in my case I had to use kubectl to delete the existing pods for the changes to take effect.

dtpg · March 9, 2018, 1:09pm

Right, so I was able to get it working after removing the port offset. Sorry, I forgot to mention that I'm trying to run this locally on a Mac.

Do we know why we can't specify the ports manually, then?
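In the commands used in this thread, Zero's ports are only ever chosen through `-o`, which shifts every default port by the same amount, so picking a gRPC port also fixes the HTTP port. A minimal sketch of that calculation, assuming the 1.0.3 Zero defaults of 5080 and 6080; `offset_for_grpc_port` is hypothetical, and whether Zero accepts separate per-port flags is not covered here.

```python
from typing import Tuple

ZERO_DEFAULT_GRPC = 5080  # assumed 1.0.3 default
ZERO_DEFAULT_HTTP = 6080  # assumed 1.0.3 default


def offset_for_grpc_port(desired_grpc: int) -> Tuple[int, int]:
    """Return (offset, resulting HTTP port) that puts Zero's gRPC listener on desired_grpc."""
    offset = desired_grpc - ZERO_DEFAULT_GRPC
    return offset, ZERO_DEFAULT_HTTP + offset


if __name__ == "__main__":
    for port in (5080, 7080):
        offset, http = offset_for_grpc_port(port)
        print(f"gRPC {port}: use -o {offset}, HTTP lands on {http}")
```

Under these assumptions, keeping Zero on 5080 in 1.0.3 needs no offset at all, which matches the fix described earlier in the thread.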

system · April 8, 2018, 1:09pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
