Issue Connecting Dgraph Alpha Instance to Zero Leader Across Different Servers

What I want to do
I have set up Dgraph Zero and Dgraph Alpha on one instance (xxx.xxx.x.27), and another Dgraph Alpha instance on a different server (xxx.xxx.x.28). I want to connect the Alpha instance on the second server (xxx.xxx.x.28) to the Zero service running on the first server (xxx.xxx.x.27:5080).

What I did

1. Set up Dgraph Zero and Alpha on the instance xxx.xxx.x.27.
2. Configured Zero to listen on xxx.xxx.x.27:5080.
3. Configured that Alpha to connect to the Zero instance at xxx.xxx.x.27:5080.
4. Set up another Dgraph Alpha instance on a separate server (xxx.xxx.x.28), configured to connect to the Zero instance at xxx.xxx.x.27:5080.

Zero Service Configuration (xxx.xxx.x.27):

ExecStart=/usr/local/bin/dgraph zero --my=xxx.xxx.x.27:5080 --replicas=1 --wal /var/lib/dgraph/zw

Alpha Service Configuration on another VM (xxx.xxx.x.28):

ExecStart=/usr/local/bin/dgraph alpha --my=xxx.xxx.x.28:7080 --zero=xxx.xxx.x.27:5080 --logtostderr -v=2 -p /var/lib/dgraph/p -w /var/lib/dgraph/w --port_offset=8180

Error:
When trying to connect the Alpha instance from xxx.xxx.x.28 to Zero at xxx.xxx.x.27:5080, I get the following error:

Oct 14 11:10:54 AI-ML18 dgraph[536169]: I1014 11:10:54.738624 536182 groups.go:750] Found connection to leader: localhost:5080
Oct 14 11:10:54 AI-ML18 dgraph[536169]: I1014 11:10:54.739088 536182 groups.go:704] No healthy Zero leader found. Trying to find a Zero leader...
Oct 14 11:10:54 AI-ML18 dgraph[536169]: E1014 11:10:54.752493 536182 groups.go:1229] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": unable to find any servers for group: 1. closer err:
Oct 14 11:10:54 AI-ML18 dgraph[536169]: I1014 11:10:54.840109 536182 run.go:786] Caught Ctrl-C. Terminating now (this may take a few seconds)...
Oct 14 11:10:54 AI-ML18 dgraph[536169]: I1014 11:10:54.851876 536182 run.go:791] Stopped before initialization completed
Oct 14 11:10:54 AI-ML18 dgraph[536169]: I1014 11:10:54.854678 536182 groups.go:750] Found connection to leader: localhost:5080
Oct 14 11:10:54 AI-ML18 dgraph[536169]: I1014 11:10:54.854742 536182 groups.go:704] No healthy Zero leader found. Trying to find a Zero leader...

What could be causing the Alpha instance on xxx.xxx.x.28 to fail to find a healthy Zero leader, even though it appears to connect? Are there any specific configurations or settings I should adjust to ensure a proper connection and healthy leader discovery between the Alpha on xxx.xxx.x.28 and the Zero on xxx.xxx.x.27? Do I need to configure anything on the Zero instance to support Alphas connecting from multiple servers?

Can anyone help me resolve this issue?

Your port_offset looks a little strange. It specifies an offset (a relative shift added to the default ports), not the port number itself.

For example, when a user runs Dgraph Alpha with --port_offset 2, the Alpha node binds to ports 7082 (gRPC, internal/private), 8082 (HTTP, external/public) and 9082 (gRPC, external/public).
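If I read your unit file right, --port_offset=8180 would shift those defaults to 15260, 16260 and 17260, while --my=xxx.xxx.x.28:7080 still advertises port 7080, so the advertised address would not match the port the Alpha actually listens on. A rough illustration (the "..." stands for the rest of your flags):

# With --port_offset=8180, the Alpha binds 7080+8180, 8080+8180 and 9080+8180,
# so --my would have to advertise the shifted internal port:
dgraph alpha --my=xxx.xxx.x.28:15260 --zero=xxx.xxx.x.27:5080 --port_offset=8180 ...
# Dropping the offset keeps the well-known defaults 7080/8080/9080:
dgraph alpha --my=xxx.xxx.x.28:7080 --zero=xxx.xxx.x.27:5080 ...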

However, which node are the logs you provided from? If they are from the Alpha node, there may be a problem with its command arguments, because it is trying to reach Zero via "localhost:5080". It wouldn't get any response from localhost, because the Zero node runs on another host, as you explained.
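Independent of that, it may help to confirm from xxx.xxx.x.28 that the Zero on xxx.xxx.x.27 is actually reachable (assuming Zero's default HTTP port 6080 is unchanged), for example:

# Run on xxx.xxx.x.28 to rule out basic network/firewall issues:
nc -zv xxx.xxx.x.27 5080             # Zero's gRPC port, the one --zero points at
curl http://xxx.xxx.x.27:6080/state  # Zero's HTTP endpoint; should return the cluster state as JSON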

Hi @sivak, we've also provided a resolution for this on the GH issue you created, as there is indeed a problem with the port_offset configuration, as noted above. More importantly, there should be no need to use port_offset at all when the second Alpha runs on a different host, since there is no other service competing for ports 7080, 8080 or 9080 there.

Additionally, as @mike42 noted above, you may want to inspect the Alpha startup command: the logs show the Zero address as localhost:5080 instead of xxx.xxx.x.27:5080 (unless the Zero and the second Alpha are actually running on the same host).
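For reference, a minimal sketch of the Alpha unit on xxx.xxx.x.28 without the offset, assuming the default ports 7080/8080/9080 are free on that host and your data directories stay as they are:

ExecStart=/usr/local/bin/dgraph alpha --my=xxx.xxx.x.28:7080 --zero=xxx.xxx.x.27:5080 --logtostderr -v=2 -p /var/lib/dgraph/p -w /var/lib/dgraph/w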

We've closed the issue for now, since this is a configuration issue rather than a bug.
HTH!