How to reconnect alpha nodes after a crash if IP address changes?

rmshivers42 · August 28, 2020, 4:34pm

I am working with a dgraph cluster which has 3 alpha nodes and 1 zero node running in Docker swarm.

One of the alpha nodes is crashing, we think because of an out-of-memory issue, and is being restarted by Docker. After the restart its IP address changes, and the other two alpha nodes cannot reconnect to it. I would expect this not to be an issue, because the services can connect using the internal Docker hostnames rather than direct IP addresses, but that does not seem to be the case.

Is there a way to update the IP address of the failed node on the other two alpha nodes? Or a way to use hostnames rather than IP addresses for the connections between alpha nodes?

I am using dgraph version v20.03.4.

Here is an example error message from one of the two alpha nodes that are trying to reconnect to the restarted node:

Error from alpha client subscribe: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.2.22:7081: connect: connection refused"

MichelDiz · August 28, 2020, 10:17pm

Are you using this YAML? https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/docker/docker-compose-multi.yml

Docker should resolve the addresses internally, as alpha1:7080, alpha2:7081 and alpha3:7082.

rmshivers42 · September 1, 2020, 4:04pm

Thanks for the reply. After investigating further, it looks like when an alpha node fails and restarts, its endpoint IP changes but its Docker virtual IP does not. The non-failing alpha nodes connect to the virtual IP, which is the correct behavior.
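To see the virtual IP versus endpoint IP distinction for yourself, a minimal Python sketch can resolve a service name both ways. It assumes Docker Swarm's default `vip` endpoint mode, where the bare service name resolves to the stable virtual IP and `tasks.<service>` resolves to the current task IPs. The function `vip_and_task_ips` is hypothetical, and the resolver is injectable so the logic can be checked without a swarm.

```python
import socket
from typing import Callable, Dict, List, Tuple

# Same shape as socket.gethostbyname_ex: (hostname, aliases, ip_addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def vip_and_task_ips(service: str,
                     resolve: Resolver = socket.gethostbyname_ex) -> Dict[str, List[str]]:
    """Resolve a Swarm service name two ways (hypothetical helper).

    Assumes the default "vip" endpoint mode: the bare service name maps
    to a virtual IP that survives task restarts, while "tasks.<service>"
    maps to the IPs of the currently running tasks, which change when a
    task is restarted or rescheduled.
    """
    _, _, vip = resolve(service)
    _, _, tasks = resolve(f"tasks.{service}")
    return {"vip": sorted(vip), "tasks": sorted(tasks)}
```

Run from a container attached to the same overlay network, `vip_and_task_ips("alpha1")` (using a service name from the linked compose file) should report a VIP that stays the same across restarts and a task IP that changes after each one.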

The alpha logs never show the reconnection, but the alpha nodes' HTTP health endpoint shows the disconnected alpha node becoming unhealthy and then returning to healthy when scaled to 0.
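The health check described above can be scripted instead of done by hand. This is a hedged sketch, not Dgraph tooling: `fetch_status` and `poll_health` are hypothetical helpers, and the code assumes each alpha serves `/health` on its HTTP port (8080 by default, shifted by any port offset) and answers HTTP 200 only when healthy.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET, or 0 if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses (e.g. 503) still carry a status code.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure or timeout.
        return 0


def poll_health(urls: Iterable[str],
                fetch: Callable[[str], int] = fetch_status) -> Dict[str, bool]:
    """Map each health URL to True if it answered 200 (assumed 'healthy')."""
    return {url: fetch(url) == 200 for url in urls}
```

For example, calling `poll_health(["http://alpha1:8080/health", "http://alpha2:8081/health", "http://alpha3:8082/health"])` in a loop (ports assumed from the linked compose file's offsets) would show the restarted node flip to `False` and back, regardless of what the alpha logs print.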

So it seems this is not actually an issue, it just seemed like one from the alpha node logs.
