How to reconnect alpha nodes after a crash if the IP address changes?

I am working with a Dgraph cluster that has 3 alpha nodes and 1 zero node running in Docker swarm.

One of the alpha nodes is crashing (we think because of an out-of-memory issue) and is being restarted by Docker. Once the alpha node is restarted, its IP address changes and the other two alpha nodes cannot reconnect to it. I would have expected this not to be an issue, because the services can use the internal Docker hostnames to connect rather than direct IP addresses, but this does not seem to be the case.

Is there a way to update the IP address of the failed node on the two reconnecting alpha nodes? Or a way to utilize hostnames rather than IP addresses for the connections between alpha nodes?

I am using Dgraph version v20.03.4.

Here is an example error message from one of the two alpha nodes that are trying to reconnect to the restarted node.

Error from alpha client subscribe: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp connect: connection refused"

Are you using this YAML?

Docker should resolve the addresses internally, as alpha1:7080, alpha2:7081, alpha3:7082.

Thanks for the reply. After investigating further, it looks like when an alpha node fails and restarts, its endpoint IP changes but its Docker virtual IP does not. The virtual IP is what the non-failing alpha nodes are attempting to connect to, which is the correct behavior.
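For anyone who wants to check the same thing, here is a minimal sketch of how the virtual IP and the endpoint (task) IP can be compared from inside a container on the same overlay network. It only uses Python's standard library. The helper name resolve_all is made up for illustration, and the "tasks.<service>" lookup assumes the swarm service is named the same as the hostname (alpha1 here), which may not match your stack.

```python
import socket


def resolve_all(host, port):
    """Return the sorted, de-duplicated addresses that `host` resolves to.

    Hypothetical helper for illustration; it is not part of Dgraph or Docker.
    """
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name did not resolve at all.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_all("alpha1", 7080) should return the service's virtual IP, which stays the same across restarts, while resolve_all("tasks.alpha1", 7080) should return the IP of the current task, which is the one that changes when Docker restarts the container.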

The alpha logs never show the reconnection, but the alpha nodes' HTTP health endpoint shows the disconnected alpha node becoming unhealthy and then returning to healthy when scaled to 0.
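Since the logs were misleading, polling the health endpoint is the more reliable check. Below is a rough sketch of how that could be scripted. The function names are invented for this example, and it only treats an HTTP 200 response as "healthy" without looking at the response body. The URLs in the usage note are assumptions based on the default alpha HTTP port 8080 plus the port offsets implied by 7080/7081/7082, so adjust them to your setup.

```python
import urllib.error
import urllib.request

# Ignore any HTTP proxy environment settings; the targets are cluster-internal.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(url, timeout=2.0):
    """Return True if a GET on `url` answers with HTTP 200, else False."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False


def check_cluster(urls, timeout=2.0):
    """Map each health URL to True/False."""
    return {url: is_healthy(url, timeout) for url in urls}
```

For example, check_cluster(["http://alpha1:8080/health", "http://alpha2:8081/health", "http://alpha3:8082/health"]) returns a dict showing which alpha nodes currently respond as healthy.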

So it seems this is not actually an issue, it just seemed like one from the alpha node logs.