I am working with a Dgraph cluster that has 3 alpha nodes and 1 zero node running in Docker Swarm.
One of the alpha nodes is crashing (we think because of an out-of-memory issue) and is being restarted by Docker. Once the alpha node is restarted, its IP address changes and the other two alpha nodes cannot reconnect to it. I would not expect this to be a problem, because the services could connect through the internal Docker hostnames rather than direct IP addresses, but that does not seem to be what happens.
Is there a way to update the IP address of the failed node on the two reconnecting alpha nodes? Or a way to utilize hostnames rather than IP addresses for the connections between alpha nodes?
I am using Dgraph v20.03.4.
Here is an example error message from one of the two alpha nodes that are trying to reconnect to the restarted node:
Error from alpha client subscribe: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.0.2.22:7081: connect: connection refused"
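
To check whether the address in that error is simply stale, a small diagnostic along these lines could be run from inside one of the healthy alpha containers. This is only a sketch: the service name "alpha3" is a hypothetical placeholder for whatever the restarted alpha service is actually called in the stack, and the IP and port are taken from the error message above. It only reports what the hostname resolves to right now; it does not change anything in Dgraph.

```python
import socket


def resolve_ipv4(hostname, port):
    """Return the sorted IPv4 addresses that hostname currently resolves to."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def is_stale(hostname, port, dialed_ip):
    """True if dialed_ip is not among the addresses hostname resolves to now."""
    return dialed_ip not in resolve_ipv4(hostname, port)


if __name__ == "__main__":
    # "alpha3" is a hypothetical service name; replace it with the real one.
    # 10.0.2.22 and 7081 come from the error message above.
    try:
        print("current addresses:", resolve_ipv4("alpha3", 7081))
        print("dialed address is stale:", is_stale("alpha3", 7081, "10.0.2.22"))
    except socket.gaierror as exc:
        print("hostname did not resolve:", exc)
```

If the hostname resolves to a different address than the one being dialed, that would suggest the other alphas are holding on to the old IP rather than looking the hostname up again.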