HA Cluster Setup

Just been reading the documentation on HA cluster setup. I’m planning to run an HA cluster on EC2 instances, directly on the host with no Docker.

I’ve seen people on the forum run 3 instances in total, with alpha and zero both running on the same host, but this seems pretty counterproductive: if one instance goes down, you’re jeopardizing the availability of both alpha and zero.

My cluster will consist of:
3 zero instances
3 alpha instances
1 ratel
7 instances in total

From what I can see, you can only point an Alpha at one Zero… if that zero goes down, what happens to that alpha instance? Can it still communicate with the other two Zeros? After connecting to one Zero, does it then get the connection info for all of the Zeros? Should the --zero flag actually be a list of zeros so it can round-robin?


Check out the HA Cluster Setup in the docs:

From what I can see, you can only point an Alpha at one Zero… if that zero goes down, what happens to that alpha instance? Can it still communicate with the other two Zeros? After connecting to one Zero, does it then get the connection info for all of the Zeros?

In the docs:

The new Alphas will automatically detect each other by communicating with Dgraph zero and establish connections to each other.

I’m not a dgraph expert, but I think it’s not so bad to run with an alpha and zero on the same host because they operate independently (more specifically they have independent HA). But there may be other reasons to avoid that - e.g. resource contention.

In the docs:

The new Alphas will automatically detect each other by communicating with Dgraph zero and establish connections to each other.

If I’m not mistaken, this is saying that the alphas are able to find and communicate with each other (other alphas) by speaking to the Zero instance you specify in the config. What I’m asking about is whether an alpha can speak to any of the other zeros if the one it was pointed at goes down.

I’m not a dgraph expert, but I think it’s not so bad to run with an alpha and zero on the same host because they operate independently (more specifically they have independent HA). But there may be other reasons to avoid that - e.g. resource contention.

I agree it’s not so bad, but it would be the same as running two microservices on the same host. You wouldn’t do this in a production environment because, as I said before, even though each service has its own independent HA, if one box goes down, that jeopardizes the HA of two services at once, both alpha and zero.
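To make that concrete, here is a rough sketch of the failure math. It assumes alpha and zero are each a 3-member Raft group that stays available only while a majority of its members is up. The host names and helper functions are made up for illustration and are not part of Dgraph.

```python
from itertools import combinations


def has_quorum(members_up: int, group_size: int) -> bool:
    """A group stays available while a strict majority of members is up."""
    return members_up > group_size // 2


def surviving_services(layout, failed_hosts):
    """Map each service to True/False: does it keep quorum after the failures?

    layout maps a (hypothetical) host name to the services running on it.
    """
    totals, up = {}, {}
    for host, services in layout.items():
        for svc in services:
            totals[svc] = totals.get(svc, 0) + 1
            if host not in failed_hosts:
                up[svc] = up.get(svc, 0) + 1
    return {svc: has_quorum(up.get(svc, 0), totals[svc]) for svc in totals}


def outage_counts(layout, n_failures):
    """Count host-failure combinations by how many services lose quorum."""
    counts = {0: 0, 1: 0, 2: 0}
    for failed in combinations(layout, n_failures):
        status = surviving_services(layout, set(failed))
        lost = sum(1 for ok in status.values() if not ok)
        counts[lost] += 1
    return counts


# Hypothetical layouts: 3 shared hosts vs. 6 dedicated hosts.
colocated = {f"host{i}": ["alpha", "zero"] for i in range(1, 4)}
dedicated = {
    **{f"zero{i}": ["zero"] for i in range(1, 4)},
    **{f"alpha{i}": ["alpha"] for i in range(1, 4)},
}

for name, layout in (("colocated", colocated), ("dedicated", dedicated)):
    for n in (1, 2):
        print(name, f"{n} host(s) down:", outage_counts(layout, n))
```

Under these assumptions a single host failure leaves both groups with quorum in either layout. The difference shows up with two failed hosts: the co-located layout loses quorum for both alpha and zero every time, while the dedicated layout loses at most one of them.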

You also want to take scaling infrastructure into account. You’ll want an autoscaling group with a minimum size of 3, which grows if your existing nodes experience high CPU usage. You already have to use large instances to support both alpha and zero on the same host. When alpha spikes in CPU usage and the group scales out, you end up with more instances running both alpha and zero, when in reality zero doesn’t need that many instances to keep up with demand. You’re assigning compute power to zero unnecessarily.
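A rough sketch of that sizing argument, assuming every host in a co-located autoscaling group starts both processes and that three zeros are enough. Both are assumptions, and the function name is made up for illustration.

```python
def instance_plan(alphas_needed: int, zeros_needed: int = 3) -> dict:
    """Compare host and zero counts for co-located vs. dedicated layouts."""
    # Co-located: every host runs one alpha and one zero, so the group
    # is sized by whichever service needs more members.
    colocated_hosts = max(alphas_needed, zeros_needed)
    return {
        "colocated": {
            "hosts": colocated_hosts,
            "zeros": colocated_hosts,
            "surplus_zeros": colocated_hosts - zeros_needed,
        },
        # Dedicated: alphas scale on their own, zeros stay fixed.
        "dedicated": {
            "hosts": alphas_needed + zeros_needed,
            "zeros": zeros_needed,
            "surplus_zeros": 0,
        },
    }


for alphas in (3, 6, 9):
    print(alphas, instance_plan(alphas))
```

The dedicated layout uses more hosts overall, but the zero hosts can be small and stay fixed at three, while only the alpha hosts need to grow with load.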

Sorry, I misread that. You’re right, the documentation doesn’t seem to answer your question. I assume that this is the case, but I’d have to test it to be sure. If I get a chance, I’ll test it and let you know.