Multiple alphas and read/write throughput

I’m trying to evaluate whether running multiple alphas in parallel increases read and write throughput.

I found that with pydgraph, running multiple alphas substantially increases the number of queries that can be run in a given time period, but does not increase the number of mutations.

I tried to load an existing RDF file using live loader with one versus three alphas, and repeated this many times to try to observe a difference.

Representative values are below:
One alpha: Time spent : 5.122098609s
Three alphas: Time spent : 5.408779932s

What am I missing? Is my RDF file too small to show a difference?

Thanks!

Hey, can you try using this config?

Just increase the number of instances to what you need and add them to the Nginx config too.

Cheers.

@kashesha

Add a load balancer, just as Michel said, and performance and latency improve substantially. I got some mind-boggling results; no other graph DB came even remotely close.

Dgraph works internally very differently from many other databases. Setting up a DB cluster in Oracle or Postgres, for that matter, takes some non-trivial configuration, and you come to expect that.

In Dgraph, however, you just dispatch all incoming requests through a load balancer and you're all set. It really is that trivial.

The beauty of that setup becomes a true beast when you place a transparent HTTP cache such as Varnish in front of the load balancer. That setup gives you constant sub-millisecond response times no matter what you throw at it. Completely bonkers, you have to measure that.

The load balancing capability really needs strong emphasis in the cluster section of the docs, because that functionality alone saves you substantial time, effort, and even some headache.


Thanks! @MichelDiz @marvin-hansen

Hey, I have a conceptual question about this topic.

Let’s consider two alternative set-ups. The first one is the config that @MichelDiz provided, that uses an nginx load balancer.

Let’s say I connect to this cluster using a Go or Python client like so:

client = pydgraph.DgraphClient(pydgraph.DgraphClientStub('localhost:9080'))

Now in the other set-up, I don’t use this config, but just have three zeros and three alphas, with no nginx load balancing.

Let’s say I run three different writers, one for each alpha, like so:

client1 = pydgraph.DgraphClient(pydgraph.DgraphClientStub('localhost:9080'))
client2 = pydgraph.DgraphClient(pydgraph.DgraphClientStub('localhost:9081'))
client3 = pydgraph.DgraphClient(pydgraph.DgraphClientStub('localhost:9082'))

and write from all of them simultaneously.

Which set-up do you expect to perform better?

I think it would be similar. But with the second one you have more control: you can dictate which Alpha receives what, or do your own balancing. What is important is not to hit the same Alpha every time, so that you don't overload that server and run into OOM.
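
To make the "do your own balancing" idea concrete, here is a minimal sketch of client-side round-robin in plain Python. It is only an illustration under assumptions: `RoundRobinBalancer`, `dispatch`, and the `send` callback are hypothetical names, not part of pydgraph, and the addresses are simply the ones from the question above. The sketch only decides which Alpha gets the next batch; the actual write is whatever `send` does.

```python
import itertools
import threading
from typing import Callable, Iterable, List, Tuple


class RoundRobinBalancer:
    """Hands out Alpha addresses in rotation so no single Alpha gets every request.

    Hypothetical helper for illustration; not part of pydgraph.
    """

    def __init__(self, targets: Iterable[str]) -> None:
        self._targets = list(targets)
        if not self._targets:
            raise ValueError("at least one target is required")
        self._cycle = itertools.cycle(self._targets)
        # The lock keeps the rotation consistent when several writer threads share it.
        self._lock = threading.Lock()

    def next_target(self) -> str:
        with self._lock:
            return next(self._cycle)


def dispatch(
    batches: Iterable[str],
    balancer: RoundRobinBalancer,
    send: Callable[[str, str], object],
) -> List[Tuple[str, object]]:
    """Send each batch to the next Alpha in rotation.

    `send(address, batch)` is an assumed callback that performs the real write,
    for example through a client connected to `address`.
    Returns (address, result) pairs in the order the batches were sent.
    """
    results = []
    for batch in batches:
        address = balancer.next_target()
        results.append((address, send(address, batch)))
    return results
```

In a real writer, `send` would wrap one pydgraph client per address (like `client1`, `client2`, `client3` above) and run the mutation there. Swapping the rotation for a different policy, such as routing certain data to a fixed Alpha, only requires changing `next_target`.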