I’m trying to evaluate whether running multiple alphas in parallel increases read and write throughput.
I found that with pydgraph, running multiple alphas substantially increases the number of queries that can be run in a given time period, but does not increase the number of mutations.
I tried loading an existing RDF file with the live loader against one versus three alphas, and repeated this many times to try to observe a difference.
Representative values are below:
One alpha: Time spent : 5.122098609s
Three alphas: Time spent : 5.408779932s
What am I missing? Is my RDF file too small to show a difference?
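For context, the comparison above was driven by live loader runs along these lines. This is a sketch, not my exact commands: the file name and addresses are placeholders, and I believe `--alpha` accepts a comma-separated list so the loader can spread mutations across servers:

```sh
# One alpha (file name and ports are placeholders):
dgraph live -f data.rdf.gz --alpha localhost:9080 --zero localhost:5080

# Three alphas: --alpha takes a comma-separated list, so mutations
# should be distributed across all three servers.
dgraph live -f data.rdf.gz \
  --alpha alpha1:9080,alpha2:9080,alpha3:9080 --zero zero1:5080
```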
Add a load balancer, just as Michel said, and performance and latency improve substantially. I got some mind-boggling results; no other graph DB came even remotely close.
Dgraph works very differently internally from many other databases. Setting up a DB cluster in Oracle, or Postgres for that matter, takes non-trivial configuration, and you come to expect that.
In Dgraph, however, you just dispatch all incoming requests through a load balancer and you're all set. It really is that trivial.
The beauty of that setup becomes a true beast when you place a transparent HTTP cache such as Varnish in front of the load balancer. That setup gives you constant sub-millisecond response times no matter what you throw at it. Completely bonkers; you have to measure it yourself.
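As a rough illustration of the load-balancing half of that setup (host names and ports are my assumptions, not from this thread), an nginx upstream block round-robining Dgraph's HTTP endpoint across three alphas is about all it takes; the Varnish cache would then sit in front of this:

```nginx
# Sketch only: round-robin Dgraph's HTTP port (8080) across three alphas.
upstream dgraph_alphas {
    server alpha1:8080;
    server alpha2:8080;
    server alpha3:8080;
}

server {
    listen 8080;
    location / {
        proxy_pass http://dgraph_alphas;
    }
}
```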
The load-balancing capability really needs strong emphasis in the cluster section of the docs, because that functionality alone saves substantial time, effort, and even some headaches.
I think it would be similar, but with the second approach you have more control: you can dictate which Alpha receives what, or do the balancing your own way. What is important is to not always hit the same Alpha, so you don't overload that server and cause an OOM.
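A minimal sketch of that client-side "don't always hit the same Alpha" idea, assuming a fixed list of Alpha addresses (the addresses and the `make_alpha_picker` helper are hypothetical; the pydgraph calls are left as comments because they need a running cluster):

```python
import itertools

def make_alpha_picker(addresses):
    """Return a function that yields Alpha addresses round-robin,
    so consecutive requests land on different servers."""
    cycle = itertools.cycle(addresses)
    return lambda: next(cycle)

pick = make_alpha_picker(["alpha1:9080", "alpha2:9080", "alpha3:9080"])

# For each request you would then do something like:
#   stub = pydgraph.DgraphClientStub(pick())
#   client = pydgraph.DgraphClient(stub)
print([pick() for _ in range(4)])
# → ['alpha1:9080', 'alpha2:9080', 'alpha3:9080', 'alpha1:9080']
```

Note that pydgraph's `DgraphClient` also accepts several stubs at once and spreads requests over them itself, so explicit rotation like this is only needed when you want to control the distribution yourself.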