Please help me set up an HA cluster in Dgraph (without Docker)


(Nguyễn Thanh Phi (Backend MWG)) #1

I have 3 servers and I set up the cluster like this:

  1. Run on server 1 (SERVER1):
dgraph zero --my=SERVER1:5080 --replicas 3 --idx 1
dgraph alpha --my=SERVER1:7080 --lru_mb=20000 --zero=SERVER1:5080
  2. Run on server 2 (SERVER2):
dgraph zero -o 1 --my=SERVER2:5081 --replicas 3 --peer SERVER1:5080 --idx 2
dgraph alpha --my=SERVER2:7081 --lru_mb=20000 --zero=SERVER1:5080 -o 1
  3. Run on server 3 (SERVER3):
dgraph zero -o 2 --my=SERVER3:5082 --replicas 3 --peer SERVER1:5080 --idx 3
dgraph alpha --my=SERVER3:7082 --lru_mb=20000 --zero=SERVER1:5080 -o 2
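Once the six processes are up, it's worth confirming the cluster actually formed before debugging anything else. A quick sketch, assuming the default HTTP ports (6080 for Zero, 8080 for Alpha) and noting that the `-o` offsets used above also shift the HTTP ports on servers 2 and 3:

```shell
# Ask Zero for its view of the cluster: members, groups, and replica placement.
curl -s http://SERVER1:6080/state

# Check that each Alpha reports healthy (ports shifted by the -o offset used).
curl -s http://SERVER1:8080/health
curl -s http://SERVER2:8081/health
curl -s http://SERVER3:8082/health
```

If `/state` shows all three Zeros and all three Alphas in one group, the replication side of the setup is working.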

I have already set this up, but it seems the servers are doing something wrong. Are the commands correct? Please help me.


(Michel Conrado) #2

All seems right. What logs do you have that make you think something is wrong?


(Nguyễn Thanh Phi (Backend MWG)) #3

I mean replicas=3 while I have 3 servers, so will that have any effect? Ingesting data is still slow.


(Nguyễn Thanh Phi (Backend MWG)) #4

If I use SSD disks and more servers for the cluster, will it be better?


(Michel Conrado) #5

Sorry, I think we have crossed conversations here, which gets me a bit confused. I need details of what makes you think something is wrong (based on the topic of this thread), yet now you're talking about performance and asking about configs. I need context to be able to help you; I can't help in the dark.

I saw in another topic that you're using an HDD. Well, switching to higher-IOPS hardware is always good, no doubt about it.

BUT, if you're running the whole setup on a single machine simulating an HA cluster, you're going to have a bad time extracting maximum performance from Dgraph, because you're sharing a single machine's resources while demanding performance. As I don't have context and you said "no use Docker", I have to assume you're running like this (HA on a single machine). Please confirm if that's not the case.

Ideally, each node should be physically isolated if you need a cluster with maximum performance (or at least each node should have its own HDD). Each node needs a significant amount of memory, cores, and high IOPS. If you still find a bottleneck under those conditions, please let us know.

Running on an HDD works just fine, but it doesn't work miracles, unless you set up Dgraph to be RAM-first. For example, if you set the flag "badger.tables" to "ram" and keep "badger.vlog" at its default "mmap", you're setting up Dgraph to be RAM-first. This is good if you have a lot of RAM available.

But I think RAM can be more expensive than SSD or NVMe, right?

So, do that: isolate the nodes, try to add SSDs or NVMe, and redo the tests.

Cheers.


(Nguyễn Thanh Phi (Backend MWG)) #6

Thank you for your enthusiastic support.
So:
1. I don't use Docker for the setup. I set up the cluster on 3 machines, not one; 3 machines with private resources, not shared resources. But ingesting huge amounts of data is still slow, and queries are too slow...
2. About the RAM: 3 servers with 31GB RAM each, and over 300GB of HDD. I read the logs and I see "slow disk"...
3. I don't know about this flag, so I did not set it. I'd be very grateful if you could show me how to set this flag in my command.
4. For the best performance, I will invest money to buy SSDs. Thanks for your support.


(Michel Conrado) #7

What parameters are you using to benchmark performance?
And in what comparison is Dgraph slower?

In some cases this log is a false positive, but since you are using an HDD, that is probably why it occurs.

Just run:

dgraph alpha --my=SERVER1:7080 --lru_mb=20000 --zero=SERVER1:5080 --badger.tables=ram

(Nguyễn Thanh Phi (Backend MWG)) #8


=> You can see, I query in the UI with a simple query (get products by category), but it takes up to 3s, sometimes 1s.


=> You can see, this is the slow-disk log :smiley:


(Michel Conrado) #9

Besides the other things I've mentioned, you can also try best-effort queries. Check this link: Read-Only Transactions.

Read-only queries can optionally be set as best-effort. Using this flag will ask the Dgraph Alpha to try to get timestamps from memory on a best-effort basis to reduce the number of outbound requests to Zero. This may yield improved latencies in read-bound workloads where linearizable reads are not strictly needed.
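To make the best-effort idea concrete, here is a minimal sketch of sending such a query to Alpha's HTTP endpoint, assuming the default HTTP port 8080 and that your Dgraph version accepts the `ro`/`be` query parameters on `/query` (the exact parameter names and the expected Content-Type value vary between Dgraph versions, so check the docs for the version you run). The hostname and the DQL query are hypothetical placeholders:

```python
import urllib.parse
import urllib.request

# Hypothetical Alpha address from this thread; 8080 is the default HTTP port.
ALPHA = "http://SERVER1:8080"

def build_query_request(dql: str, best_effort: bool = True) -> urllib.request.Request:
    """Build an HTTP request for a read-only query.

    ro=true marks the query read-only; be=true asks Alpha to use a
    best-effort timestamp from memory instead of asking Zero for one.
    """
    params = urllib.parse.urlencode(
        {"ro": "true", "be": "true" if best_effort else "false"}
    )
    return urllib.request.Request(
        f"{ALPHA}/query?{params}",
        data=dql.encode("utf-8"),
        # Assumption: newer versions expect "application/dql"; older 1.x
        # versions used a different value -- check your version's HTTP docs.
        headers={"Content-Type": "application/dql"},
        method="POST",
    )

req = build_query_request('{ products(func: eq(category, "books")) { name } }')
print(req.full_url)  # http://SERVER1:8080/query?ro=true&be=true
```

Sending the request with `urllib.request.urlopen(req)` against a live Alpha would return the usual JSON response; the only change versus a normal query is the two URL parameters.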