Dgraph db cluster in 3 data centers

What I want to do

I’m going to launch Dgraph in our cloud across 3 data centers. Could you recommend the right setup for this deployment?

Is my plan right?
9 Zeros: 3 per DC
3 Alphas: 1 per DC

So in the end I’d get 3 replicas and 3 shards in 3 groups.
Should every group sit in just one DC, or be spread among different DCs?

Here is a prototype of the config:
docker-compose.yml (6.2 KB)

-o 1 --my=zero2:5080

This should be -o 1 --my=zero2:5081, since the -o offset shifts the node’s default ports, and the ports in the Docker context need the same change. The Alphas should then use --zero=zero1:5080,zero2:5081,zero3:5082. The same applies to the Alphas’ own addresses, e.g. alpha3 should use --my=alpha3:7082.
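
For reference, the three Zero commands would then end up roughly like this (a sketch assembled from the snippets above, assuming the hostnames zero1–zero3 from the attached compose file):

dgraph zero --my=zero1:5080 --replicas 3 --raft="idx=1"
dgraph zero -o 1 --my=zero2:5081 --replicas 3 --raft="idx=2" --peer zero1:5080
dgraph zero -o 2 --my=zero3:5082 --replicas 3 --raft="idx=3" --peer zero1:5080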

This whole Docker Compose file looks wrong… where did you get it?

I’m not sure what that means.
Based on your yml, you have 3 Zero nodes and 9 Alpha nodes, and you have set replicas to 3, which means 3 shard groups. What is DC? Dgraph Cluster?

Different Dgraph clusters? I’m really not sure what you mean.

Try this.
P.S. Just change my name to yours in the volume paths.

version: "3.2"

networks:
  dg_net:
    driver: bridge
    ipam:
      config:
        - subnet: 10.5.0.0/16
          gateway: 10.5.0.1

services:
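 # 3 Zeros form a single Raft group; with --replicas 3, the 9 Alphas below form 3 shard groups of 3 nodes each.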
 zero1:
   image: dgraph/dgraph:v22.0.2
   volumes:
     - /home/micheldiz/dgraph/zero1:/dgraph
   ports:
     - 5081:5080
     - 6081:6080
   restart: on-failure
   networks:
     dg_net:
       ipv4_address: 10.5.0.21
   healthcheck:
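     # Zero exposes cluster state at /state on its HTTP port (6080); this check verifies this node's entry shows amDead:false.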
     test: curl -sS http://localhost:6080/state | grep -o '10.5.0.21.*?*forceGroupId' | grep -c 'amDead":false' > /dev/null
     interval: 10s
     start_period: 10s
     timeout: 5s
     retries: 7
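   # --raft idx pins a fixed Raft ID for this Zero; --replicas 3 puts 3 Alphas in every shard group.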
   command: dgraph zero --my=10.5.0.21:5080 --replicas 3 --raft="idx=1"
 zero2:
   image: dgraph/dgraph:v22.0.2
   volumes:
     - /home/micheldiz/dgraph/zero2:/dgraph
   ports:
     - 5082:5080
     - 6082:6080
   restart: on-failure
   networks:
     dg_net:
       ipv4_address: 10.5.0.22
   healthcheck:
     test: curl -sS http://localhost:6080/state | grep -o '10.5.0.22.*?*forceGroupId' | grep -c 'amDead":false' > /dev/null
     interval: 10s
     start_period: 10s
     timeout: 5s
     retries: 7
   command: dgraph zero --my=10.5.0.22:5080 --replicas 3 --raft="idx=2" --peer 10.5.0.21:5080
 zero3:
   image: dgraph/dgraph:v22.0.2
   volumes:
     - /home/micheldiz/dgraph/zero3:/dgraph
   ports:
     - 5083:5080
     - 6083:6080
   restart: on-failure
   networks:
     dg_net:
       ipv4_address: 10.5.0.23
   healthcheck:
     test: curl -sS http://localhost:6080/state | grep -o '10.5.0.23.*?*forceGroupId' | grep -c 'amDead":false' > /dev/null
     interval: 10s
     start_period: 10s
     timeout: 5s
     retries: 7
   command: dgraph zero --my=10.5.0.23:5080 --replicas 3 --raft="idx=3" --peer 10.5.0.21:5080

 alpha1:
   image: dgraph/dgraph:v22.0.2
   volumes:
     - /home/micheldiz/dgraph/alpha1:/dgraph
   ports:
     - 8081:8080
     - 9081:9080
   restart: on-failure
   networks:
     dg_net:
       ipv4_address: 10.5.0.11
   healthcheck:
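     # Alpha exposes /health on its HTTP port (8080); the response contains "healthy" once the node is serving.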
     test: curl -sS http://localhost:8080/health | grep -c 'healthy' > /dev/null
     interval: 10s
     start_period: 10s
     timeout: 5s
     retries: 7
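   # Each Alpha advertises its internal gRPC address and lists all three Zeros; Zero assigns it to a group.
   # whitelist=0.0.0.0/0 opens the HTTP admin endpoints to any IP: fine for a local test, not for production.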
   command: dgraph alpha --my=10.5.0.11:7080 --zero=10.5.0.21:5080,10.5.0.22:5080,10.5.0.23:5080
     --security "whitelist=0.0.0.0/0"

 alpha2:
   image: dgraph/dgraph:v22.0.2
   volumes:
     - /home/micheldiz/dgraph/alpha2:/dgraph
   ports:
     - 8082:8080
     - 9082:9080
   restart: on-failure
   networks:
     dg_net:
       ipv4_address: 10.5.0.12
   healthcheck:
     test: curl -sS http://localhost:8080/health | grep -c 'healthy' > /dev/null
     interval: 10s
     start_period: 10s
     timeout: 5s
     retries: 7
   command: dgraph alpha --my=10.5.0.12:7080 --zero=10.5.0.21:5080,10.5.0.22:5080,10.5.0.23:5080
     --security "whitelist=0.0.0.0/0"

 alpha3:
   image: dgraph/dgraph:v22.0.2
   volumes:
     - /home/micheldiz/dgraph/alpha3:/dgraph
   ports:
     - 8083:8080
     - 9083:9080
   restart: on-failure
   networks:
     dg_net:
       ipv4_address: 10.5.0.13
   healthcheck:
     test: curl -sS http://localhost:8080/health | grep -c 'healthy' > /dev/null
     interval: 10s
     start_period: 10s
     timeout: 5s
     retries: 7
   command: dgraph alpha --my=10.5.0.13:7080 --zero=10.5.0.21:5080,10.5.0.22:5080,10.5.0.23:5080
     --security "whitelist=0.0.0.0/0"

 alpha4:
   image: dgraph/dgraph:v22.0.2
   volumes:
     - /home/micheldiz/dgraph/alpha4:/dgraph
   ports:
     - 8084:8080
     - 9084:9080
   restart: on-failure
   networks:
     dg_net:
       ipv4_address: 10.5.0.14
   healthcheck:
     test: curl -sS http://localhost:8080/health | grep -c 'healthy' > /dev/null
     interval: 10s
     start_period: 10s
     timeout: 5s
     retries: 7
   command: dgraph alpha --my=10.5.0.14:7080 --zero=10.5.0.21:5080,10.5.0.22:5080,10.5.0.23:5080
     --security "whitelist=0.0.0.0/0"

 alpha5:
   image: dgraph/dgraph:v22.0.2
   volumes:
     - /home/micheldiz/dgraph/alpha5:/dgraph
   ports:
     - 8085:8080
     - 9085:9080
   restart: on-failure
   networks:
     dg_net:
       ipv4_address: 10.5.0.15
   healthcheck:
     test: curl -sS http://localhost:8080/health | grep -c 'healthy' > /dev/null
     interval: 10s
     start_period: 10s
     timeout: 5s
     retries: 7
   command: dgraph alpha --my=10.5.0.15:7080 --zero=10.5.0.21:5080,10.5.0.22:5080,10.5.0.23:5080
     --security "whitelist=0.0.0.0/0"

 alpha6:
   image: dgraph/dgraph:v22.0.2
   volumes:
     - /home/micheldiz/dgraph/alpha6:/dgraph
   ports:
     - 8086:8080
     - 9086:9080
   restart: on-failure
   networks:
     dg_net:
       ipv4_address: 10.5.0.16
   healthcheck:
     test: curl -sS http://localhost:8080/health | grep -c 'healthy' > /dev/null
     interval: 10s
     start_period: 10s
     timeout: 5s
     retries: 7
   command: dgraph alpha --my=10.5.0.16:7080 --zero=10.5.0.21:5080,10.5.0.22:5080,10.5.0.23:5080
     --security "whitelist=0.0.0.0/0"

 alpha7:
   image: dgraph/dgraph:v22.0.2
   volumes:
     - /home/micheldiz/dgraph/alpha7:/dgraph
   ports:
     - 8087:8080
     - 9087:9080
   restart: on-failure
   networks:
     dg_net:
       ipv4_address: 10.5.0.17
   healthcheck:
     test: curl -sS http://localhost:8080/health | grep -c 'healthy' > /dev/null
     interval: 10s
     start_period: 10s
     timeout: 5s
     retries: 7
   command: dgraph alpha --my=10.5.0.17:7080 --zero=10.5.0.21:5080,10.5.0.22:5080,10.5.0.23:5080
     --security "whitelist=0.0.0.0/0"

 alpha8:
   image: dgraph/dgraph:v22.0.2
   volumes:
     - /home/micheldiz/dgraph/alpha8:/dgraph
   ports:
     - 8088:8080
     - 9088:9080
   restart: on-failure
   networks:
     dg_net:
       ipv4_address: 10.5.0.18
   healthcheck:
     test: curl -sS http://localhost:8080/health | grep -c 'healthy' > /dev/null
     interval: 10s
     start_period: 10s
     timeout: 5s
     retries: 7
   command: dgraph alpha --my=10.5.0.18:7080 --zero=10.5.0.21:5080,10.5.0.22:5080,10.5.0.23:5080
     --security "whitelist=0.0.0.0/0"

 alpha9:
   image: dgraph/dgraph:v22.0.2
   volumes:
     - /home/micheldiz/dgraph/alpha9:/dgraph
   ports:
     - 8089:8080
     - 9089:9080
   restart: on-failure
   networks:
     dg_net:
       ipv4_address: 10.5.0.19
   healthcheck:
     test: curl -sS http://localhost:8080/health | grep -c 'healthy' > /dev/null
     interval: 10s
     start_period: 10s
     timeout: 5s
     retries: 7
   command: dgraph alpha --my=10.5.0.19:7080 --zero=10.5.0.21:5080,10.5.0.22:5080,10.5.0.23:5080
     --security "whitelist=0.0.0.0/0"

 ratel:
   image: dgraph/ratel:latest
   ports:
     - 8000:8000
   networks:
     dg_net:
       ipv4_address: 10.5.0.20
   command: dgraph-ratel
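
To bring this up and sanity-check the cluster, run it and then query Zero’s state endpoint (zero1’s HTTP port is mapped to 6081 here); you should see three groups, each with three Alphas, and no members marked as dead:

docker compose up -d
curl -s localhost:6081/state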

Sorry for misleading you: DC means data center. I planned to spread the Dgraph cluster among 3 data centers, so that all data is stored in 3 replicas and sharded across the 3 data centers.

The final aim is availability of the data in Dgraph in case we lose any one data center.

My docker-compose.yml was just for understanding how to correctly control the number of groups, shards, and replicas.

Got it.

Does that mean 9 Zeros? That confused me.

I’d put a replica in each data center.
So, group 1
Alpha-0 goes to Datacenter 1
Alpha-1 goes to Datacenter 2
Alpha-2 goes to Datacenter 3

group 2
Alpha-0 goes to Datacenter 1
Alpha-1 goes to Datacenter 2
Alpha-2 goes to Datacenter 3

group 3
Alpha-0 goes to Datacenter 1
Alpha-1 goes to Datacenter 2
Alpha-2 goes to Datacenter 3

So you have replication across your data centers. You can set this flag:

group=; Provides an optional Raft Group ID that this Alpha would indicate to Zero to join.

to force an Alpha into a specific group.
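
For example (a sketch, assuming a Dgraph version where group is an option of the Alpha --raft superflag, mirroring the --raft="idx=..." syntax used for the Zeros above), an Alpha meant for group 1 would start as:

   command: dgraph alpha --my=10.5.0.11:7080 --raft="group=1"
     --zero=10.5.0.21:5080,10.5.0.22:5080,10.5.0.23:5080
     --security "whitelist=0.0.0.0/0"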

But the Zero group is different. It needs to be very “close” to the Alphas in terms of latency. I mean the leader, specifically.

Does that mean 9 Zeros? That confused me.

Yes, I assumed (apparently wrongly) that for high availability and fault tolerance it would need 3 Zero instances per data center and 1 Alpha per data center. In our cloud we have tens of thousands of instances, so it is not a problem to allocate a few extra instances if it helps overall performance and reliability.

The Zero group will always be a single group; only the Alphas, when sharding, form additional groups in the Raft logic. You don’t need 9 Zeros; that is too much. You can have 1 Zero per data center, as long as the total is an odd number. Pay attention that all your Alphas talk only to the Zero leader, so it should be reasonably close to all of them. You can’t have the Zero leader in China, for example, with the Alphas trying to reach it from us-east.
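
Concretely, for the 3-Zero / 9-Alpha setup above, the placement could look like this:

Data center 1: zero1 (idx=1), plus one Alpha from each of groups 1, 2, and 3
Data center 2: zero2 (idx=2), plus one Alpha from each of groups 1, 2, and 3
Data center 3: zero3 (idx=3), plus one Alpha from each of groups 1, 2, and 3

That way, losing any one data center leaves every Alpha group and the Zero group with 2 of 3 members, which keeps Raft quorum.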

Thanks a lot. I finally understand the scheme.

P.S. I mixed up Zeros with Alphas. I meant 3 Alphas for every data center.