Docker Swarm setup

I’ve managed to get a cluster going with Docker Swarm. It’s set up and operating really well. I have a question about setting up my AWS endpoints to leverage the cluster: should I create an ELB and distribute requests to all three instances, or should I point my main entry at the master?

I’m not sure how Dgraph distributes itself.

Also, regarding dgraph-ratel: I noticed that there’s no constraint in the docker-compose.yaml pinning it to any particular server. I’m wondering how to expose it at a consistent endpoint.

Thanks.

There is no master. If multiple nodes are running in the same RAFT group, you can write to and read from any of them, so yes, a load balancer should work well.
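For example (a sketch assuming Dgraph’s default HTTP port 8080 and its /query endpoint; the node address is a hypothetical placeholder), the same query can be sent to any of the servers:

```
# Any replica in the RAFT group can serve both reads and writes.
# <node-ip> is any one of the three instances (placeholder).
curl -X POST http://<node-ip>:8080/query \
  -d '{ q(func: uid(0x1)) { uid } }'
```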

What is the value of the --replicas flag that you are using to run Zero, and how many Dgraph servers are you using? I can try to explain based on that.

You could add a constraint and keep it on only one of the nodes, or you could run it on all of them; it depends on what you are looking for.
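If you want Ratel on a fixed node, a placement constraint like the one used for Zero would work. A minimal sketch (the hostname is a hypothetical placeholder):

```yaml
  ratel:
    image: dgraph/dgraph:latest
    ports:
      - 8000:8000
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          # hypothetical hostname; pins Ratel to a single node
          - node.hostname == manager-1
    command: dgraph-ratel
```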

I’m using the docker-compose.yaml file listed in the Deploy section of the site: https://docs.dgraph.io/deploy/#using-docker-swarm — the same exact setup, just different hostnames. So basically 3 replicas.

Ok great, I’ll adjust the load balancer to point to all three instances. :slight_smile: Very cool.

Thanks Pawan!

I’d suggest using the dgraph/dgraph:nightly image, as it has some fixes with regard to replicas.

Thanks Pawan. I’m not comfortable using nightly in production right now so we’ll stay on latest until the next version is available.

I’m reviewing the docker-compose file that is listed on the Deploy section. Here’s my version of it:

version: "3"
networks:
  dgraph:
services:
  zero:
    image: dgraph/dgraph:latest
    volumes:
      - data-volume:/dgraph
    ports:
      - 5080:5080
      - 6080:6080
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-1
    command: dgraph zero --my=zero:5080 --replicas 3
  server_1:
    image: dgraph/dgraph:latest
    hostname: "server_1"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8080:8080
      - 9080:9080
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-1
    command: dgraph server --my=server_1:7080 --memory_mb=17192 --zero=zero:5080
  server_2:
    image: dgraph/dgraph:latest
    hostname: "server_2"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8081:8081
      - 9081:9081
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-2
    command: dgraph server --my=server_2:7081 --memory_mb=17192 --zero=zero:5080 -o 1
  server_3:
    image: dgraph/dgraph:latest
    hostname: "server_3"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8082:8082
      - 9082:9082
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-3
    command: dgraph server --my=server_3:7082 --memory_mb=17192 --zero=zero:5080 -o 2
  ratel:
    image: dgraph/dgraph:latest
    hostname: "ratel"
    ports:
      - 8000:8000
    networks:
      - dgraph
    command: dgraph-ratel
volumes:
  data-volume:

I have an ELB set up to spread requests across each instance. I need to configure health checks from the balancer. I believe the health check is available at http://address:8080/health. Only AP-GRAPH-1 responds to that. It looks like the second and third servers are running on 8082 and 8083.

Am I taking the right approach here?

Nightly has bug fixes on top of the last release and should be safe to use. It is also recommended because some bugs related to replicas were fixed. We will do a release early next week.

Yeah, the second and third servers are running on ports 8081 and 8082, as can be seen from the config. Docker Swarm doesn’t allow multiple services to publish the same port, unfortunately. Kubernetes does a better job at this.
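Given the published ports above, each server’s health endpoint can be checked individually (a sketch using the hostnames from the compose file; note that Swarm’s routing mesh may also make a published port reachable through any node):

```
curl http://AP-GRAPH-1:8080/health   # server_1
curl http://AP-GRAPH-2:8081/health   # server_2 (port offset -o 1)
curl http://AP-GRAPH-3:8082/health   # server_3 (port offset -o 2)
```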

Hmmmm I’ve pointed my ELB to the following:

AP-GRAPH1 -> GET ip:8080/health
AP-GRAPH2 -> GET ip:8081/health
AP-GRAPH3 -> GET ip:8082/health

Only AP-GRAPH-1 responds with an OK. The other two replicas do not. Is this normal?

I’m going to update to the nightly right now.

Also, a side note: when I initialize the cluster with nightly, the services don’t seem to come online. They just remain at 0/1 no matter how long I wait. Maybe the nightly build requires a different configuration?

memory_mb was changed to lru_mb in master. I can check on the health check tomorrow.

I just changed memory_mb and am having the same issue. Here’s my compose file for reference:

version: "3"
networks:
  dgraph:
services:
  zero:
    image: dgraph/dgraph:nightly
    volumes:
      - data-volume:/dgraph
    ports:
      - 5080:5080
      - 6080:6080
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-1
    restart: on-failure
    command: dgraph zero --my=zero:5080 --replicas 3
  server_1:
    image: dgraph/dgraph:nightly
    hostname: "server_1"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8080:8080
      - 9080:9080
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-1
    restart: on-failure
    command: dgraph server --my=server_1:7080 --lru_mb=17192 --zero=zero:5080
  server_2:
    image: dgraph/dgraph:nightly
    hostname: "server_2"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8081:8081
      - 9081:9081
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-2
    restart: on-failure
    command: dgraph server --my=server_2:7081 --lru_mb=17192 --zero=zero:5080 -o 1
  server_3:
    image: dgraph/dgraph:nightly
    hostname: "server_3"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8082:8082
      - 9082:9082
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-3
    command: dgraph server --my=server_3:7082 --lru_mb=17192 --zero=zero:5080 -o 2
  ratel:
    image: dgraph/dgraph:nightly
    hostname: "ratel"
    ports:
      - 8000:8000
    networks:
      - dgraph
    restart: on-failure
    command: dgraph-ratel
volumes:
  data-volume:

There is no image named dgraph/dgraph:nightly. The image is dgraph/dgraph:master.

You can inspect the error message by running docker stack ps dgraph --no-trunc.

I changed nightly to master and removed the restart: on-failure line. The Docker Swarm manager automatically handles the restart for you if a container goes down, so it should not be part of the config file. The restart key is supported by docker-compose, which is different from Docker Swarm. Here is the updated file.

version: "3"
networks:
  dgraph:
services:
  zero:
    image: dgraph/dgraph:master
    volumes:
      - data-volume:/dgraph
    ports:
      - 5080:5080
      - 6080:6080
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-1
    command: dgraph zero --my=zero:5080 --replicas 3
  server_1:
    image: dgraph/dgraph:master
    hostname: "server_1"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8080:8080
      - 9080:9080
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-1
    command: dgraph server --my=server_1:7080 --lru_mb=17192 --zero=zero:5080
  server_2:
    image: dgraph/dgraph:master
    hostname: "server_2"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8081:8081
      - 9081:9081
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-2
    command: dgraph server --my=server_2:7081 --lru_mb=17192 --zero=zero:5080 -o 1
  server_3:
    image: dgraph/dgraph:master
    hostname: "server_3"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8082:8082
      - 9082:9082
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-3
    command: dgraph server --my=server_3:7082 --lru_mb=17192 --zero=zero:5080 -o 2
  ratel:
    image: dgraph/dgraph:master
    hostname: "ratel"
    ports:
      - 8000:8000
    networks:
      - dgraph
    command: dgraph-ratel
volumes:
  data-volume:

You can see all the running services using docker service ls and you can see logs from a service using something like docker logs -f <service_name>.

Ok, that worked beautifully. BTW, the command for logs is docker service logs <service_name>.
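For reference, the inspection commands that came up in this thread (assuming the stack was deployed under the name dgraph):

```
docker service ls                       # list services and their replica counts
docker service logs -f dgraph_server_1  # follow logs for one service
docker stack ps dgraph --no-trunc       # full error messages for failed tasks
```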

I’m still wondering about setting up a load balancer. Because Docker Swarm forces each service onto a different published port, I’m not sure a typical AWS ELB can point to them individually. I can set up a target group with a port specific to the service, but the ELB itself doesn’t seem to translate the incoming port 9080 to the specific target port.

I suspect using another proxy method would be the solution but I wanted to ask in case I’m setting the ELB up incorrectly.
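One possible approach (a hedged sketch, not something from the Dgraph docs) is to run a TCP proxy such as HAProxy in front of the cluster, mapping a single client-facing port to the per-node ports:

```
# HAProxy sketch: one frontend port fanning out to the per-node gRPC ports.
frontend dgraph_grpc
    bind *:9080
    mode tcp
    default_backend dgraph_servers

backend dgraph_servers
    mode tcp
    balance roundrobin
    server s1 AP-GRAPH-1:9080 check
    server s2 AP-GRAPH-2:9081 check
    server s3 AP-GRAPH-3:9082 check
```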

Aside from that just a couple points about mistakes on the Deploy section of the website:

  1. In the Ports used by different nodes section, the server address is listed as 9090, but everywhere else in the same doc it’s 9080.
  2. In the Docker section, it says to list containers with docker-machine ps; it should be docker-machine ls.
  3. I don’t believe the Deploy section mentions the suggestion to set memory to half the machine’s capacity. It’s listed elsewhere in the docs, but it would be good for new users to see it right in Deploy. As a matter of fact, explaining why the LRU memory setting is important might be helpful.

Thanks

Hmmm also noticed that dgraph_server_2 and dgraph_server_3 are throwing errors similar to this:

dgraph_server_2.1.u4kina8qwx3s@AP-GRAPH-2    | 2018/04/16 16:16:24 pool.go:158: Echo error from zero:5080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_server_2.1.u4kina8qwx3s@AP-GRAPH-2    | 2018/04/16 16:16:27 groups.go:105: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure

I don’t think the latest branch was doing this. Are we missing a detail in the configuration you supplied?

Hi, just bumping up this request. Still can’t seem to get the other two nodes stable.

Those are transient errors. I just tried with dgraph/dgraph:latest, since we released v1.0.5, and the cluster seems to work fine.

What makes you believe that the other two nodes are not stable? Are they not responding to queries/mutations?

Regarding the ELB, maybe have a look at Docker for Swarm. From the link I can see

Elastic Load Balancers (ELBs) are set up to help with routing traffic to your swarm.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.