I’ve managed to get a cluster going with Docker Swarm. It’s set up and operating really well. I have a question regarding setting up my AWS endpoints to leverage the cluster. Should I be creating an ELB and distributing requests to all three instances, or should I point my main entry to the master?
I’m not sure how Dgraph distributes itself.
Also, regarding dgraph-ratel: I noticed that there’s no constraint in the docker-compose.yaml to have it run on any particular server. I’m wondering how to expose it at a consistent endpoint.
I’m using the docker-compose.yaml file listed on the deploy section of the site: https://docs.dgraph.io/deploy/#using-docker-swarm same exact setup just different hostnames. So basically 3 replicas.
Ok great, I’ll adjust the load balancing to point to all three instances. Very cool.
I have an ELB set up to spread the requests around to each instance. I need to configure health checks from the balancer. I believe the health check is available at http://address:8080/health? Only AP-GRAPH-1 responds to it. It looks like the second and third servers are running on 8082 and 8083.
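For the ELB health checks, a quick way to probe each node is a loop like this (the hostnames and per-server ports here are assumptions based on this thread; adjust them to your setup):

```shell
# Probe the /health endpoint on each Dgraph server (hostnames/ports assumed).
for hp in AP-GRAPH-1:8080 AP-GRAPH-2:8081 AP-GRAPH-3:8082; do
  echo "checking http://$hp/health"
  curl -sf --max-time 2 "http://$hp/health" || echo "unhealthy: $hp"
done
```

Each ELB target group’s health check would then point at that server’s own HTTP port rather than a single shared one.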
Nightly has bug fixes on top of the last release and should be safe to use. It is also recommended as some bugs were fixed with regards to replicas. We will do a release early next week.
Yeah, second and third servers are running on port 8081 and 8082 as can be seen from the config. Docker swarm doesn’t allow multiple services to use the same port, unfortunately. Kubernetes does a better job at this.
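For anyone following along, the relevant bit of the compose file looks roughly like this; a sketch from memory of the deploy docs’ Swarm example (service names, offsets, and --lru_mb value assumed), where each server publishes a distinct HTTP port via the port-offset flag:

```yaml
# Sketch: each server gets its own published port via -o/--port_offset.
server_2:
  image: dgraph/dgraph:master
  ports:
    - 8081:8081
  command: dgraph server --my=server_2:7081 --lru_mb=2048 --zero=zero:5080 -o 1
server_3:
  image: dgraph/dgraph:master
  ports:
    - 8082:8082
  command: dgraph server --my=server_3:7082 --lru_mb=2048 --zero=zero:5080 -o 2
```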
Also, side note: when I initialize the cluster with nightly, the services don’t seem to go online. They just remain 0/1 no matter how long I wait. Maybe the nightly build requires different configuration?
There is no image named dgraph/dgraph:nightly. The image is dgraph/dgraph:master.
You can inspect the error message by running docker stack ps dgraph --no-trunc.
I changed nightly to master and removed the restart: on-failure line. The Docker Swarm manager node automatically restarts a container for you if it goes down, so it should not be part of the config file. The restart key is supported when using Docker Compose, which is different from Docker Swarm. Here is the updated file.
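If you do want an explicit restart policy under Swarm, the compose file v3 format expresses it under the deploy key rather than the top-level restart one; a sketch (the delay and max_attempts values are just examples):

```yaml
deploy:
  restart_policy:
    condition: on-failure
    delay: 5s
    max_attempts: 3
```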
Ok that worked beautifully. BTW the command for logs is docker service logs <service_name>.
I’m still wondering about setting up a load balancer. Because Docker forces the same service onto multiple ports, I’m not sure if a typical AWS ELB will be able to point to them individually. I can set up a target group with a port specific to the service, but the ELB itself doesn’t seem to translate the incoming port 9080 to the specific target port.
I suspect using another proxy method would be the solution but I wanted to ask in case I’m setting the ELB up incorrectly.
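In case it helps, the usual workaround is one listener and one target group per published port, since a single ELB listener can’t rewrite 9080 into different per-instance ports. A dry-run sketch with the AWS CLI (all names and ARNs below are placeholders, not real resources):

```shell
# Dry-run sketch: echoes the commands instead of executing them.
# NLB_ARN, TG_ARN, and vpc-XXXX are placeholders.
run() { echo "+ $*"; }
run aws elbv2 create-target-group --name dgraph-9080 --protocol TCP --port 9080 --vpc-id vpc-XXXX
run aws elbv2 create-listener --load-balancer-arn NLB_ARN --protocol TCP --port 9080 --default-actions Type=forward,TargetGroupArn=TG_ARN
```

Repeat the pair for 9081 and 9082, registering each server instance only in the target group matching its published port.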
Aside from that just a couple points about mistakes on the Deploy section of the website:
In the “Ports used by different nodes” section, the service address is listed as 9090, but everywhere else in the same doc it’s 9080.
In the Docker section, it says that to list containers you should use docker-machine ps; it should be docker-machine ls.
I don’t believe the Deploy section mentions the suggestion to set memory to half the size of the machine’s capacity. It’s listed elsewhere in the docs, but it would be good for new users to see that right in Deploy. Matter of fact, explaining why the LRU memory setting is important might be helpful.
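For context, the setting in question is the server’s --lru_mb flag. A Linux-only sketch of sizing it to roughly half the machine’s RAM (the dgraph invocation here is illustrative, not the full command from the compose file):

```shell
# Compute roughly half the machine's RAM in MB for --lru_mb (Linux-only sketch).
total_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
lru_mb=$((total_mb / 2))
echo "would run: dgraph server --lru_mb $lru_mb ..."
```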
Hmmm also noticed that dgraph_server_2 and dgraph_server_3 are throwing errors similar to this:
dgraph_server_2.1.u4kina8qwx3s@AP-GRAPH-2 | 2018/04/16 16:16:24 pool.go:158: Echo error from zero:5080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_server_2.1.u4kina8qwx3s@AP-GRAPH-2 | 2018/04/16 16:16:27 groups.go:105: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I don’t think the latest branch was doing this. Are we missing a detail in the configuration you supplied me with?