I’m having a problem where the second instance of my Dgraph server connects but then fails. I’m using Docker Swarm on AWS (so these are EC2 instances), with the following stack file:
version: "3"
networks:
  dgraph:
services:
  zero:
    image: dgraph/dgraph:latest
    volumes:
      - data-volume:/dgraph
    ports:
      - 5080:5080
      - 6080:6080
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-1
    command: dgraph zero --my=zero:5080 --replicas 2
  server_1:
    image: dgraph/dgraph:latest
    hostname: "server_1"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8080:8080
      - 9080:9080
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-1
    command: dgraph server --my=server_1:7080 --lru_mb=17192 --zero=zero:5080
  server_2:
    image: dgraph/dgraph:latest
    hostname: "server_2"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8081:8081
      - 9081:9081
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-2
    command: dgraph server --my=server_2:7081 --lru_mb=17192 --zero=zero:5080 -o 1
  ratel:
    image: dgraph/dgraph:latest
    hostname: "ratel"
    ports:
      - 8000:8000
    networks:
      - dgraph
    command: dgraph-ratel
    deploy:
      placement:
        constraints:
          - node.hostname == AP-GRAPH-1
volumes:
  data-volume:
The errors I’m seeing inside the Docker container of server_2 are:
[centos@AP-GRAPH-2 ~]$ docker logs 312fb793bc2a
2018/06/02 15:54:37 groups.go:88: Current Raft Id: 0
2018/06/02 15:54:37 worker.go:99: Worker listening at address: [::]:7081
2018/06/02 15:54:37 gRPC server started. Listening on port 9081
2018/06/02 15:54:37 HTTP server started. Listening on port 8081
2018/06/02 15:54:57 pool.go:158: Echo error from zero:5080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2018/06/02 15:54:57 pool.go:108: == CONNECT ==> Setting zero:5080
2018/06/02 15:54:57 groups.go:105: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2018/06/02 15:55:17 groups.go:105: Error while connecting with group zero: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2018/06/02 15:55:17 pool.go:158: Echo error from zero:5080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
2018/06/02 15:55:17 pool.go:158: Echo error from zero:5080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
I’m really not sure what’s going on. To rule out a security group issue, I’ve opened ALL ports on the boxes to traffic from anywhere (for now, at least).
I’ve also tried dgraph:master to see if it made a difference; it didn’t seem to.
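To narrow down whether this is plain network reachability rather than something Dgraph-specific, a small TCP check along these lines could be run from a container on AP-GRAPH-2 that is attached to the same dgraph network. This is only a sketch: the helper names `can_connect` and `check_targets` are my own, it assumes Python is available in whatever container it runs in, and it only tests that a TCP connection to zero:5080 can be opened, not that the gRPC handshake succeeds.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    DNS failures, refused connections and timeouts are all subclasses of
    OSError, so each of them is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_targets(targets, timeout=3.0):
    """Check a list of (host, port) pairs and return {"host:port": bool}."""
    return {
        "%s:%d" % (host, port): can_connect(host, port, timeout)
        for host, port in targets
    }
```

For example, `check_targets([("zero", 5080)])` run next to server_2 would show whether the service name resolves and the port accepts connections across the two nodes. If that comes back False while the same call succeeds on AP-GRAPH-1, the problem is more likely in the Swarm overlay network than in the Dgraph flags.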