I’ve built an HA cluster in Docker Swarm and tried to simulate failure situations.
The cluster runs on three hosts, with one Zero node and one Server node on each. All services have placement constraints so they won’t move between nodes. Zero IDs are set explicitly via --idx. Data is stored in named volumes; see docker-compose.yml:
version: "3.4"
services:
zero-1:
image: dgraph/dgraph:v1.0.2
hostname: "zero-1"
command: dgraph zero -o -2000 --my=zero-1:5080 --replicas 3 --idx 1
volumes:
- data:/dgraph
networks:
- dgraph
ports:
- 6080:6080
deploy:
placement:
constraints:
- node.hostname == swarm-manager-1
zero-2:
image: dgraph/dgraph:v1.0.2
hostname: "zero-2"
command: dgraph zero -o -2000 --my=zero-2:5080 --replicas 3 --idx 2 --peer zero-1:5080
volumes:
- data:/dgraph
networks:
- dgraph
deploy:
placement:
constraints:
- node.hostname == swarm-manager-2
zero-3:
image: dgraph/dgraph:v1.0.2
hostname: "zero-3"
command: dgraph zero -o -2000 --my=zero-3:5080 --replicas 3 --idx 3 --peer zero-1:5080
volumes:
- data:/dgraph
networks:
- dgraph
deploy:
placement:
constraints:
- node.hostname == swarm-manager-3
server-1:
image: dgraph/dgraph:v1.0.2
hostname: "server-1"
command: dgraph server --my=server-1:7080 --memory_mb=1568 --zero=zero-1:5080 --export=/dgraph/export
volumes:
- data:/dgraph
networks:
- dgraph
ports:
- 8080:8080
deploy:
replicas: 1
placement:
constraints:
- node.hostname == swarm-manager-1
server-2:
image: dgraph/dgraph:v1.0.2
hostname: "server-2"
command: dgraph server --my=server-2:7080 --memory_mb=1568 --zero=zero-1:5080 --export=/dgraph/export
volumes:
- data:/dgraph
networks:
- dgraph
ports:
- 8081:8080
deploy:
replicas: 1
placement:
constraints:
- node.hostname == swarm-manager-2
server-3:
image: dgraph/dgraph:v1.0.2
hostname: "server-3"
command: dgraph server --my=server-3:7080 --memory_mb=1568 --zero=zero-1:5080 --export=/dgraph/export
volumes:
- data:/dgraph
ports:
- 8082:8080
networks:
- dgraph
deploy:
replicas: 1
placement:
constraints:
- node.hostname == swarm-manager-3
ratel:
image: dgraph/dgraph:v1.0.2
command: dgraph-ratel
networks:
- dgraph
ports:
- 18049:8081
networks:
dgraph:
external: true
volumes:
data:
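For completeness, this is how the stack is brought up. The overlay network is declared external, so it is created up front (with --attachable, so one-off containers like the live loader below can join it):

# Create the shared overlay network, then deploy the stack as "dgraph"
docker network create --driver overlay --attachable dgraph
docker stack deploy -c docker-compose.yml dgraph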
After deploying the stack, some data is loaded via dgraph live: a subset of 1million.rdf.gz from the tour, about 10k triples.
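The load itself was done roughly like this (the file name below is just a placeholder for my subset; the server’s gRPC port 9080 is not published to the hosts, hence the loader runs inside the overlay network):

# Run the live loader inside the overlay network, mounting the data file
docker run --rm -it --network dgraph -v "$PWD":/import \
    dgraph/dgraph:v1.0.2 \
    dgraph live -r /import/1million-subset.rdf.gz -d server-1:9080 -z zero-1:5080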
So the cluster is running and has data in it. Let’s simulate a failure of node #3 by wiping all of its data, both Zero’s and Server’s:
[root@swarm-manager-3 ~]# service docker stop
[root@swarm-manager-3 ~]# rm -rf /var/lib/docker/volumes/dgraph_data/*
[root@swarm-manager-3 ~]# service docker start
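Swarm immediately reschedules the wiped tasks, and the resulting restart loop can be watched with something like:

# Observe the crash loop of the replacement zero-3 task
docker service ps --no-trunc dgraph_zero-3
docker service logs --tail 50 dgraph_zero-3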
After bringing up new Zero and Server nodes with the same IDs, hostnames, and --my addresses, the cluster cannot heal itself, because the healthy Zero nodes keep trying to reconnect to the wiped node (which no longer has any Raft logs):
dgraph_zero-1.1.mswcvo8xjl3i@swarm-manager-1 | 2018/01/25 17:02:29 pool.go:167: Echo error from zero-3:5080. Err: rpc error: code = Unavailable desc = all SubConns are in TransientFailure
dgraph_zero-1.1.mswcvo8xjl3i@swarm-manager-1 | 2018/01/25 17:02:32 raft.go:531: While applying proposal: Invalid address
dgraph_zero-1.1.mswcvo8xjl3i@swarm-manager-1 | 2018/01/25 17:02:36 node.go:322: No healthy connection found to node Id: 3, err: Unhealthy connection
dgraph_zero-1.1.mswcvo8xjl3i@swarm-manager-1 | 2018/01/25 17:02:36 node.go:322: No healthy connection found to node Id: 3, err: Unhealthy connection
dgraph_zero-3.1.zyoxrp21ocjv@swarm-manager-3 | 2018/01/25 16:44:59 pool.go:118: == CONNECT ==> Setting zero-1:5080
dgraph_zero-3.1.zyoxrp21ocjv@swarm-manager-3 | 2018/01/25 16:44:59 raft.go:708: INFO: 3 [term: 1] received a MsgHeartbeat message with higher term from 1 [term: 4]
dgraph_zero-3.1.zz9yqea1plhc@swarm-manager-3 | 2018/01/25 16:42:09 raft.go:708: INFO: 3 [term: 1] received a MsgHeartbeat message with higher term from 1 [term: 4]
dgraph_zero-3.1.zvs2jaxfc6np@swarm-manager-3 | 2018/01/25 15:31:48 raft.go:708: INFO: 3 [term: 1] received a MsgHeartbeat message with higher term from 1 [term: 4]
dgraph_zero-3.1.zvs2jaxfc6np@swarm-manager-3 | 2018/01/25 15:31:48 raft.go:567: INFO: 3 became follower at term 4
dgraph_zero-3.1.zyoxrp21ocjv@swarm-manager-3 | 2018/01/25 16:44:59 raft.go:567: INFO: 3 became follower at term 4
dgraph_zero-3.1.zz9yqea1plhc@swarm-manager-3 | 2018/01/25 16:42:09 raft.go:567: INFO: 3 became follower at term 4
dgraph_zero-3.1.zz9yqea1plhc@swarm-manager-3 | 2018/01/25 16:42:09 logger.go:121: tocommit(150) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
dgraph_zero-3.1.zvs2jaxfc6np@swarm-manager-3 | 2018/01/25 15:31:48 logger.go:121: tocommit(150) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
dgraph_zero-3.1.zyoxrp21ocjv@swarm-manager-3 | 2018/01/25 16:44:59 logger.go:121: tocommit(150) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
dgraph_zero-3.1.zyoxrp21ocjv@swarm-manager-3 | panic: tocommit(150) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
dgraph_zero-3.1.zz9yqea1plhc@swarm-manager-3 | panic: tocommit(150) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
dgraph_zero-3.1.zvs2jaxfc6np@swarm-manager-3 | panic: tocommit(150) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
dgraph_zero-3.1.zvs2jaxfc6np@swarm-manager-3 |
dgraph_zero-3.1.zyoxrp21ocjv@swarm-manager-3 |
dgraph_zero-3.1.zz9yqea1plhc@swarm-manager-3 |
dgraph_zero-3.1.zz9yqea1plhc@swarm-manager-3 | goroutine 155 [running]:
dgraph_zero-3.1.zvs2jaxfc6np@swarm-manager-3 | goroutine 168 [running]:
dgraph_zero-3.1.zyoxrp21ocjv@swarm-manager-3 | goroutine 154 [running]:
dgraph_zero-3.1.zyoxrp21ocjv@swarm-manager-3 | log.(*Logger).Panicf(0xc420066a50, 0x13398f0, 0x5d, 0xc42027b0c0, 0x2, 0x2)
dgraph_zero-3.1.zz9yqea1plhc@swarm-manager-3 | log.(*Logger).Panicf(0xc420066a50, 0x13398f0, 0x5d, 0xc420116900, 0x2, 0x2)
dgraph_zero-3.1.zvs2jaxfc6np@swarm-manager-3 | log.(*Logger).Panicf(0xc420066a50, 0x13398f0, 0x5d, 0xc4202226e0, 0x2, 0x2)
dgraph_zero-3.1.zvs2jaxfc6np@swarm-manager-3 | /usr/local/go/src/log/log.go:219 +0xdb
dgraph_zero-3.1.zyoxrp21ocjv@swarm-manager-3 | /usr/local/go/src/log/log.go:219 +0xdb
dgraph_zero-3.1.zz9yqea1plhc@swarm-manager-3 | /usr/local/go/src/log/log.go:219 +0xdb
dgraph_zero-3.1.zz9yqea1plhc@swarm-manager-3 | github.com/dgraph-io/dgraph/vendor/github.com/coreos/etcd/raft.(*DefaultLogger).Panicf(0xc42028b680, 0x13398f0, 0x5d, 0xc420116900, 0x2, 0x2)
dgraph_zero-3.1.zvs2jaxfc6np@swarm-manager-3 | github.com/dgraph-io/dgraph/vendor/github.com/coreos/etcd/raft.(*DefaultLogger).Panicf(0xc420289690, 0x13398f0, 0x5d, 0xc4202226e0, 0x2, 0x2)
dgraph_zero-3.1.zyoxrp21ocjv@swarm-manager-3 | github.com/dgraph-io/dgraph/vendor/github.com/coreos/etcd/raft.(*DefaultLogger).Panicf(0xc42028b670, 0x13398f0, 0x5d, 0xc42027b0c0, 0x2, 0x2)
dgraph_zero-3.1.zyoxrp21ocjv@swarm-manager-3 | /home/pawan/go/src/github.com/dgraph-io/dgraph/vendor/github.com/coreos/etcd/raft/logger.go:121 +0x60
dgraph_zero-3.1.zz9yqea1plhc@swarm-manager-3 | /home/pawan/go/src/github.com/dgraph-io/dgraph/vendor/github.com/coreos/etcd/raft/logger.go:121 +0x60
dgraph_zero-3.1.zvs2jaxfc6np@swarm-manager-3 | /home/pawan/go/src/github.com/dgraph-io/dgraph/vendor/github.com/coreos/etcd/raft/logger.go:121 +0x60
Zero-3 is still present in the list of Zero nodes (and stays there even while it is down). Output of /state:
{"counter":"2333","groups":{"1":{"members":{"1":{"id":"1","groupId":1,"addr":"server-3:7080","lastUpdate":"1516890725"},"2":{"id":"2","groupId":1,"addr":"server-2:7080","leader":true,"lastUpdate":"1516890868"},"3":{"id":"3","groupId":1,"addr":"server-1:7080"}},"tablets":{"_predicate_":{"groupId":1,"predicate":"_predicate_","space":"6737282"},"actor.film":{"groupId":1,"predicate":"actor.film","space":"154599"},"director.film":{"groupId":1,"predicate":"director.film","space":"9785"},"genre":{"groupId":1,"predicate":"genre","space":"24307"},"initial_release_date":{"groupId":1,"predicate":"initial_release_date","space":"25351"},"name":{"groupId":1,"predicate":"name","space":"6304250"},"performance.actor":{"groupId":1,"predicate":"performance.actor","space":"189068"},"performance.character":{"groupId":1,"predicate":"performance.character","space":"206256"},"performance.film":{"groupId":1,"predicate":"performance.film","space":"184814"},"starring":{"groupId":1,"predicate":"starring","space":"75475"}}}},"zeros":{"1":{"id":"1","addr":"zero-1:5080","leader":true},"2":{"id":"2","addr":"zero-2:5080"},"3":{"id":"3","addr":"zero-3:5080"}},"maxLeaseId":"1010000","maxTxnTs":"10000","maxRaftId":"2172"}
More logs. This time I stopped the Docker service on the host running server-3/zero-3, purged the data in the volume, removed server-3 via Zero’s /removeNode endpoint (the exact call is shown after the log excerpt below), removed the whole stack (docker stack rm dgraph), added zero-4 and server-4 to the docker-compose file (on the same host where server-3/zero-3 used to run), started Docker on the third host, and deployed the stack again. In this situation all Server nodes crash constantly, even the healthy ones:
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | 2018/01/26 08:37:50 gRPC server started. Listening on port 9080
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | 2018/01/26 08:37:50 HTTP server started. Listening on port 8080
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | 2018/01/26 08:37:50 groups.go:86: Current Raft Id: 2
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | 2018/01/26 08:37:50 worker.go:99: Worker listening at address: [::]:7080
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | 2018/01/26 08:37:50 pool.go:118: == CONNECT ==> Setting zero-1:5080
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | 2018/01/26 08:37:50 groups.go:109: Connected to group zero. Connection state: member:<id:2 addr:"server-1:7080" > state:<counter:432 groups:<key:1 value:<members:<key:2 value:<id:2 group_id:1 addr:"server-1:7080" > > members:<key:3 value:<id:3 group_id:1 addr:"server-2:7080" leader:true last_update:1516955215 > > members:<key:32 value:<id:32 group_id:1 addr:"server-3:7080" > > tablets:<key:"_predicate_" value:<group_id:1 predicate:"_predicate_" space:6737282 > > tablets:<key:"actor.film" value:<group_id:1 predicate:"actor.film" space:154599 > > tablets:<key:"director.film" value:<group_id:1 predicate:"director.film" space:32420 > > tablets:<key:"genre" value:<group_id:1 predicate:"genre" space:35893 > > tablets:<key:"initial_release_date" value:<group_id:1 predicate:"initial_release_date" space:31594 > > tablets:<key:"name" value:<group_id:1 predicate:"name" space:9925639 > > tablets:<key:"performance.actor" value:<group_id:1 predicate:"performance.actor" space:189068 > > tablets:<key:"performance.character" value:<group_id:1 predicate:"performance.character" space:206256 > > tablets:<key:"performance.film" value:<group_id:1 predicate:"performance.film" space:184814 > > tablets:<key:"starring" value:<group_id:1 predicate:"starring" space:80342 > > > > groups:<key:2 value:<members:<key:59 value:<id:59 group_id:2 addr:"server-4:7080" > > > > zeros:<key:1 value:<id:1 addr:"zero-1:5080" leader:true > > zeros:<key:2 value:<id:2 addr:"zero-2:5080" > > zeros:<key:3 value:<id:3 addr:"zero-4:5080" > > maxLeaseId:1010000 maxTxnTs:20000 maxRaftId:86 removed:<id:1 group_id:1 addr:"server-3:7080" last_update:1516954819 > >
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | panic: runtime error: invalid memory address or nil pointer dereference
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x1097b45]
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 |
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | goroutine 248 [running]:
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | github.com/dgraph-io/dgraph/worker.(*groupi).applyState(0xc420116000, 0xc4257c28c0)
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | /home/pawan/go/src/github.com/dgraph-io/dgraph/worker/groups.go:245 +0x545
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | github.com/dgraph-io/dgraph/worker.StartRaftNodes(0xc4203d4010, 0x1)
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | /home/pawan/go/src/github.com/dgraph-io/dgraph/worker/groups.go:111 +0x58a
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | created by github.com/dgraph-io/dgraph/dgraph/cmd/server.run
dgraph_server-1.1.twco76veqqbh@swarm-manager-1 | /home/pawan/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/server/run.go:351 +0x82b
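For reference, the /removeNode call mentioned above looked like this; per the earlier /state output, server-3 had Raft ID 1 in group 1 (also visible in the removed:<...> entry of the log above):

# Remove server-3 (Raft ID 1, group 1) via Zero's HTTP endpoint
curl "http://swarm-manager-1:6080/removeNode?id=1&group=1"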
Is there a way to safely replace a Zero node when it unexpectedly leaves the cluster forever? Should we back up the Raft logs to be able to restore the node? Do replacement nodes need new IDs/hostnames/anything else? Server nodes can be removed via the /removeNode endpoint, but Zeros cannot.
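In the meantime, the only safeguard I can think of is snapshotting each node’s /dgraph volume while Docker is stopped, so the Raft state survives a host wipe. A rough sketch (the p/w/zw directory names are my assumption about where v1.0.x keeps the postings and the two Raft WALs):

# On each host: stop Docker, archive the stack's named volume, restart.
# /dgraph should contain the postings (p), the server Raft WAL (w)
# and the zero Raft WAL (zw).
service docker stop
tar -czf /root/dgraph_data_$(hostname).tar.gz \
    -C /var/lib/docker/volumes/dgraph_data/_data .
service docker start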