Hi,
I am becoming increasingly frustrated with Dgraph. I've been using it since version 1.0.0, and it seems to me that replication has repeatedly flipped between working and being extremely buggy (or not working at all) from one version to the next.
I understand that v1.0.7 had an issue with `--replicas 3`, but I believed that issue was fixed in one of the RCs. However, in v1.0.8 I am again running into a problem where my other nodes are not receiving any updates. I feel like I make this post every other release: each time I try to move our staging graph over to multiple nodes, and each time something seems to go wrong.
As far as I can tell, no errors appear in any of the logs. My understanding is that I export my data to one node attached to a Zero running with `--replicas 3`, and if I then add more nodes, those nodes sync from the initial node until they reach parity (in my experience at roughly 20Mb/s).
Here is my docker-compose.yml:

```yaml
version: '3.4'

volumes:
  data-cluster0-node0:
  data-cluster0-node1:
  data-cluster0-node2:
  data-zero:

services:
  nginx:
    image: 'nginx:1.15.0-alpine'
    restart: on-failure
    # Change the first port to expose the server externally
    ports:
      - '9083:9080'
      - '9084:9081'
      - '9085:9082'
    volumes:
      - ./ssl/:/dgraph/ssl/
      - ./grpc.conf:/etc/nginx/conf.d/grpc.conf

  sync:
    image: 'drafted/sql-to-graph'
    restart: on-failure
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - DGRAPH_ZERO=zero:5080
      - DGRAPH_SHARDS=1
    volumes:
      - data-cluster0-node0:/tmp/dgraph-volume-0

  cluster0-node0:
    command: bash -c 'echo "Sleeping 5 seconds..."; sleep 5; echo "Waiting for sync..."; while ping -c1 sync &>/dev/null; do sleep 1; done; echo "Sync has finished, starting server with ${LRU_MB:-2048}mb" && dgraph server --bindall --my=cluster0-node0:7080 --lru_mb=${LRU_MB:-2048} --zero=zero:5080'
    image: 'dgraph/dgraph:v1.0.8'
    restart: on-failure
    volumes:
      - data-cluster0-node0:/dgraph

  cluster0-node1:
    command: bash -c 'echo "Sleeping 5 seconds..."; sleep 5; echo "Waiting for sync..."; while ping -c1 sync &>/dev/null; do sleep 1; done; sleep 60; echo "Sync has finished, starting server with ${LRU_MB:-2048}mb" && dgraph server --bindall --port_offset=1 --my=cluster0-node1:7081 --lru_mb=${LRU_MB:-2048} --zero=zero:5080'
    image: 'dgraph/dgraph:v1.0.8'
    restart: on-failure
    volumes:
      - data-cluster0-node1:/dgraph

  cluster0-node2:
    command: bash -c 'echo "Sleeping 5 seconds..."; sleep 5; echo "Waiting for sync..."; while ping -c1 sync &>/dev/null; do sleep 1; done; sleep 60; echo "Sync has finished, starting server with ${LRU_MB:-2048}mb" && dgraph server --bindall --port_offset=2 --my=cluster0-node2:7082 --lru_mb=${LRU_MB:-2048} --zero=zero:5080'
    image: 'dgraph/dgraph:v1.0.8'
    restart: on-failure
    volumes:
      - data-cluster0-node2:/dgraph

  zero:
    command: 'dgraph zero --bindall --my=zero:5080 --replicas 3'
    image: 'dgraph/dgraph:v1.0.8'
    # Change the first port to expose the Zero server externally (HTTP)
    ports:
      - '6083:6080'
    restart: on-failure
    volumes:
      - data-zero:/dgraph
```
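One addition I'm considering to make this easier to debug (a sketch, not part of my current setup): publishing each server's HTTP port so I can hit its /health endpoint directly from the host. For the first node, something like:

```yaml
  cluster0-node0:
    ports:
      # HTTP port is 8080 plus the node's port_offset; 8083 on the host
      # is just an arbitrary free port I picked for debugging.
      - '8083:8080'
```

With that in place I could curl localhost:8083/health while the nodes are (supposedly) syncing.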
My current logs are not showing any syncing at all. Here is the state of my Zero:
```
{
  members: {
    1: {
      id: "1",
      groupId: 1,
      addr: "cluster0-node0:7080",
      leader: true,
      lastUpdate: "1536860577"
    },
    2: {
      id: "2",
      groupId: 1,
      addr: "cluster0-node1:7081"
    },
    3: {
      id: "3",
      groupId: 1,
      addr: "cluster0-node2:7082"
    }
  }
}
```
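As a quick sanity check that all three servers registered with the Zero, I count the member addresses in the /state output (a sketch: `state.json` below is just the JSON above re-typed, and on the live cluster I fetch it with `curl localhost:6083/state`, using the 6083:6080 mapping from my compose file):

```shell
# Save the /state JSON from Zero, then count registered member addresses;
# with --replicas 3 I expect all three servers to show up.
cat > state.json <<'EOF'
{"members": {
  "1": {"id": "1", "groupId": 1, "addr": "cluster0-node0:7080", "leader": true},
  "2": {"id": "2", "groupId": 1, "addr": "cluster0-node1:7081"},
  "3": {"id": "3", "groupId": 1, "addr": "cluster0-node2:7082"}
}}
EOF
grep -c '"addr"' state.json   # prints 3
```

So all three members are registered and node0 is the leader, yet nothing replicates to nodes 2 and 3.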
Logs from my first node:
cluster0-node0_1 | Dgraph version : v1.0.8
cluster0-node0_1 | Commit SHA-1 : 1dd8376f
cluster0-node0_1 | Commit timestamp : 2018-08-31 10:47:07 -0700
cluster0-node0_1 | Branch : HEAD
cluster0-node0_1 |
cluster0-node0_1 | For Dgraph official documentation, visit https://docs.dgraph.io.
cluster0-node0_1 | For discussions about Dgraph , visit http://discuss.dgraph.io.
cluster0-node0_1 | To say hi to the community , visit https://dgraph.slack.com.
cluster0-node0_1 |
cluster0-node0_1 | Licensed under Apache 2.0 + Commons Clause. Copyright 2015-2018 Dgraph Labs, Inc.
cluster0-node0_1 |
cluster0-node0_1 |
cluster0-node0_1 | 2018/09/13 17:42:35 server.go:118: Setting Badger option: ssd
cluster0-node0_1 | 2018/09/13 17:42:35 server.go:134: Setting Badger table load option: mmap
cluster0-node0_1 | 2018/09/13 17:42:35 server.go:147: Setting Badger value log load option: none
cluster0-node0_1 | 2018/09/13 17:42:35 server.go:158: Opening postings Badger DB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:32 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741824 ValueLogMaxEntries:1000000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
zero_1 | 2018/09/13 17:42:37 zero.go:365: Got connection request: addr:"cluster0-node0:7080"
cluster0-node0_1 | 2018/09/13 17:42:37 gRPC server started. Listening on port 9080
cluster0-node0_1 | 2018/09/13 17:42:37 HTTP server started. Listening on port 8080
cluster0-node0_1 | 2018/09/13 17:42:37 groups.go:80: Current Raft Id: 0
cluster0-node0_1 | 2018/09/13 17:42:37 worker.go:86: Worker listening at address: [::]:7080
cluster0-node0_1 | 2018/09/13 17:42:37 pool.go:108: == CONNECTED ==> Setting zero:5080
zero_1 | 2018/09/13 17:42:37 pool.go:108: == CONNECTED ==> Setting cluster0-node0:7080
cluster0-node0_1 | 2018/09/13 17:42:38 groups.go:107: Connected to group zero. Assigned group: 1
zero_1 | 2018/09/13 17:42:38 zero.go:474: Connected: id:1 group_id:1 addr:"cluster0-node0:7080"
cluster0-node0_1 | E0913 17:42:38.587748 647 storage.go:266] While seekEntry. Error: Unable to find raft entry
cluster0-node0_1 | 2018/09/13 17:42:38 draft.go:76: Node ID: 1 with GroupID: 1
cluster0-node0_1 | 2018/09/13 17:42:38 raft.go:567: INFO: 1 became follower at term 0
cluster0-node0_1 | 2018/09/13 17:42:38 raft.go:315: INFO: newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
cluster0-node0_1 | 2018/09/13 17:42:38 raft.go:567: INFO: 1 became follower at term 1
cluster0-node0_1 | 2018/09/13 17:42:39 raft.go:749: INFO: 1 is starting a new election at term 1
cluster0-node0_1 | 2018/09/13 17:42:39 raft.go:594: INFO: 1 became pre-candidate at term 1
cluster0-node0_1 | 2018/09/13 17:42:39 raft.go:664: INFO: 1 received MsgPreVoteResp from 1 at term 1
cluster0-node0_1 | 2018/09/13 17:42:39 raft.go:580: INFO: 1 became candidate at term 2
cluster0-node0_1 | 2018/09/13 17:42:39 raft.go:664: INFO: 1 received MsgVoteResp from 1 at term 2
cluster0-node0_1 | 2018/09/13 17:42:39 raft.go:621: INFO: 1 became leader at term 2
cluster0-node0_1 | 2018/09/13 17:42:39 node.go:301: INFO: raft.node: 1 elected leader 1 at term 2
cluster0-node0_1 | 2018/09/13 17:42:39 groups.go:361: Serving tablet for: id
cluster0-node0_1 | 2018/09/13 17:42:39 groups.go:361: Serving tablet for: _predicate_
cluster0-node0_1 | 2018/09/13 17:42:39 mutation.go:174: Done schema update predicate:"_predicate_" value_type:STRING list:true
cluster0-node0_1 | 2018/09/13 17:42:42 groups.go:361: Serving tablet for: _job
cluster0-node0_1 | 2018/09/13 17:42:42 groups.go:361: Serving tablet for: name
cluster0-node0_1 | 2018/09/13 17:42:43 groups.go:361: Serving tablet for: when
cluster0-node0_1 | 2018/09/13 17:42:43 groups.go:361: Serving tablet for: _when
cluster0-node0_1 | 2018/09/13 17:42:43 groups.go:361: Serving tablet for: email
cluster0-node0_1 | 2018/09/13 17:42:44 groups.go:361: Serving tablet for: group
cluster0-node0_1 | 2018/09/13 17:42:45 groups.go:361: Serving tablet for: hired
cluster0-node0_1 | 2018/09/13 17:42:45 groups.go:361: Serving tablet for: knows
cluster0-node0_1 | 2018/09/13 17:42:46 groups.go:361: Serving tablet for: match
cluster0-node0_1 | 2018/09/13 17:42:47 groups.go:361: Serving tablet for: owner
cluster0-node0_1 | 2018/09/13 17:42:47 groups.go:361: Serving tablet for: title
cluster0-node0_1 | 2018/09/13 17:42:48 groups.go:361: Serving tablet for: _group
cluster0-node0_1 | 2018/09/13 17:42:49 groups.go:361: Serving tablet for: applied
cluster0-node0_1 | 2018/09/13 17:42:49 groups.go:361: Serving tablet for: name@en
cluster0-node0_1 | 2018/09/13 17:42:49 groups.go:361: Serving tablet for: skipped
cluster0-node0_1 | 2018/09/13 17:42:49 groups.go:361: Serving tablet for: _profile
cluster0-node0_1 | 2018/09/13 17:42:50 groups.go:361: Serving tablet for: attached
cluster0-node0_1 | 2018/09/13 17:42:50 groups.go:361: Serving tablet for: location
cluster0-node0_1 | 2018/09/13 17:42:51 groups.go:361: Serving tablet for: prospect
cluster0-node0_1 | 2018/09/13 17:42:51 groups.go:361: Serving tablet for: title@en
cluster0-node0_1 | 2018/09/13 17:42:51 groups.go:361: Serving tablet for: contacted
cluster0-node0_1 | 2018/09/13 17:42:51 groups.go:361: Serving tablet for: favorited
cluster0-node0_1 | 2018/09/13 17:42:51 groups.go:361: Serving tablet for: seniority
cluster0-node0_1 | 2018/09/13 17:42:52 groups.go:361: Serving tablet for: matchOrder
cluster0-node0_1 | 2018/09/13 17:42:55 groups.go:361: Serving tablet for: companyName
cluster0-node0_1 | 2018/09/13 17:42:56 groups.go:361: Serving tablet for: rawLocation
cluster0-node0_1 | 2018/09/13 17:42:56 groups.go:361: Serving tablet for: recommended
cluster0-node0_1 | 2018/09/13 17:42:56 groups.go:361: Serving tablet for: potentialEmployers
cluster0-node0_1 | 2018/09/13 17:42:57 groups.go:507: Got address of a Zero server: zero:5080
Logs from my 2nd and 3rd nodes:
cluster0-node1_1 | Sync has finished, starting server with 4096mb
cluster0-node1_1 |
cluster0-node1_1 | Dgraph version : v1.0.8
cluster0-node1_1 | Commit SHA-1 : 1dd8376f
cluster0-node1_1 | Commit timestamp : 2018-08-31 10:47:07 -0700
cluster0-node1_1 | Branch : HEAD
cluster0-node1_1 |
cluster0-node1_1 | For Dgraph official documentation, visit https://docs.dgraph.io.
cluster0-node1_1 | For discussions about Dgraph , visit http://discuss.dgraph.io.
cluster0-node1_1 | To say hi to the community , visit https://dgraph.slack.com.
cluster0-node1_1 |
cluster0-node1_1 | Licensed under Apache 2.0 + Commons Clause. Copyright 2015-2018 Dgraph Labs, Inc.
cluster0-node1_1 |
cluster0-node1_1 |
cluster0-node2_1 |
cluster0-node2_1 | Dgraph version : v1.0.8
cluster0-node2_1 | Commit SHA-1 : 1dd8376f
cluster0-node2_1 | Commit timestamp : 2018-08-31 10:47:07 -0700
cluster0-node2_1 | Branch : HEAD
cluster0-node2_1 |
cluster0-node2_1 | For Dgraph official documentation, visit https://docs.dgraph.io.
cluster0-node2_1 | For discussions about Dgraph , visit http://discuss.dgraph.io.
cluster0-node2_1 | To say hi to the community , visit https://dgraph.slack.com.
cluster0-node2_1 |
cluster0-node2_1 | Licensed under Apache 2.0 + Commons Clause. Copyright 2015-2018 Dgraph Labs, Inc.
cluster0-node2_1 |
cluster0-node2_1 |
cluster0-node1_1 | 2018/09/13 17:43:43 server.go:118: Setting Badger option: ssd
cluster0-node1_1 | 2018/09/13 17:43:43 server.go:134: Setting Badger table load option: mmap
cluster0-node1_1 | 2018/09/13 17:43:43 server.go:147: Setting Badger value log load option: none
cluster0-node1_1 | 2018/09/13 17:43:43 server.go:158: Opening postings Badger DB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:32 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741824 ValueLogMaxEntries:1000000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
cluster0-node2_1 | 2018/09/13 17:43:43 server.go:118: Setting Badger option: ssd
cluster0-node2_1 | 2018/09/13 17:43:43 server.go:134: Setting Badger table load option: mmap
cluster0-node2_1 | 2018/09/13 17:43:43 server.go:147: Setting Badger value log load option: none
cluster0-node2_1 | 2018/09/13 17:43:43 server.go:158: Opening postings Badger DB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:32 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741824 ValueLogMaxEntries:1000000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
cluster0-node1_1 | 2018/09/13 17:43:43 gRPC server started. Listening on port 9081
cluster0-node1_1 | 2018/09/13 17:43:43 HTTP server started. Listening on port 8081
cluster0-node1_1 | 2018/09/13 17:43:43 groups.go:80: Current Raft Id: 0
cluster0-node1_1 | 2018/09/13 17:43:43 worker.go:86: Worker listening at address: [::]:7081
zero_1 | 2018/09/13 17:43:43 zero.go:365: Got connection request: addr:"cluster0-node1:7081"
cluster0-node1_1 | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting zero:5080
zero_1 | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node1:7081
zero_1 | 2018/09/13 17:43:43 zero.go:474: Connected: id:2 group_id:1 addr:"cluster0-node1:7081"
cluster0-node1_1 | 2018/09/13 17:43:43 groups.go:107: Connected to group zero. Assigned group: 1
cluster0-node2_1 | 2018/09/13 17:43:43 groups.go:80: Current Raft Id: 0
cluster0-node2_1 | 2018/09/13 17:43:43 gRPC server started. Listening on port 9082
cluster0-node2_1 | 2018/09/13 17:43:43 HTTP server started. Listening on port 8082
cluster0-node2_1 | 2018/09/13 17:43:43 worker.go:86: Worker listening at address: [::]:7082
cluster0-node0_1 | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node1:7081
cluster0-node1_1 | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node0:7080
zero_1 | 2018/09/13 17:43:43 zero.go:365: Got connection request: addr:"cluster0-node2:7082"
zero_1 | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node2:7082
cluster0-node2_1 | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting zero:5080
cluster0-node2_1 | 2018/09/13 17:43:43 groups.go:107: Connected to group zero. Assigned group: 1
cluster0-node1_1 | E0913 17:43:43.638362 646 storage.go:266] While seekEntry. Error: Unable to find raft entry
cluster0-node1_1 | 2018/09/13 17:43:43 draft.go:76: Node ID: 2 with GroupID: 1
cluster0-node1_1 | 2018/09/13 17:43:43 draft.go:923: Calling IsPeer
cluster0-node0_1 | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node2:7082
cluster0-node1_1 | 2018/09/13 17:43:43 draft.go:928: Done with IsPeer call
cluster0-node1_1 | 2018/09/13 17:43:43 draft.go:997: Trying to join peers.
cluster0-node1_1 | 2018/09/13 17:43:43 draft.go:906: Calling JoinCluster via leader: cluster0-node0:7080
cluster0-node1_1 | 2018/09/13 17:43:43 draft.go:910: Done with JoinCluster call
cluster0-node1_1 | 2018/09/13 17:43:43 raft.go:567: INFO: 2 became follower at term 0
zero_1 | 2018/09/13 17:43:43 zero.go:474: Connected: id:3 group_id:1 addr:"cluster0-node2:7082"
cluster0-node2_1 | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node1:7081
cluster0-node2_1 | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node0:7080
cluster0-node1_1 | 2018/09/13 17:43:43 raft.go:315: INFO: newRaft 2 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
cluster0-node1_1 | 2018/09/13 17:43:43 raft.go:567: INFO: 2 became follower at term 1
cluster0-node2_1 | E0913 17:43:43.642357 648 storage.go:266] While seekEntry. Error: Unable to find raft entry
cluster0-node1_1 | 2018/09/13 17:43:43 groups.go:507: Got address of a Zero server: zero:5080
cluster0-node0_1 | 2018/09/13 17:43:43 raft_server.go:166: [2] Done joining cluster with err: <nil>
cluster0-node1_1 | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node2:7082
cluster0-node1_1 | 2018/09/13 17:43:43 raft.go:708: INFO: 2 [term: 1] received a MsgApp message with higher term from 1 [term: 2]
cluster0-node1_1 | 2018/09/13 17:43:43 raft.go:567: INFO: 2 became follower at term 2
cluster0-node1_1 | 2018/09/13 17:43:43 node.go:301: INFO: raft.node: 2 elected leader 1 at term 2
cluster0-node1_1 | 2018/09/13 17:43:43 mutation.go:141: Done schema update predicate:"_predicate_" value_type:STRING list:true
cluster0-node2_1 | 2018/09/13 17:43:43 draft.go:76: Node ID: 3 with GroupID: 1
cluster0-node2_1 | 2018/09/13 17:43:43 draft.go:923: Calling IsPeer
cluster0-node2_1 | 2018/09/13 17:43:43 draft.go:928: Done with IsPeer call
cluster0-node2_1 | 2018/09/13 17:43:43 draft.go:997: Trying to join peers.
cluster0-node2_1 | 2018/09/13 17:43:43 draft.go:906: Calling JoinCluster via leader: cluster0-node0:7080
cluster0-node2_1 | 2018/09/13 17:43:43 draft.go:910: Done with JoinCluster call
cluster0-node2_1 | 2018/09/13 17:43:43 raft.go:567: INFO: 3 became follower at term 0
cluster0-node2_1 | 2018/09/13 17:43:43 raft.go:315: INFO: newRaft 3 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
cluster0-node2_1 | 2018/09/13 17:43:43 raft.go:567: INFO: 3 became follower at term 1
cluster0-node2_1 | 2018/09/13 17:43:43 groups.go:507: Got address of a Zero server: zero:5080
cluster0-node0_1 | 2018/09/13 17:43:43 raft_server.go:166: [3] Done joining cluster with err: <nil>
cluster0-node2_1 | 2018/09/13 17:43:43 raft.go:708: INFO: 3 [term: 1] received a MsgHeartbeat message with higher term from 1 [term: 2]
cluster0-node2_1 | 2018/09/13 17:43:43 raft.go:567: INFO: 3 became follower at term 2
cluster0-node2_1 | 2018/09/13 17:43:43 node.go:301: INFO: raft.node: 3 elected leader 1 at term 2
cluster0-node2_1 | 2018/09/13 17:43:43 mutation.go:141: Done schema update predicate:"_predicate_" value_type:STRING list:true
The logs for the 2nd and 3rd nodes have not changed since; that is their entire output.
Maybe I have been setting this up wrong the whole time? I'm not sure, but I believe I have been following the docs, and this used to (sort of) work on previous versions. I've been running a single-node server for a while because of these bugs, which defeats a huge part of why I want to use Dgraph in the first place.
I've asked this before, but I used to copy the `p` directory over to all 3 nodes so that they started at parity, and I was told I did not need to do that. Is that now required again, with only new writes being synced?