V1.0.8 issues with replication on 3 node cluster

Hi,

I am becoming increasingly frustrated with Dgraph. I’ve been using it since version 1.0.0, and it seems to me that replication has swung between working, being extremely buggy, and not working at all every other version.

I understand there was an issue with --replicas 3 in v1.0.7, but I believed that was fixed in one of the RCs. However, in v1.0.8 I am again running into an issue where my other nodes are not receiving any updates. I feel like I make this post every other release: each version I try to move our staging graph over to multiple nodes, and every time it feels as if something goes wrong.

As far as I can tell, there are no errors in any of the logs. As I understand it, I load my data onto one node attached to a Zero running with --replicas 3, and if I add more nodes, they sync from the initial node until they reach parity (in my experience at roughly 20 MB/s).
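For reference, this is roughly how I check what Zero thinks the cluster looks like (just a quick sketch; the 6083 port comes from my compose file below, where the Zero HTTP port 6080 is published on the host):

# The Zero /state endpoint, reachable from the host via the published port
curl -s localhost:6083/state

The output of that call is what I have pasted further down.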

Here is my docker-compose.yml

version: '3.4'

volumes:
  data-cluster0-node0:
  data-cluster0-node1:
  data-cluster0-node2:
  data-zero:

services:
  nginx:
    image: 'nginx:1.15.0-alpine'
    restart: on-failure
    # Change the first port to expose the server externally
    ports:
      - '9083:9080'
      - '9084:9081'
      - '9085:9082'
    volumes:
      - ./ssl/:/dgraph/ssl/
      - ./grpc.conf:/etc/nginx/conf.d/grpc.conf
  sync:
    image: 'drafted/sql-to-graph'
    restart: on-failure
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - DGRAPH_ZERO=zero:5080
      - DGRAPH_SHARDS=1
    volumes:
      - data-cluster0-node0:/tmp/dgraph-volume-0
  cluster0-node0:
    command: bash -c 'echo "Sleeping 5 seconds..."; sleep 5; echo "Waiting for sync..."; while ping -c1 sync &>/dev/null; do sleep 1; done; echo "Sync has finished, starting server with ${LRU_MB:-2048}mb" && dgraph server --bindall --my=cluster0-node0:7080 --lru_mb=${LRU_MB:-2048} --zero=zero:5080'
    image: 'dgraph/dgraph:v1.0.8'
    restart: on-failure
    volumes:
      - data-cluster0-node0:/dgraph
  cluster0-node1:
    command: bash -c 'echo "Sleeping 5 seconds..."; sleep 5; echo "Waiting for sync..."; while ping -c1 sync &>/dev/null; do sleep 1; done; sleep 60; echo "Sync has finished, starting server with ${LRU_MB:-2048}mb" && dgraph server --bindall --port_offset=1 --my=cluster0-node1:7081 --lru_mb=${LRU_MB:-2048} --zero=zero:5080'
    image: 'dgraph/dgraph:v1.0.8'
    restart: on-failure
    volumes:
      - data-cluster0-node1:/dgraph
  cluster0-node2:
    command: bash -c 'echo "Sleeping 5 seconds..."; sleep 5; echo "Waiting for sync..."; while ping -c1 sync &>/dev/null; do sleep 1; done; sleep 60; echo "Sync has finished, starting server with ${LRU_MB:-2048}mb" && dgraph server --bindall --port_offset=2 --my=cluster0-node2:7082 --lru_mb=${LRU_MB:-2048} --zero=zero:5080'
    image: 'dgraph/dgraph:v1.0.8'
    restart: on-failure
    volumes:
      - data-cluster0-node2:/dgraph
  zero:
    command: 'dgraph zero --bindall --my=zero:5080 --replicas 3'
    image: 'dgraph/dgraph:v1.0.8'
    # Change first port to expose the zero server externally (http)
    ports:
      - '6083:6080'
    restart: on-failure
    volumes:
      - data-zero:/dgraph

My current logs are not showing any syncing at all. The state of my Zero:

{
	members: {
		1: {
			id: "1",
			groupId: 1,
			addr: "cluster0-node0:7080",
			leader: true,
			lastUpdate: "1536860577"
		},
		2: {
			id: "2",
			groupId: 1,
			addr: "cluster0-node1:7081"
		},
		3: {
			id: "3",
			groupId: 1,
			addr: "cluster0-node2:7082"
		}
	}
}

The logs of my first node:

cluster0-node0_1  | Dgraph version   : v1.0.8
cluster0-node0_1  | Commit SHA-1     : 1dd8376f
cluster0-node0_1  | Commit timestamp : 2018-08-31 10:47:07 -0700
cluster0-node0_1  | Branch           : HEAD
cluster0-node0_1  |
cluster0-node0_1  | For Dgraph official documentation, visit https://docs.dgraph.io.
cluster0-node0_1  | For discussions about Dgraph     , visit http://discuss.dgraph.io.
cluster0-node0_1  | To say hi to the community       , visit https://dgraph.slack.com.
cluster0-node0_1  |
cluster0-node0_1  | Licensed under Apache 2.0 + Commons Clause. Copyright 2015-2018 Dgraph Labs, Inc.
cluster0-node0_1  |
cluster0-node0_1  |
cluster0-node0_1  | 2018/09/13 17:42:35 server.go:118: Setting Badger option: ssd
cluster0-node0_1  | 2018/09/13 17:42:35 server.go:134: Setting Badger table load option: mmap
cluster0-node0_1  | 2018/09/13 17:42:35 server.go:147: Setting Badger value log load option: none
cluster0-node0_1  | 2018/09/13 17:42:35 server.go:158: Opening postings Badger DB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:32 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741824 ValueLogMaxEntries:1000000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
zero_1            | 2018/09/13 17:42:37 zero.go:365: Got connection request: addr:"cluster0-node0:7080"
cluster0-node0_1  | 2018/09/13 17:42:37 gRPC server started.  Listening on port 9080
cluster0-node0_1  | 2018/09/13 17:42:37 HTTP server started.  Listening on port 8080
cluster0-node0_1  | 2018/09/13 17:42:37 groups.go:80: Current Raft Id: 0
cluster0-node0_1  | 2018/09/13 17:42:37 worker.go:86: Worker listening at address: [::]:7080
cluster0-node0_1  | 2018/09/13 17:42:37 pool.go:108: == CONNECTED ==> Setting zero:5080
zero_1            | 2018/09/13 17:42:37 pool.go:108: == CONNECTED ==> Setting cluster0-node0:7080
cluster0-node0_1  | 2018/09/13 17:42:38 groups.go:107: Connected to group zero. Assigned group: 1
zero_1            | 2018/09/13 17:42:38 zero.go:474: Connected: id:1 group_id:1 addr:"cluster0-node0:7080"
cluster0-node0_1  | E0913 17:42:38.587748     647 storage.go:266] While seekEntry. Error: Unable to find raft entry
cluster0-node0_1  | 2018/09/13 17:42:38 draft.go:76: Node ID: 1 with GroupID: 1
cluster0-node0_1  | 2018/09/13 17:42:38 raft.go:567: INFO: 1 became follower at term 0
cluster0-node0_1  | 2018/09/13 17:42:38 raft.go:315: INFO: newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
cluster0-node0_1  | 2018/09/13 17:42:38 raft.go:567: INFO: 1 became follower at term 1
cluster0-node0_1  | 2018/09/13 17:42:39 raft.go:749: INFO: 1 is starting a new election at term 1
cluster0-node0_1  | 2018/09/13 17:42:39 raft.go:594: INFO: 1 became pre-candidate at term 1
cluster0-node0_1  | 2018/09/13 17:42:39 raft.go:664: INFO: 1 received MsgPreVoteResp from 1 at term 1
cluster0-node0_1  | 2018/09/13 17:42:39 raft.go:580: INFO: 1 became candidate at term 2
cluster0-node0_1  | 2018/09/13 17:42:39 raft.go:664: INFO: 1 received MsgVoteResp from 1 at term 2
cluster0-node0_1  | 2018/09/13 17:42:39 raft.go:621: INFO: 1 became leader at term 2
cluster0-node0_1  | 2018/09/13 17:42:39 node.go:301: INFO: raft.node: 1 elected leader 1 at term 2
cluster0-node0_1  | 2018/09/13 17:42:39 groups.go:361: Serving tablet for: id
cluster0-node0_1  | 2018/09/13 17:42:39 groups.go:361: Serving tablet for: _predicate_
cluster0-node0_1  | 2018/09/13 17:42:39 mutation.go:174: Done schema update predicate:"_predicate_" value_type:STRING list:true
cluster0-node0_1  | 2018/09/13 17:42:42 groups.go:361: Serving tablet for: _job
cluster0-node0_1  | 2018/09/13 17:42:42 groups.go:361: Serving tablet for: name
cluster0-node0_1  | 2018/09/13 17:42:43 groups.go:361: Serving tablet for: when
cluster0-node0_1  | 2018/09/13 17:42:43 groups.go:361: Serving tablet for: _when
cluster0-node0_1  | 2018/09/13 17:42:43 groups.go:361: Serving tablet for: email
cluster0-node0_1  | 2018/09/13 17:42:44 groups.go:361: Serving tablet for: group
cluster0-node0_1  | 2018/09/13 17:42:45 groups.go:361: Serving tablet for: hired
cluster0-node0_1  | 2018/09/13 17:42:45 groups.go:361: Serving tablet for: knows
cluster0-node0_1  | 2018/09/13 17:42:46 groups.go:361: Serving tablet for: match
cluster0-node0_1  | 2018/09/13 17:42:47 groups.go:361: Serving tablet for: owner
cluster0-node0_1  | 2018/09/13 17:42:47 groups.go:361: Serving tablet for: title
cluster0-node0_1  | 2018/09/13 17:42:48 groups.go:361: Serving tablet for: _group
cluster0-node0_1  | 2018/09/13 17:42:49 groups.go:361: Serving tablet for: applied
cluster0-node0_1  | 2018/09/13 17:42:49 groups.go:361: Serving tablet for: name@en
cluster0-node0_1  | 2018/09/13 17:42:49 groups.go:361: Serving tablet for: skipped
cluster0-node0_1  | 2018/09/13 17:42:49 groups.go:361: Serving tablet for: _profile
cluster0-node0_1  | 2018/09/13 17:42:50 groups.go:361: Serving tablet for: attached
cluster0-node0_1  | 2018/09/13 17:42:50 groups.go:361: Serving tablet for: location
cluster0-node0_1  | 2018/09/13 17:42:51 groups.go:361: Serving tablet for: prospect
cluster0-node0_1  | 2018/09/13 17:42:51 groups.go:361: Serving tablet for: title@en
cluster0-node0_1  | 2018/09/13 17:42:51 groups.go:361: Serving tablet for: contacted
cluster0-node0_1  | 2018/09/13 17:42:51 groups.go:361: Serving tablet for: favorited
cluster0-node0_1  | 2018/09/13 17:42:51 groups.go:361: Serving tablet for: seniority
cluster0-node0_1  | 2018/09/13 17:42:52 groups.go:361: Serving tablet for: matchOrder
cluster0-node0_1  | 2018/09/13 17:42:55 groups.go:361: Serving tablet for: companyName
cluster0-node0_1  | 2018/09/13 17:42:56 groups.go:361: Serving tablet for: rawLocation
cluster0-node0_1  | 2018/09/13 17:42:56 groups.go:361: Serving tablet for: recommended
cluster0-node0_1  | 2018/09/13 17:42:56 groups.go:361: Serving tablet for: potentialEmployers
cluster0-node0_1  | 2018/09/13 17:42:57 groups.go:507: Got address of a Zero server: zero:5080

Logs of my 2nd and 3rd nodes:

cluster0-node1_1  | Sync has finished, starting server with 4096mb
cluster0-node1_1  |
cluster0-node1_1  | Dgraph version   : v1.0.8
cluster0-node1_1  | Commit SHA-1     : 1dd8376f
cluster0-node1_1  | Commit timestamp : 2018-08-31 10:47:07 -0700
cluster0-node1_1  | Branch           : HEAD
cluster0-node1_1  |
cluster0-node1_1  | For Dgraph official documentation, visit https://docs.dgraph.io.
cluster0-node1_1  | For discussions about Dgraph     , visit http://discuss.dgraph.io.
cluster0-node1_1  | To say hi to the community       , visit https://dgraph.slack.com.
cluster0-node1_1  |
cluster0-node1_1  | Licensed under Apache 2.0 + Commons Clause. Copyright 2015-2018 Dgraph Labs, Inc.
cluster0-node1_1  |
cluster0-node1_1  |
cluster0-node2_1  |
cluster0-node2_1  | Dgraph version   : v1.0.8
cluster0-node2_1  | Commit SHA-1     : 1dd8376f
cluster0-node2_1  | Commit timestamp : 2018-08-31 10:47:07 -0700
cluster0-node2_1  | Branch           : HEAD
cluster0-node2_1  |
cluster0-node2_1  | For Dgraph official documentation, visit https://docs.dgraph.io.
cluster0-node2_1  | For discussions about Dgraph     , visit http://discuss.dgraph.io.
cluster0-node2_1  | To say hi to the community       , visit https://dgraph.slack.com.
cluster0-node2_1  |
cluster0-node2_1  | Licensed under Apache 2.0 + Commons Clause. Copyright 2015-2018 Dgraph Labs, Inc.
cluster0-node2_1  |
cluster0-node2_1  |
cluster0-node1_1  | 2018/09/13 17:43:43 server.go:118: Setting Badger option: ssd
cluster0-node1_1  | 2018/09/13 17:43:43 server.go:134: Setting Badger table load option: mmap
cluster0-node1_1  | 2018/09/13 17:43:43 server.go:147: Setting Badger value log load option: none
cluster0-node1_1  | 2018/09/13 17:43:43 server.go:158: Opening postings Badger DB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:32 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741824 ValueLogMaxEntries:1000000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
cluster0-node2_1  | 2018/09/13 17:43:43 server.go:118: Setting Badger option: ssd
cluster0-node2_1  | 2018/09/13 17:43:43 server.go:134: Setting Badger table load option: mmap
cluster0-node2_1  | 2018/09/13 17:43:43 server.go:147: Setting Badger value log load option: none
cluster0-node2_1  | 2018/09/13 17:43:43 server.go:158: Opening postings Badger DB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:32 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741824 ValueLogMaxEntries:1000000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
cluster0-node1_1  | 2018/09/13 17:43:43 gRPC server started.  Listening on port 9081
cluster0-node1_1  | 2018/09/13 17:43:43 HTTP server started.  Listening on port 8081
cluster0-node1_1  | 2018/09/13 17:43:43 groups.go:80: Current Raft Id: 0
cluster0-node1_1  | 2018/09/13 17:43:43 worker.go:86: Worker listening at address: [::]:7081
zero_1            | 2018/09/13 17:43:43 zero.go:365: Got connection request: addr:"cluster0-node1:7081"
cluster0-node1_1  | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting zero:5080
zero_1            | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node1:7081
zero_1            | 2018/09/13 17:43:43 zero.go:474: Connected: id:2 group_id:1 addr:"cluster0-node1:7081"
cluster0-node1_1  | 2018/09/13 17:43:43 groups.go:107: Connected to group zero. Assigned group: 1
cluster0-node2_1  | 2018/09/13 17:43:43 groups.go:80: Current Raft Id: 0
cluster0-node2_1  | 2018/09/13 17:43:43 gRPC server started.  Listening on port 9082
cluster0-node2_1  | 2018/09/13 17:43:43 HTTP server started.  Listening on port 8082
cluster0-node2_1  | 2018/09/13 17:43:43 worker.go:86: Worker listening at address: [::]:7082
cluster0-node0_1  | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node1:7081
cluster0-node1_1  | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node0:7080
zero_1            | 2018/09/13 17:43:43 zero.go:365: Got connection request: addr:"cluster0-node2:7082"
zero_1            | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node2:7082
cluster0-node2_1  | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting zero:5080
cluster0-node2_1  | 2018/09/13 17:43:43 groups.go:107: Connected to group zero. Assigned group: 1
cluster0-node1_1  | E0913 17:43:43.638362     646 storage.go:266] While seekEntry. Error: Unable to find raft entry
cluster0-node1_1  | 2018/09/13 17:43:43 draft.go:76: Node ID: 2 with GroupID: 1
cluster0-node1_1  | 2018/09/13 17:43:43 draft.go:923: Calling IsPeer
cluster0-node0_1  | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node2:7082
cluster0-node1_1  | 2018/09/13 17:43:43 draft.go:928: Done with IsPeer call
cluster0-node1_1  | 2018/09/13 17:43:43 draft.go:997: Trying to join peers.
cluster0-node1_1  | 2018/09/13 17:43:43 draft.go:906: Calling JoinCluster via leader: cluster0-node0:7080
cluster0-node1_1  | 2018/09/13 17:43:43 draft.go:910: Done with JoinCluster call
cluster0-node1_1  | 2018/09/13 17:43:43 raft.go:567: INFO: 2 became follower at term 0
zero_1            | 2018/09/13 17:43:43 zero.go:474: Connected: id:3 group_id:1 addr:"cluster0-node2:7082"
cluster0-node2_1  | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node1:7081
cluster0-node2_1  | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node0:7080
cluster0-node1_1  | 2018/09/13 17:43:43 raft.go:315: INFO: newRaft 2 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
cluster0-node1_1  | 2018/09/13 17:43:43 raft.go:567: INFO: 2 became follower at term 1
cluster0-node2_1  | E0913 17:43:43.642357     648 storage.go:266] While seekEntry. Error: Unable to find raft entry
cluster0-node1_1  | 2018/09/13 17:43:43 groups.go:507: Got address of a Zero server: zero:5080
cluster0-node0_1  | 2018/09/13 17:43:43 raft_server.go:166: [2] Done joining cluster with err: <nil>
cluster0-node1_1  | 2018/09/13 17:43:43 pool.go:108: == CONNECTED ==> Setting cluster0-node2:7082
cluster0-node1_1  | 2018/09/13 17:43:43 raft.go:708: INFO: 2 [term: 1] received a MsgApp message with higher term from 1 [term: 2]
cluster0-node1_1  | 2018/09/13 17:43:43 raft.go:567: INFO: 2 became follower at term 2
cluster0-node1_1  | 2018/09/13 17:43:43 node.go:301: INFO: raft.node: 2 elected leader 1 at term 2
cluster0-node1_1  | 2018/09/13 17:43:43 mutation.go:141: Done schema update predicate:"_predicate_" value_type:STRING list:true
cluster0-node2_1  | 2018/09/13 17:43:43 draft.go:76: Node ID: 3 with GroupID: 1
cluster0-node2_1  | 2018/09/13 17:43:43 draft.go:923: Calling IsPeer
cluster0-node2_1  | 2018/09/13 17:43:43 draft.go:928: Done with IsPeer call
cluster0-node2_1  | 2018/09/13 17:43:43 draft.go:997: Trying to join peers.
cluster0-node2_1  | 2018/09/13 17:43:43 draft.go:906: Calling JoinCluster via leader: cluster0-node0:7080
cluster0-node2_1  | 2018/09/13 17:43:43 draft.go:910: Done with JoinCluster call
cluster0-node2_1  | 2018/09/13 17:43:43 raft.go:567: INFO: 3 became follower at term 0
cluster0-node2_1  | 2018/09/13 17:43:43 raft.go:315: INFO: newRaft 3 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
cluster0-node2_1  | 2018/09/13 17:43:43 raft.go:567: INFO: 3 became follower at term 1
cluster0-node2_1  | 2018/09/13 17:43:43 groups.go:507: Got address of a Zero server: zero:5080
cluster0-node0_1  | 2018/09/13 17:43:43 raft_server.go:166: [3] Done joining cluster with err: <nil>
cluster0-node2_1  | 2018/09/13 17:43:43 raft.go:708: INFO: 3 [term: 1] received a MsgHeartbeat message with higher term from 1 [term: 2]
cluster0-node2_1  | 2018/09/13 17:43:43 raft.go:567: INFO: 3 became follower at term 2
cluster0-node2_1  | 2018/09/13 17:43:43 node.go:301: INFO: raft.node: 3 elected leader 1 at term 2
cluster0-node2_1  | 2018/09/13 17:43:43 mutation.go:141: Done schema update predicate:"_predicate_" value_type:STRING list:true

The logs for the 2nd and 3rd nodes have not changed; that is their entire output.

Maybe I have been setting this up wrong the whole time? I am not sure, but I believe I have been following the docs, and this used to work (sort of) on previous versions. I’ve been using a single-node server for a while because of these bugs, which defeats a huge part of why I want to use Dgraph.

I’ve asked this before: I used to copy the p directory over to all 3 nodes so that they started at parity, but I was told that I did not need to do that. Is that now required again, with only new writes being synced?

It appears that copying the p directory to all replica servers works, and the data APPEARS to be replicated to the other nodes on subsequent writes, although the logs no longer show the copy (I believe they used to). I want to confirm this is the correct way to run this setup.

I was under the impression that adding more replica servers would automatically cause them to sync with the other nodes.

What is the issue exactly? In your report and logs there is nothing abnormal. That said, Dgraph Zero does the sync work gradually; you just have to wait. How are you evaluating the differences?

How long am I supposed to wait? There used to be logs showing the sync, and now it appears there are no such logs at all. Were they removed or moved to a debug level?

The issue is that the replica nodes have no data in them, even after waiting for an extended period of time. And even if they were receiving data, it appears the logs no longer print that they are receiving updates.

Lastly, the “lastUpdate” value never seems to change on the /state route.

I guess my main complaint is that nowhere in the docs (which I’ve read in their entirety multiple times) is it explained how the replication process works or how long it takes, and the one route I can use to check on progress, /state, does not appear to update the timestamp for the non-leader nodes (in the current v1.0.8).

The differences are easy to evaluate because making the same query on the 3 nodes returns empty results for the two non-leader nodes.
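Concretely, this is the kind of comparison I am doing (a rough sketch: it assumes I temporarily publish the servers’ HTTP ports 8080-8082, and the name predicate is just one from my schema):

# Run the same query against each node's HTTP endpoint and compare.
for port in 8080 8081 8082; do
  echo "== node on port $port =="
  curl -s "localhost:$port/query" -XPOST -d '{
    q(func: has(name), first: 5) {
      uid
      name
    }
  }'
  echo
done

The node I loaded the data into returns results; the two followers return nothing.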

I was previously told by someone at Dgraph that I did not have to copy the p directory to a replica node and that the data should copy over on its own. I guess I am just frustrated that I don’t see any sign of that progress in the logs or any update in the /state call, nor do I see anywhere in the docs an estimate of how long that process takes or anything that sets expectations.

Replication shouldn’t take long, and you’d see progress in the logs when syncs happen. I noticed your Docker Compose file also has nginx and drafted/sql-to-graph containers. Are you seeing the same replication issue with just the Zero and Server containers?

Those are irrelevant; they are only how I interact with the cluster from the outside world, not internally. I see the same behavior without them. Regardless of the nginx setup, I should still at least see the initial sync, which I am not.

I agree the syncs should show up in the logs, which is why I am concerned.

EDIT: sql-to-graph runs several SQL queries and merges the results into one large RDF file, which is used in the dgraph bulk call to create my initial p directory for the leader/zero. I can paste the output of the bulk load if you’d like, but it is also very boring in that everything works. Again, these issues only appear when trying to use replication with --replicas 3 on the Zero. A single server has always worked for me.

As a side note, the reason I am using nginx is that setting up TLS with Node.js has been a known problem with dgraph-nodejs, so I am doing TLS client auth through nginx to work around that issue.

I really like Dgraph and have used workarounds for many issues I have found. Unfortunately, replication is not one of the things I can seem to work around, so I would be very appreciative if we could figure this out :slight_smile:


“lastUpdate” is an entry in /state that you can typically ignore. It’s only populated if the instance was ever a leader.

That’s correct; the p directory does not need to be copied over. Replication will happen for new Server instances when they join an existing group (including a fresh cluster).

You could copy over the p directories initially for something like the bulk loader, where you could potentially be loading hundreds of gigabytes of data into a cluster. You’d copy the bulk-loaded p directory over to the replicas of the group so you do not need to wait for all the data to be replicated by Dgraph at startup time.
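As a rough sketch of that flow (file names and paths are placeholders, and the flag spellings are as I recall them for the 1.0.x bulk loader):

# Run the bulk loader against a running Zero. One reduce shard matches
# a single-group, 3-replica setup; output goes to ./out by default.
dgraph bulk -r data.rdf.gz -s data.schema --zero localhost:5080 --reduce_shards 1

# Copy the generated posting directory to every replica of the group
# BEFORE starting the dgraph server processes.
cp -r out/0/p /data/node0/p
cp -r out/0/p /data/node1/p
cp -r out/0/p /data/node2/p

# Then start the Servers. They join the same group with identical p
# directories, and new writes go through Raft as usual.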

Replication time depends on the data size and the network speed between the servers, so it varies with the setup.

The replication process could be clearer in the docs :+1: that’s good feedback.

Agreed. Let’s figure it out.

Is there a way I can reproduce your setup locally on my side? I can’t run your Docker Compose file as-is.

I have copied the p directory over to all 3 nodes before startup, and it appears that writes are being replicated, although I still do not see anything in the logs that says they are being synced.

I would still expect to see logs for small writes showing the nodes updating each other, although maybe that is not the case? The data is on all 3 servers, so that is promising.

Those are private Docker images. Essentially: load some very large RDF into a 3-replica cluster. I can try to reproduce this with the example movie dataset you provide.

Just a heads up: our RDF file is over 2 GB, while the movie set is only 66 MB uncompressed. I’d imagine that might have something to do with my results.

I guess what would be really useful to me is to know whether I should always see something in the logs about the nodes syncing, or whether that only occurs in certain cases. Do I need to set my logs to verbose?

EDIT: Upon further investigation, I can confirm that subsequent writes are being distributed across the nodes, but I do not see any logs suggesting this is happening. This is only for the case where I copy the p directory to all nodes before starting.

The standard log level is enough.

Here’s an example of what you should see:

On leader:

2018/09/13 14:24:16 pool.go:108: == CONNECTED ==> Setting localhost:7081
2018/09/13 14:24:16 raft_server.go:166: [2] Done joining cluster with err: <nil>
2018/09/13 14:24:16 predicate.go:184: Got StreamSnapshot request. Mine: context:<id:1 group:1 addr:"localhost:7080" > index:1097 read_ts:1711 . Requested: context:<id:1 group:1 addr:"localhost:7080" > index:1097 read_ts:1711
2018/09/13 14:24:17 stream_lists.go:213: Sent batch of size: 4.0 MB in 15.765061ms.
2018/09/13 14:24:17 stream_lists.go:213: Sent batch of size: 4.0 MB in 78.842221ms.
2018/09/13 14:24:17 stream_lists.go:231: Sending SNAPSHOT Time elapsed: 01s, bytes sent: 8.0 MB, speed: 8.0 MB/sec
2018/09/13 14:24:17 stream_lists.go:213: Sent batch of size: 4.0 MB in 7.766569ms.
2018/09/13 14:24:17 stream_lists.go:213: Sent batch of size: 4.0 MB in 48.298735ms.
2018/09/13 14:24:17 stream_lists.go:213: Sent batch of size: 4.0 MB in 13.763863ms.
2018/09/13 14:24:18 stream_lists.go:213: Sent batch of size: 4.6 MB in 22.127659ms.
2018/09/13 14:24:18 stream_lists.go:213: Sent batch of size: 4.1 MB in 86.894781ms.
2018/09/13 14:24:18 stream_lists.go:231: Sending SNAPSHOT Time elapsed: 02s, bytes sent: 29 MB, speed: 14 MB/sec
2018/09/13 14:24:18 stream_lists.go:213: Sent batch of size: 4.7 MB in 35.32221ms.
2018/09/13 14:24:18 stream_lists.go:213: Sent batch of size: 4.7 MB in 12.38566ms.
2018/09/13 14:24:18 stream_lists.go:213: Sent batch of size: 9.3 MB in 89.945949ms.
2018/09/13 14:24:18 stream_lists.go:213: Sent batch of size: 5.2 MB in 69.260057ms.
2018/09/13 14:24:18 stream_lists.go:213: Sent batch of size: 4.7 MB in 33.206188ms.
2018/09/13 14:24:18 stream_lists.go:213: Sent batch of size: 4.6 MB in 12.701539ms.
2018/09/13 14:24:19 stream_lists.go:213: Sent batch of size: 4.7 MB in 200.427472ms.
2018/09/13 14:24:19 stream_lists.go:213: Sent batch of size: 4.6 MB in 17.057206ms.
2018/09/13 14:24:19 stream_lists.go:213: Sent batch of size: 4.7 MB in 20.819083ms.
2018/09/13 14:24:19 stream_lists.go:231: Sending SNAPSHOT Time elapsed: 03s, bytes sent: 76 MB, speed: 25 MB/sec
2018/09/13 14:24:19 stream_lists.go:213: Sent batch of size: 5.5 MB in 255.53445ms.
2018/09/13 14:24:20 stream_lists.go:213: Sent batch of size: 5.6 MB in 15.445729ms.
2018/09/13 14:24:20 stream_lists.go:213: Sent batch of size: 5.3 MB in 17.072558ms.
2018/09/13 14:24:20 stream_lists.go:213: Sent batch of size: 5.6 MB in 18.901842ms.
2018/09/13 14:24:20 stream_lists.go:213: Sent batch of size: 4.7 MB in 15.036282ms.
2018/09/13 14:24:20 stream_lists.go:246: Sending SNAPSHOT Sent 963908 (+1 maybe for schema) keys

On new server instance that joined the group:

2018/09/13 14:24:16 draft.go:61: Node ID: 2 with GroupID: 1
2018/09/13 14:24:16 draft.go:908: Calling IsPeer
2018/09/13 14:24:16 draft.go:913: Done with IsPeer call
2018/09/13 14:24:16 draft.go:982: Trying to join peers.
2018/09/13 14:24:16 draft.go:891: Calling JoinCluster via leader: localhost:7080
2018/09/13 14:24:16 draft.go:895: Done with JoinCluster call
2018/09/13 14:24:16 raft.go:567: INFO: 2 became follower at term 0
2018/09/13 14:24:16 raft.go:315: INFO: newRaft 2 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2018/09/13 14:24:16 raft.go:567: INFO: 2 became follower at term 1
2018/09/13 14:24:16 groups.go:507: Got address of a Zero server: localhost:5080
2018/09/13 14:24:16 raft.go:708: INFO: 2 [term: 1] received a MsgHeartbeat message with higher term from 1 [term: 2]
2018/09/13 14:24:16 raft.go:567: INFO: 2 became follower at term 2
2018/09/13 14:24:16 node.go:301: INFO: raft.node: 2 elected leader 1 at term 2
2018/09/13 14:24:16 raft.go:1130: INFO: 2 [commit: 0, lastindex: 0, lastterm: 0] starts to restore snapshot [index: 1097, term: 2]
2018/09/13 14:24:16 log.go:288: INFO: log [committed=0, applied=0, unstable.offset=1, len(unstable.Entries)=0] starts to restore snapshot [index: 1097, term: 2]
2018/09/13 14:24:16 raft.go:1141: INFO: 2 restored progress of 1 [next = 1098, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
2018/09/13 14:24:16 raft.go:1107: INFO: 2 [commit: 1097] restored snapshot [index: 1097, term: 2]
2018/09/13 14:24:16 draft.go:645: -------> SNAPSHOT [1] from 1
2018/09/13 14:24:18 predicate.go:49: Getting SNAPSHOT: Time elapsed: 01s, bytes written: 7.5 MB, 7.5 MB/s
2018/09/13 14:24:19 predicate.go:49: Getting SNAPSHOT: Time elapsed: 02s, bytes written: 19 MB, 9.5 MB/s
2018/09/13 14:24:20 predicate.go:49: Getting SNAPSHOT: Time elapsed: 03s, bytes written: 34 MB, 11 MB/s
2018/09/13 14:24:20 predicate.go:128: EOF has been reached
2018/09/13 14:24:21 predicate.go:49: Getting SNAPSHOT: Time elapsed: 04s, bytes written: 65 MB, 16 MB/s
2018/09/13 14:24:22 predicate.go:49: Getting SNAPSHOT: Time elapsed: 05s, bytes written: 88 MB, 18 MB/s
2018/09/13 14:24:23 predicate.go:49: Getting SNAPSHOT: Time elapsed: 06s, bytes written: 98 MB, 16 MB/s
2018/09/13 14:24:23 predicate.go:157: Got 963908 keys. DONE.
2018/09/13 14:24:23 draft.go:648: -------> SNAPSHOT [1]. DONE.

There are log lines that say “Sending SNAPSHOT” and “Getting SNAPSHOT”.

Note that here’s what I did to get the above output:

  1. Start 1 Zero, 1 Server, replication setting of 3.
  2. Load data into the cluster.
  3. Start a second Server.

I did not copy any p directories.
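In command form, that sequence looks roughly like this (ports, paths, and the data file are placeholders; flag spellings are as I recall them for v1.0.x, with each long-running process in its own terminal):

# 1. Zero with a replication factor of 3.
dgraph zero --my=localhost:5080 --replicas 3

# 2. First Server, then load data into the cluster through it.
dgraph server --my=localhost:7080 --lru_mb=2048 --zero=localhost:5080
dgraph live -r data.rdf.gz -d localhost:9080 --zero localhost:5080

# 3. Second Server joins the same group; the leader should then log
#    "Sending SNAPSHOT" and the new Server "Getting SNAPSHOT".
dgraph server --my=localhost:7081 --port_offset=1 --lru_mb=2048 --zero=localhost:5080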

I still feel this may be a logging issue. I do not see any of that, but I can confirm the data is being copied over with new writes. My logs remain:

cluster0-node1_1  | 2018/09/13 21:30:21 server.go:118: Setting Badger option: ssd
cluster0-node1_1  | 2018/09/13 21:30:21 server.go:134: Setting Badger table load option: mmap
cluster0-node1_1  | 2018/09/13 21:30:21 server.go:147: Setting Badger value log load option: none
cluster0-node1_1  | 2018/09/13 21:30:21 server.go:158: Opening postings Badger DB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:32 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741824 ValueLogMaxEntries:1000000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
cluster0-node1_1  | 2018/09/13 21:30:22 groups.go:80: Current Raft Id: 2
cluster0-node1_1  | 2018/09/13 21:30:22 gRPC server started.  Listening on port 9081
cluster0-node1_1  | 2018/09/13 21:30:22 HTTP server started.  Listening on port 8081
cluster0-node1_1  | 2018/09/13 21:30:22 worker.go:86: Worker listening at address: [::]:7081
cluster0-node1_1  | 2018/09/13 21:30:22 pool.go:108: == CONNECTED ==> Setting zero:5080
cluster0-node1_1  | 2018/09/13 21:30:22 groups.go:107: Connected to group zero. Assigned group: 0
cluster0-node1_1  | 2018/09/13 21:30:22 pool.go:108: == CONNECTED ==> Setting cluster0-node2:7082
cluster0-node1_1  | 2018/09/13 21:30:22 pool.go:108: == CONNECTED ==> Setting cluster0-node0:7080
cluster0-node1_1  | 2018/09/13 21:30:22 draft.go:76: Node ID: 2 with GroupID: 1
cluster0-node1_1  | 2018/09/13 21:30:22 draft.go:963: Restarting node for group: 1
cluster0-node1_1  | 2018/09/13 21:30:22 raft.go:567: INFO: 2 became follower at term 2
cluster0-node1_1  | 2018/09/13 21:30:22 raft.go:315: INFO: newRaft 2 [peers: [1,2,3], term: 2, commit: 1025, applied: 1024, lastindex: 1025, lastterm: 2]
cluster0-node1_1  | 2018/09/13 21:30:23 node.go:301: INFO: raft.node: 2 elected leader 1 at term 2
cluster0-node1_1  | 2018/09/13 21:30:42 groups.go:507: Got address of a Zero server: zero:5080

In my case, if the data is quite large, syncing takes a while, and no sync logs are printed, that would explain why I was running into this issue when I was not copying the p directory to each node.

EDIT: I tried playing with --logtostderr and --stderrthreshold and saw at least one line about a SNAPSHOT, but still nothing like your logs.

Do you have example logs of what I should see when a node receives a write and must distribute it across the other nodes?

A single mutation? You can see what Dgraph is doing on each Server’s /debug/events page, under the Dgraph family’s ApplyCh events. It looks something like this:

|15:14:17.532218| .     5|... Applied proposal with key: 01-9184496022090891039, index: 50619. Err: <nil>|
|15:14:17.536679| .  4460|... Applying mutations for key: 01-5790244780102191173|
|15:14:17.536697| .    18|... Applied proposal with key: 01-5790244780102191173, index: 50620. Err: <nil>|
|15:14:17.537763| .  1066|... Applying Oracle Delta for key: 01-7841892354464413890|
|15:14:17.538055| .   291|... Applied proposal with key: 01-7841892354464413890, index: 50621. Err: <nil>|
|15:14:17.538291| .   236|... Applying Oracle Delta for key: 01-492042912766638662|

The monotonically increasing index value (50619, 50620, etc.) will tell you how caught up a follower is to a leader when comparing the index values across the Server instances.
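If you want to script that comparison rather than clicking through the pages, something like this would do (a sketch only: it assumes the Servers’ HTTP ports are reachable from wherever you run it, and the fam/b query parameters are a guess at the net/trace URL format rather than a documented API):

# Pull the Dgraph event family from each Server and keep the last few
# applied index values for comparison.
for port in 8080 8081 8082; do
  echo "== server on port $port =="
  curl -s "localhost:$port/debug/events?fam=Dgraph&b=0" | grep -o 'index: [0-9]*' | tail -n 3
done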

Also, unless you’re starting from bulk loader output, you should never copy over p directories across different Server instances. It doesn’t do anything or make data transfers any more efficient. All the changes need to go through Raft. Followers aren’t aware of any existing contents of the p directory besides what the leader proposes and sends.

I believe you’re seeing a log entry like this:

I0913 15:13:55.270577   23103 draft.go:409] Creating snapshot at index: 40721. ReadTs: 27143.

That’s a log entry for the Raft snapshots that Servers take periodically, once the Raft log is large enough to be worth truncating. These Raft snapshots are also proposed to the followers.

I am only talking about copying the directory with regard to bulk loading. I was just making sure that I could indeed do that for an entirely new Dgraph cluster setup. We can rebuild our RDF file from data in our psql database, so when we upgrade versions I do a clean setup instead of exporting from Dgraph and importing back in.

We’ve run into issues before with corrupt indices or servers in certain Dgraph builds, so I always want a way to recreate our RDF and start fresh.

Thanks for the /debug/events tip, must’ve missed that one. I’ll check it out!

If copying the p directory to all nodes for a new cluster fixes my issues (which it appears it does), I’m happy for now! As far as I can tell, single mutations are being replicated; it is just the initial sync that never succeeds.
