Report a Dgraph Bug
What version of Dgraph are you using?
$ dgraph version
1.0.17
Have you tried reproducing the issue with the latest release?
No
What is the hardware spec (RAM, OS)?
16 core Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz running Ubuntu 18.04.5 LTS with 33GB ram
Steps to reproduce the issue (command/config used to run Dgraph).
dgraph alpha --my=alpha_3:7082 --lru_mb=12048 --zero=zero_1:5080 -o 2
Expected behaviour and actual result.
Alpha should be able to execute queries and mutations
What you wanted to do
We had the same setup for a very long time, and all of a sudden our servers stopped executing queries and mutations.
We have no clue what the root cause for this problem is just yet…
Our Grafana Dashboard:
**Cluster Setup **
3 servers running a pair of alpha and zero on swarm
Alpha 1 logs
For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph , visit http://discuss.dgraph.io.
To say hi to the community , visit https://dgraph.slack.com.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.
I0120 15:26:36.934976 1 run.go:461] x.Config: {DebugMode:false PortOffset:0 QueryEdgeLimit:1000000 NormalizeNodeLimit:10000}
I0120 15:26:36.934996 1 run.go:462] worker.Config: {ExportPath:export NumPendingProposals:256 Tracing:1 MyAddr:alpha_1:7080 ZeroAddr:zero_1:5080 RaftId:0 ExpandEdge:true WhiteListedIPRanges:[] MaxRetries:-1 StrictMutations:false SnapshotAfter:10000 AbortOlderThan:5m0s}
I0120 15:26:36.935014 1 run.go:463] edgraph.Config: {PostingDir:p BadgerTables:mmap BadgerVlog:mmap WALDir:w MutationsMode:0 AuthToken: AllottedMemory:12048}
I0120 15:26:36.935042 1 server.go:115] Setting Badger table load option: mmap
I0120 15:26:36.935048 1 server.go:127] Setting Badger value log load option: mmap
I0120 15:26:36.935052 1 server.go:155] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:false TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 ReadOnly:false Truncate:true Logger:0x1f8b490 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:65500 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 managedTxns:false maxBatchCount:0 maxBatchSize:0}
I0120 15:26:37.619444 1 node.go:88] All 8 tables opened in 677ms
I0120 15:26:37.620099 1 node.go:88] Replaying file id: 4884 at offset: 749917
I0120 15:26:37.620157 1 node.go:88] Replay took: 48.588µs
I0120 15:26:37.620189 1 node.go:88] Replaying file id: 4885 at offset: 0
I0120 15:26:37.633398 1 node.go:88] Replay took: 13.202292ms
I0120 15:26:37.633426 1 node.go:88] Replaying file id: 4886 at offset: 0
I0120 15:26:37.634874 1 node.go:88] Replay took: 1.442207ms
I0120 15:26:37.635026 1 server.go:115] Setting Badger table load option: mmap
I0120 15:26:37.635045 1 server.go:127] Setting Badger value log load option: mmap
I0120 15:26:37.635051 1 server.go:169] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 ReadOnly:false Truncate:true Logger:0x1f8b490 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 managedTxns:false maxBatchCount:0 maxBatchSize:0}
I0120 15:26:38.783748 1 node.go:88] All 12 tables opened in 1.146s
I0120 15:26:38.792483 1 node.go:88] Replaying file id: 54 at offset: 413356010
I0120 15:26:39.759737 1 node.go:88] Replay took: 967.229684ms
I0120 15:26:39.759818 1 node.go:88] Replaying file id: 55 at offset: 0
I0120 15:26:40.028641 1 node.go:88] Replay took: 268.805722ms
I0120 15:26:40.028951 1 groups.go:96] Current Raft Id: 0x1
I0120 15:26:40.029057 1 worker.go:81] Worker listening at address: [::]:7080
I0120 15:26:40.029088 1 run.go:392] gRPC server started. Listening on port 9080
I0120 15:26:40.029103 1 run.go:393] HTTP server started. Listening on port 8080
2021/01/20 15:26:40 Endpoint has been deprecated. Please use CollectorEndpoint instead.
I0120 15:26:40.129163 1 pool.go:138] CONNECTED to zero_1:5080
W0120 15:26:40.191245 1 pool.go:226] Connection lost with zero_1:5080. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup zero_1 on 127.0.0.11:53: no such host"
I0120 15:27:04.921881 1 pool.go:138] CONNECTED to zero_2:5081
I0120 15:27:05.415040 1 groups.go:119] Connected to group zero. Assigned group: 0
I0120 15:27:05.415071 1 groups.go:121] Raft Id after connection to Zero: 0x1
I0120 15:27:05.415099 1 pool.go:138] CONNECTED to alpha_2:7081
I0120 15:27:05.415110 1 pool.go:138] CONNECTED to alpha_3:7082
I0120 15:27:05.431551 1 pool.go:138] CONNECTED to zero_3:5082
I0120 15:27:05.431824 1 draft.go:74] Node ID: 0x1 with GroupID: 1
I0120 15:27:05.431949 1 node.go:157] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc019823320 Applied:10862439 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1f8b490 DisableProposalForwarding:false}
I0120 15:27:05.979889 1 node.go:306] Found Snapshot.Metadata: {ConfState:{Nodes:[1 2 3] Learners:[] XXX_unrecognized:[]} Index:10862439 Term:71 XXX_unrecognized:[]}
I0120 15:27:05.979937 1 node.go:317] Found hardstate: {Term:80 Vote:0 Commit:10870127 XXX_unrecognized:[]}
I0120 15:27:09.303274 1 node.go:326] Group 1 found 7689 entries
I0120 15:27:09.303307 1 draft.go:1321] Restarting node for group: 1
I0120 15:27:09.303359 1 node.go:193] Setting conf state to nodes:1 nodes:2 nodes:3
I0120 15:27:12.524082 1 node.go:88] 1 became follower at term 80
I0120 15:27:12.524460 1 node.go:88] newRaft 1 [peers: [1,2,3], term: 80, commit: 10870127, applied: 10862439, lastindex: 10870127, lastterm: 80]
I0120 15:27:12.524500 1 groups.go:139] Server is ready
I0120 15:27:12.524571 1 draft.go:720] Found Raft progress: 10870110
I0120 15:27:12.524579 1 groups.go:650] Got address of a Zero leader: zero_2:5081
I0120 15:27:12.524808 1 groups.go:663] Starting a new membership stream receive from zero_2:5081.
I0120 15:27:12.573386 1 index.go:71] Long term for exact index on predicate: [03ee20c4-d814-4c4f-a678-9475e3a21405]. Consider switching to hash for better performance.
I0120 15:27:12.590515 1 index.go:71] Long term for exact index on predicate: [03ee20c4-d814-4c4f-a678-9475e3a21405]. Consider switching to hash for better performance.
I0120 15:27:12.710234 1 groups.go:680] Received first state update from Zero: counter:1397183 groups:<key:1 value:<members:<key:1 value:<id:1 group_id:1 addr:"alpha_1:7080" last_update:1611155812 > > members:<key:2 value:<id:2 group_id:1 addr:"alpha_2:7081" last_update:1611156430 > > members:<key:3 value:<id:3 group_id:1 addr:"alpha_3:7082" last_update:1611156069 > > tablets:<key:"00022c08-0b6e-43b1-a55a-bf7811ba043f" value:<group_id:1 predicate:"00022c08-0b6e-43b1-a55a-bf7811ba043f" space:511 > > tablets:<key:"0003055d-1a3c-4aef-a1d9-442430da42f3" value:<group_id:1 predicate:"0003055d-1a3c-4aef-a1d9-442430da42f3" space:83 > >
bunch of schema
predicate:"wspcPersonDefaultUserGroup" space:21887 > > tablets:<key:"~10aa1897-822b-491c-bb71-8415f8d6f851" value:<group_id:1 predicate:"~10aa1897-822b-491c-bb71-8415f8d6f851" > > tablets:<key:"~6ebb1f51-6806-4605-8d1f-97d25dc2ba11" value:<group_id:1 predicate:"~6ebb1f51-6806-4605-8d1f-97d25dc2ba11" > > tablets:<key:"~74e72b2b-9df5-4af5-a295-8d7a3e1e8e87" value:<group_id:1 predicate:"~74e72b2b-9df5-4af5-a295-8d7a3e1e8e87" > > tablets:<key:"~944dac0d-3197-4e7e-9d49-42bfce283043" value:<group_id:1 predicate:"~944dac0d-3197-4e7e-9d49-42bfce283043" > > tablets:<key:"~a6e30999-c971-4c0c-b1ef-dd6ea0994a19" value:<group_id:1 predicate:"~a6e30999-c971-4c0c-b1ef-dd6ea0994a19" > > tablets:<key:"~c3699619-9cd4-4d33-b24a-0b8c7831ee6a" value:<group_id:1 predicate:"~c3699619-9cd4-4d33-b24a-0b8c7831ee6a" > > snapshot_ts:9141515 checksum:2463542097647359390 > > zeros:<key:1 value:<id:1 addr:"zero_1:5080" > > zeros:<key:2 value:<id:2 addr:"zero_2:5081" leader:true > > zeros:<key:3 value:<id:3 addr:"zero_3:5082" > > maxLeaseId:1570000 maxTxnTs:9200000 maxRaftId:3 cid:"0f521e8e-e045-46ef-8b13-ce9a9610dd37"
I0120 15:27:14.527721 1 node.go:88] 1 [logterm: 80, index: 10870127, vote: 0] cast MsgPreVote for 2 [logterm: 80, index: 10870127] at term 80
I0120 15:27:14.624895 1 node.go:88] 1 is starting a new election at term 80
I0120 15:27:14.624923 1 node.go:88] 1 became pre-candidate at term 80
I0120 15:27:14.624932 1 node.go:88] 1 received MsgPreVoteResp from 1 at term 80
I0120 15:27:14.625547 1 node.go:88] 1 [logterm: 80, index: 10870127] sent MsgPreVote request to 2 at term 80
I0120 15:27:14.626824 1 node.go:88] 1 [logterm: 80, index: 10870127] sent MsgPreVote request to 3 at term 80
I0120 15:27:15.063423 1 node.go:88] 1 [logterm: 80, index: 10870127, vote: 0] cast MsgPreVote for 3 [logterm: 80, index: 10870127] at term 80
I0120 15:27:15.065336 1 node.go:88] 1 [term: 80] received a MsgVote message with higher term from 2 [term: 81]
I0120 15:27:15.065765 1 node.go:88] 1 became follower at term 81
I0120 15:27:15.066899 1 node.go:88] 1 [logterm: 80, index: 10870127, vote: 0] cast MsgVote for 2 [logterm: 80, index: 10870127] at term 81
I0120 15:27:15.067859 1 node.go:88] 1 [logterm: 80, index: 10870127, vote: 2] rejected MsgVote from 3 [logterm: 80, index: 10870127] at term 81
I0120 15:27:15.529616 1 node.go:88] raft.node: 1 elected leader 2 at term 81
Zero 1 logs
I0120 15:26:58.815437 1 init.go:88]
Dgraph version : v1.0.17
Commit SHA-1 : 42deb299
Commit timestamp : 2019-08-30 12:46:31 -0700
Branch : HEAD
Go version : go1.12.7
For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph , visit http://discuss.dgraph.io.
To say hi to the community , visit https://dgraph.slack.com.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.
I0120 15:26:58.816191 1 run.go:98] Setting up grpc listener at: 0.0.0.0:5080
I0120 15:26:58.816344 1 run.go:98] Setting up http listener at: 0.0.0.0:6080
badger 2021/01/20 15:26:59 INFO: All 1 tables opened in 198ms
badger 2021/01/20 15:26:59 INFO: Replaying file id: 17 at offset: 58958052
badger 2021/01/20 15:26:59 INFO: Replay took: 11.258µs
2021/01/20 15:26:59 Endpoint has been deprecated. Please use CollectorEndpoint instead.
I0120 15:26:59.046902 1 node.go:157] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc0000c1bc0 Applied:1396059 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1f8b490 DisableProposalForwarding:false}
I0120 15:26:59.048375 1 node.go:306] Found Snapshot.Metadata: {ConfState:{Nodes:[1 2 3] Learners:[] XXX_unrecognized:[]} Index:1396059 Term:41 XXX_unrecognized:[]}
I0120 15:26:59.048414 1 node.go:317] Found hardstate: {Term:47 Vote:3 Commit:1397181 XXX_unrecognized:[]}
I0120 15:26:59.409223 1 node.go:326] Group 0 found 1123 entries
I0120 15:26:59.409253 1 raft.go:433] Restarting node for dgraphzero
I0120 15:26:59.410506 1 node.go:193] Setting conf state to nodes:1 nodes:2 nodes:3
I0120 15:26:59.430666 1 pool.go:138] CONNECTED to alpha_1:7080
I0120 15:26:59.430697 1 pool.go:138] CONNECTED to alpha_2:7081
I0120 15:26:59.430719 1 pool.go:138] CONNECTED to alpha_3:7082
I0120 15:26:59.430730 1 pool.go:138] CONNECTED to zero_2:5081
I0120 15:26:59.430752 1 pool.go:138] CONNECTED to zero_3:5082
I0120 15:26:59.776292 1 node.go:88] 1 became follower at term 47
I0120 15:26:59.776432 1 node.go:88] newRaft 1 [peers: [1,2,3], term: 47, commit: 1397181, applied: 1396059, lastindex: 1397181, lastterm: 47]
I0120 15:26:59.778021 1 run.go:282] Running Dgraph Zero...
I0120 15:26:59.791191 1 oracle.go:106] Purged below ts:9141515, len(o.commits):1, len(o.rowCommit):0
I0120 15:27:00.157307 1 zero.go:396] Got connection request: cluster_info_only:true
I0120 15:27:00.157632 1 node.go:88] 1 no leader at term 47; dropping index reading msg
W0120 15:27:02.157746 1 node.go:668] [0x1] Read index context timed out
I0120 15:27:02.158265 1 node.go:88] 1 no leader at term 47; dropping index reading msg
I0120 15:27:03.459602 1 node.go:88] 1 [term: 47] received a MsgHeartbeat message with higher term from 2 [term: 48]
I0120 15:27:03.459812 1 node.go:88] 1 became follower at term 48
I0120 15:27:03.459830 1 node.go:88] raft.node: 1 elected leader 2 at term 48
W0120 15:27:03.548916 1 raft.go:707] Raft.Ready took too long to process: Timer Total: 3.766s. Breakdown: [{proposals 3.758s} {sync 8ms} {disk 0s} {advance 0s}]. Num entries: 0. MustSync: true
W0120 15:27:04.158397 1 node.go:668] [0x1] Read index context timed out
I0120 15:27:04.689838 1 zero.go:414] Connected: cluster_info_only:true
I0120 15:27:04.690008 1 zero.go:396] Got connection request: cluster_info_only:true
I0120 15:27:04.778808 1 zero.go:414] Connected: cluster_info_only:true
I0120 15:27:04.778948 1 zero.go:396] Got connection request: cluster_info_only:true
I0120 15:27:04.870006 1 zero.go:414] Connected: cluster_info_only:true