I have offline bulk-loaded a large dataset. Then I am spinning up a zero and a single alpha node serving the bulk-loaded data. Accessing the cluster with Ratel shows me the bulk-loaded schema and expected predicate sizes. Querying works fine.
Then, 5 minutes after the alpha node registered with the zero, the zero starts printing [0x1] Read index context timed out
every two seconds and everything stops working.
The dataset has ~150m uids, ~5b triples and a schema with thousands of predicates. Maybe this size of a schema makes cluster membership timeout.
zero log
[Sentry] 2021/05/28 17:50:03 Integration installed: ContextifyFrames
[Sentry] 2021/05/28 17:50:03 Integration installed: Environment
[Sentry] 2021/05/28 17:50:03 Integration installed: Modules
[Sentry] 2021/05/28 17:50:03 Integration installed: IgnoreErrors
[Sentry] 2021/05/28 17:50:03 Integration installed: ContextifyFrames
[Sentry] 2021/05/28 17:50:03 Integration installed: Environment
[Sentry] 2021/05/28 17:50:03 Integration installed: Modules
[Sentry] 2021/05/28 17:50:03 Integration installed: IgnoreErrors
I0528 17:50:03.987314 19 sentry_integration.go:48] This instance of Dgraph will send anonymous reports of panics back to Dgraph Labs via Sentry. No confidential information is sent. These reports help improve Dgraph. To opt-out, restart your instance with the --telemetry “sentry=false;” flag. For more info, see https://dgraph.io/docs/howto/#data-handling.
I0528 17:50:04.134584 19 init.go:110]
Dgraph version : v21.03.0
Dgraph codename : rocket
Dgraph SHA-256 : b4e4c77011e2938e9da197395dbce91d0c6ebb83d383b190f5b70201836a773f
Commit SHA-1 : a77bbe8ae
Commit timestamp : 2021-04-07 21:36:38 +0530
Branch : HEAD
Go version : go1.16.2
jemalloc enabled : true
For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph , visit http://discuss.dgraph.io.
For fully-managed Dgraph Cloud , visit https://dgraph.io/cloud.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.
I0528 17:50:04.134660 19 run.go:259] Setting Config to: {raft:0xc00013a250 telemetry:0xc00013a110 limit:0xc00013a260 bindall:true portOffset:0 numReplicas:17 peer: w:zw rebalanceInterval:480000000000 tlsClientConfig: audit: limiterConfig:0xc0003963f0}
I0528 17:50:04.134714 19 run.go:145] Setting up grpc listener at: 0.0.0.0:5080
I0528 17:50:04.134975 19 run.go:145] Setting up http listener at: 0.0.0.0:6080
I0528 17:50:04.135565 19 log.go:295] Found file: 1 First Index: 1
I0528 17:50:04.136041 19 storage.go:125] Init Raft Storage with snap: 0, first: 1, last: 1636
I0528 17:50:04.136167 19 node.go:152] Setting raft.Config to: &{ID:1 peers: learners: ElectionTick:20 HeartbeatTick:1 Storage:0xc0003cb380 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x32c8038 DisableProposalForwarding:false}
I0528 17:50:04.137225 19 node.go:321] Found hardstate: {Term:2 Vote:1 Commit:1636 XXX_unrecognized:}
I0528 17:50:04.138340 19 node.go:326] Group 0 found 1636 entries
I0528 17:50:04.138360 19 raft.go:605] Restarting node for dgraphzero
I0528 17:50:04.138376 19 log.go:34] 1 became follower at term 2
I0528 17:50:04.138383 19 log.go:34] newRaft 1 [peers: , term: 2, commit: 1636, applied: 0, lastindex: 1636, lastterm: 2]
I0528 17:50:04.139103 19 run.go:393] Running Dgraph Zero…
I0528 17:50:04.139607 19 node.go:189] Setting conf state to nodes:1
I0528 17:50:04.139761 19 raft.go:966] Done applying conf change at 0x1
I0528 17:50:06.639612 19 log.go:34] 1 is starting a new election at term 2
I0528 17:50:06.639636 19 log.go:34] 1 became pre-candidate at term 2
I0528 17:50:06.639640 19 log.go:34] 1 received MsgPreVoteResp from 1 at term 2
I0528 17:50:06.639655 19 log.go:34] 1 became candidate at term 3
I0528 17:50:06.639658 19 log.go:34] 1 received MsgVoteResp from 1 at term 3
I0528 17:50:06.639666 19 log.go:34] 1 became leader at term 3
I0528 17:50:06.639671 19 log.go:34] raft.node: 1 elected leader 1 at term 3
I0528 17:50:06.639705 19 raft.go:912] I’ve become the leader, updating leases.
I0528 17:50:06.639715 19 assign.go:47] Updated UID: 179410001. Txn Ts: 10001. NsID: 1.
I0528 18:00:06.460148 19 zero.go:442] Got connection request: cluster_info_only:true
I0528 18:00:06.460304 19 zero.go:467] Connected: cluster_info_only:true
I0528 18:00:06.461182 19 zero.go:442] Got connection request: group_id:1 addr:“dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080” force_group_id:true
I0528 18:00:06.461380 19 pool.go:162] CONNECTING to dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080
I0528 18:00:06.461578 19 zero.go:607] Connected: id:1 group_id:1 addr:“dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080” force_group_id:true
W0528 18:00:06.463320 19 pool.go:267] Connection lost with dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080. Error: rpc error: code = Unknown desc = No node has been set up yet
I0528 18:01:04.195153 19 raft.go:780] Skipping creating a snapshot. Num groups: 1, Num checkpoints: 0
I0528 18:02:04.139355 19 raft.go:780] Skipping creating a snapshot. Num groups: 1, Num checkpoints: 0
I0528 18:03:04.139539 19 raft.go:780] Skipping creating a snapshot. Num groups: 1, Num checkpoints: 0
I0528 18:04:04.140072 19 raft.go:780] Skipping creating a snapshot. Num groups: 1, Num checkpoints: 0
I0528 18:05:04.139908 19 raft.go:780] Skipping creating a snapshot. Num groups: 1, Num checkpoints: 0
W0528 18:05:09.384800 19 node.go:680] [0x1] Read index context timed out
W0528 18:05:11.399774 19 node.go:680] [0x1] Read index context timed out
W0528 18:05:13.399851 19 node.go:680] [0x1] Read index context timed out
alpha log
…
I0528 18:00:53.934891 19 groups.go:491] Serving tablet for: 0-http://mzn.dbpedia.org/property/شوماره
I0528 18:00:53.941371 19 groups.go:491] Serving tablet for: 0-http://vo.dbpedia.org/property/vödsOtsinifik
I0528 18:00:53.942341 19 groups.go:166] Server is ready
I0528 18:00:53.942360 19 access_ee.go:408] ResetAcl closed
I0528 18:00:53.942365 19 access_ee.go:318] RefreshAcls closed
I0528 18:00:56.365271 19 admin.go:835] No GraphQL schema in Dgraph; serving empty GraphQL API
W0528 18:05:16.995273 19 draft.go:1472] While sending membership to Zero. Error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
W0528 18:05:26.649850 19 groups.go:878] No membership update for 10s. Closing connection to Zero.
I0528 18:05:26.649881 19 groups.go:816] Got address of a Zero leader: dgraph-zero-0.dgraph-zero.default.svc.cluster.local:5080
I0528 18:05:26.650020 19 groups.go:830] Starting a new membership stream receive from dgraph-zero-0.dgraph-zero.default.svc.cluster.local:5080.
E0528 18:05:26.650674 19 groups.go:838] Unable to sync memberships. Error: rpc error: code = Canceled desc = context canceled. State:
W0528 18:05:36.650290 19 groups.go:878] No membership update for 10s. Closing connection to Zero.
I0528 18:05:36.650328 19 groups.go:816] Got address of a Zero leader: dgraph-zero-0.dgraph-zero.default.svc.cluster.local:5080