Hi team, the docs state that Dgraph Zero moves tablets on its own to rebalance shards. This is not happening in my case. My cluster is set up as follows -
Machine A - one zero, three alphas
Machine B - one zero (peer option given for machine A), three alphas
Both machines are hosted on AWS and are t2.xlarge. Dgraph runs on these machines in Docker. I am trying to insert 1 million nodes into Dgraph, with each node linked to a random number (between 10 and 500) of other nodes.
An API call to the /state endpoint gives the following info -
{
"counter": "34938",
"groups": {
"1": {
"members": {
"1": {
"id": "1",
"groupId": 1,
"addr": "ip1:7081",
"leader": true,
"lastUpdate": "1595936458"
},
"2": {
"id": "2",
"groupId": 1,
"addr": "ip1:7080"
},
"3": {
"id": "3",
"groupId": 1,
"addr": "ip1:7082"
}
},
"tablets": {
"predicate_1": {
"groupId": 1,
"predicate": "predicate_1",
"moveTs": "265129"
},
"dgraph.graphql.schema": {
"groupId": 1,
"predicate": "dgraph.graphql.schema"
},
"dgraph.type": {
"groupId": 1,
"predicate": "dgraph.type"
},
"predicate_2": {
"groupId": 1,
"predicate": "predicate_2",
"moveTs": "265293"
},
"predicate_3": {
"groupId": 1,
"predicate": "predicate_3",
"moveTs": "265237"
},
"predicate_4": {
"groupId": 1,
"predicate": "predicate_4",
"moveTs": "254686"
}
},
"snapshotTs": "254221",
"checksum": "10116038444952767405"
},
"2": {
"members": {
"4": {
"id": "4",
"groupId": 2,
"addr": "ip2:7081",
"leader": true,
"lastUpdate": "1595936468"
},
"5": {
"id": "5",
"groupId": 2,
"addr": "ip2:7082"
},
"6": {
"id": "6",
"groupId": 2,
"addr": "ip2:7080"
}
},
"tablets": {
"dgraph.graphql.xid": {
"groupId": 2,
"predicate": "dgraph.graphql.xid"
}
},
"snapshotTs": "258914",
"checksum": "4375140596800650063"
}
},
"zeros": {
"1": {
"id": "1",
"addr": "ip1:5080",
"leader": true
},
"2": {
"id": "2",
"addr": "ip2:5080"
}
},
"maxLeaseId": "20000",
"maxTxnTs": "270000",
"maxRaftId": "6",
"cid": "b5b81e42-7ef2-4e09-9c43-fd9277c9633a",
"license": {
"maxNodes": "18446744073709551615",
"expiryTs": "1598528460",
"enabled": true
}
}
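To make the imbalance easier to see, here is a minimal sketch that lists which predicates each group serves, given the /state response as a dict. The function name `tablets_by_group` and the trimmed `sample_state` are illustrative only (they are not part of Dgraph); the sample keeps just the fields the function reads from the output above.

```python
def tablets_by_group(state):
    """Map each group id to the sorted list of predicates (tablets) it serves."""
    summary = {}
    for group_id, group in state.get("groups", {}).items():
        # A group that serves no tablets may have no "tablets" key at all.
        summary[group_id] = sorted(group.get("tablets", {}))
    return summary


# Trimmed copy of the /state output above (hypothetical variable name).
# A real response could be loaded with json.load() from a saved file.
sample_state = {
    "groups": {
        "1": {
            "tablets": {
                "predicate_1": {"groupId": 1, "predicate": "predicate_1"},
                "dgraph.graphql.schema": {"groupId": 1, "predicate": "dgraph.graphql.schema"},
                "dgraph.type": {"groupId": 1, "predicate": "dgraph.type"},
                "predicate_2": {"groupId": 1, "predicate": "predicate_2"},
                "predicate_3": {"groupId": 1, "predicate": "predicate_3"},
                "predicate_4": {"groupId": 1, "predicate": "predicate_4"},
            }
        },
        "2": {
            "tablets": {
                "dgraph.graphql.xid": {"groupId": 2, "predicate": "dgraph.graphql.xid"},
            }
        },
    }
}

for group_id, predicates in tablets_by_group(sample_state).items():
    print(f"group {group_id}: {len(predicates)} tablets -> {', '.join(predicates)}")
```

With the output above, this shows all four of my data predicates sitting in group 1 and only dgraph.graphql.xid in group 2.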
All my queries and mutations at this point are going to machine A. Before the tablets were moved, my calls were going to machine B, as can be seen in the graphs below (for machines A and B respectively) -
However, Zero should be moving the tablets by itself. There are also a lot of Zero logs like this -
W0728 13:18:47.850923 14 raft.go:733] Raft.Ready took too long to process: Timer Total: 218ms. Breakdown: [{sync 218ms} {disk 0s} {proposals 0s} {advance 0s}]. Num entries: 1. MustSync: true
W0728 13:20:31.411941 14 raft.go:733] Raft.Ready took too long to process: Timer Total: 247ms. Breakdown: [{sync 247ms} {disk 0s} {proposals 0s} {advance 0s}]. Num entries: 1. MustSync: true
W0728 13:22:29.252420 14 raft.go:733] Raft.Ready took too long to process: Timer Total: 317ms. Breakdown: [{sync 317ms} {disk 0s} {proposals 0s} {advance 0s}]. Num entries: 1. MustSync: true
W0728 13:24:09.553324 14 raft.go:733] Raft.Ready took too long to process: Timer Total: 398ms. Breakdown: [{sync 398ms} {disk 0s} {proposals 0s} {advance 0s}]. Num entries: 1. MustSync: true
and the Alpha logs look similar -
W0728 13:10:39.897482 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 329ms. Breakdown: [{sync 329ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:10:45.337501 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 271ms. Breakdown: [{sync 271ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:11:11.097817 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 316ms. Breakdown: [{sync 316ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:11:42.358102 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 328ms. Breakdown: [{sync 328ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:11:58.028158 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 264ms. Breakdown: [{sync 264ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:12:12.688198 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 246ms. Breakdown: [{sync 246ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:12:44.768439 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 415ms. Breakdown: [{sync 415ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:13:26.318782 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 238ms. Breakdown: [{sync 238ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:13:46.899080 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 226ms. Breakdown: [{sync 226ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:14:02.409172 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 349ms. Breakdown: [{sync 349ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:14:17.959260 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 286ms. Breakdown: [{sync 286ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:14:43.909274 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 250ms. Breakdown: [{sync 250ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
I0728 13:14:58.419555 13 draft.go:557] Creating snapshot at index: 100727. ReadTs: 268736.
W0728 13:15:20.129679 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 286ms. Breakdown: [{sync 286ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:16:07.070095 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 334ms. Breakdown: [{sync 333ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:16:22.540189 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 236ms. Breakdown: [{sync 236ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:16:53.580157 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 268ms. Breakdown: [{sync 268ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:17:16.110459 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 242ms. Breakdown: [{sync 242ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:17:16.410786 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 299ms. Breakdown: [{sync 298ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:17:24.530427 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 258ms. Breakdown: [{sync 258ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:17:30.020465 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 228ms. Breakdown: [{sync 228ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:18:06.440726 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 284ms. Breakdown: [{sync 284ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:18:27.080776 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 317ms. Breakdown: [{sync 316ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:18:58.221093 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 238ms. Breakdown: [{sync 238ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:19:29.231488 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 505ms. Breakdown: [{sync 504ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:20:00.161559 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 258ms. Breakdown: [{sync 258ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:20:10.781700 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 334ms. Breakdown: [{sync 334ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:21:02.661958 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 268ms. Breakdown: [{sync 268ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:21:28.622141 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 359ms. Breakdown: [{sync 359ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:21:34.002171 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 358ms. Breakdown: [{sync 358ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:22:04.782435 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 205ms. Breakdown: [{sync 205ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:22:10.082541 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 228ms. Breakdown: [{sync 228ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:22:16.262446 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 301ms. Breakdown: [{sync 300ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:22:35.812548 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 238ms. Breakdown: [{sync 238ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:22:46.442927 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 215ms. Breakdown: [{sync 215ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:23:07.042767 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 255ms. Breakdown: [{sync 255ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:23:38.343012 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 291ms. Breakdown: [{disk 151ms} {sync 140ms} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:24:46.023459 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 258ms. Breakdown: [{sync 258ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:25:11.753710 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 261ms. Breakdown: [{disk 151ms} {sync 110ms} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:25:27.223829 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 229ms. Breakdown: [{sync 229ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
I am aware that these logs appear when the machines running Dgraph have low provisioned IOPS. Would a bigger machine solve this? I have reproduced this issue 2-3 times.
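For anyone who wants to quantify the warnings above, here is a rough sketch that parses the "Raft.Ready took too long" lines and reports how much of the total time is spent in the sync step. The regular expressions and the names `parse_line` and `summarize` are my own assumptions based only on the log format pasted above; they handle whole-number "ms" and "s" values and nothing else.

```python
import re
from statistics import mean

# Patterns inferred from the pasted log lines, not from Dgraph's source.
LINE_RE = re.compile(
    r"Raft\.Ready took too long to process: Timer Total: (\d+)ms\. Breakdown: \[(.*?)\]"
)
STEP_RE = re.compile(r"\{(\w+) (\d+)(ms|s)\}")


def parse_line(line):
    """Return (total_ms, {step: ms}) for a warning line, or None for other lines."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    steps = {}
    for name, value, unit in STEP_RE.findall(match.group(2)):
        steps[name] = int(value) * (1000 if unit == "s" else 1)
    return int(match.group(1)), steps


def summarize(lines):
    """Aggregate count, mean, max and the sync share of total time."""
    parsed = [p for p in map(parse_line, lines) if p is not None]
    if not parsed:
        return None
    totals = [total for total, _ in parsed]
    sync_ms = sum(steps.get("sync", 0) for _, steps in parsed)
    return {
        "count": len(parsed),
        "mean_ms": mean(totals),
        "max_ms": max(totals),
        "sync_share": sync_ms / sum(totals),
    }


# Three lines copied from the logs above; the snapshot line is ignored.
sample_lines = [
    "W0728 13:18:47.850923 14 raft.go:733] Raft.Ready took too long to process: Timer Total: 218ms. Breakdown: [{sync 218ms} {disk 0s} {proposals 0s} {advance 0s}]. Num entries: 1. MustSync: true",
    "I0728 13:14:58.419555 13 draft.go:557] Creating snapshot at index: 100727. ReadTs: 268736.",
    "W0728 13:23:38.343012 13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 291ms. Breakdown: [{disk 151ms} {sync 140ms} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true",
]

print(summarize(sample_lines))
```

In the lines pasted above, nearly every warning attributes the whole delay to sync, which is what made me suspect disk IOPS rather than CPU or memory.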