Dgraph Zero not moving tablets by itself

Hi team, the docs state that Dgraph Zero moves tablets on its own to rebalance shards. This is not happening in my case. My cluster is set up as follows:

Machine A: one Zero, three Alphas
Machine B: one Zero (with peer set to machine A's Zero), three Alphas

Both machines are AWS t2.xlarge instances, and Dgraph runs on them in Docker. I am inserting 1 million nodes into Dgraph, each linked to a random number of other nodes (between 10 and 500 edges per node).

A call to the /state endpoint returns the following:

{
    "counter": "34938",
    "groups": {
        "1": {
            "members": {
                "1": {
                    "id": "1",
                    "groupId": 1,
                    "addr": "ip1:7081",
                    "leader": true,
                    "lastUpdate": "1595936458"
                },
                "2": {
                    "id": "2",
                    "groupId": 1,
                    "addr": "ip1:7080"
                },
                "3": {
                    "id": "3",
                    "groupId": 1,
                    "addr": "ip1:7082"
                }
            },
            "tablets": {
                "predicate_1": {
                    "groupId": 1,
                    "predicate": "predicate_1",
                    "moveTs": "265129"
                },
                "dgraph.graphql.schema": {
                    "groupId": 1,
                    "predicate": "dgraph.graphql.schema"
                },
                "dgraph.type": {
                    "groupId": 1,
                    "predicate": "dgraph.type"
                },
                "predicate_2": {
                    "groupId": 1,
                    "predicate": "predicate_2",
                    "moveTs": "265293"
                },
                "predicate_3": {
                    "groupId": 1,
                    "predicate": "predicate_3",
                    "moveTs": "265237"
                },
                "predicate_4": {
                    "groupId": 1,
                    "predicate": "predicate_4",
                    "moveTs": "254686"
                }
            },
            "snapshotTs": "254221",
            "checksum": "10116038444952767405"
        },
        "2": {
            "members": {
                "4": {
                    "id": "4",
                    "groupId": 2,
                    "addr": "ip2:7081",
                    "leader": true,
                    "lastUpdate": "1595936468"
                },
                "5": {
                    "id": "5",
                    "groupId": 2,
                    "addr": "ip2:7082"
                },
                "6": {
                    "id": "6",
                    "groupId": 2,
                    "addr": "ip2:7080"
                }
            },
            "tablets": {
                "dgraph.graphql.xid": {
                    "groupId": 2,
                    "predicate": "dgraph.graphql.xid"
                }
            },
            "snapshotTs": "258914",
            "checksum": "4375140596800650063"
        }
    },
    "zeros": {
        "1": {
            "id": "1",
            "addr": "ip1:5080",
            "leader": true
        },
        "2": {
            "id": "2",
            "addr": "ip2:5080"
        }
    },
    "maxLeaseId": "20000",
    "maxTxnTs": "270000",
    "maxRaftId": "6",
    "cid": "b5b81e42-7ef2-4e09-9c43-fd9277c9633a",
    "license": {
        "maxNodes": "18446744073709551615",
        "expiryTs": "1598528460",
        "enabled": true
    }
}
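To see how the tablets are spread across the two groups, the /state payload can be summarized per group. This is a minimal Python sketch; summarize_tablets is a hypothetical helper, and both the size field name ("space" here) and the idea that it carries each tablet's size are assumptions that may not match every Dgraph version. Because protobuf JSON usually omits zero-valued fields, a missing size is counted as 0. Note that none of the tablets in the output above carry a size field at all.

```python
from typing import Dict, Tuple


# ASSUMPTION: "space" is the per-tablet size field; the name may differ
# between Dgraph versions, so it is a parameter rather than hard-coded.
def summarize_tablets(state: dict, size_field: str = "space") -> Dict[str, Tuple[int, int]]:
    """Map each group ID to (number of tablets, summed tablet size)."""
    summary = {}
    for gid, group in state.get("groups", {}).items():
        tablets = group.get("tablets", {})
        # A missing size field is treated as 0 (zero values are usually
        # omitted when protobuf messages are rendered as JSON).
        total = sum(int(t.get(size_field, 0)) for t in tablets.values())
        summary[gid] = (len(tablets), total)
    return summary
```

Applied to the output above (for example via json.load on the response from the /state URL), this gives {'1': (6, 0), '2': (1, 0)}: six tablets in group 1, one in group 2, and no size recorded for any of them.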

At this point all my queries and mutations go to machine A. Before the tablet was moved, my calls were going to machine B, as can be seen in the graphs below (for machine A and machine B respectively):

However, Zero should be moving the tablets by itself. Zero is also logging many warnings like these:

W0728 13:18:47.850923      14 raft.go:733] Raft.Ready took too long to process: Timer Total: 218ms. Breakdown: [{sync 218ms} {disk 0s} {proposals 0s} {advance 0s}]. Num entries: 1. MustSync: true
W0728 13:20:31.411941      14 raft.go:733] Raft.Ready took too long to process: Timer Total: 247ms. Breakdown: [{sync 247ms} {disk 0s} {proposals 0s} {advance 0s}]. Num entries: 1. MustSync: true
W0728 13:22:29.252420      14 raft.go:733] Raft.Ready took too long to process: Timer Total: 317ms. Breakdown: [{sync 317ms} {disk 0s} {proposals 0s} {advance 0s}]. Num entries: 1. MustSync: true
W0728 13:24:09.553324      14 raft.go:733] Raft.Ready took too long to process: Timer Total: 398ms. Breakdown: [{sync 398ms} {disk 0s} {proposals 0s} {advance 0s}]. Num entries: 1. MustSync: true

The Alpha logs show the same pattern:

W0728 13:10:39.897482      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 329ms. Breakdown: [{sync 329ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:10:45.337501      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 271ms. Breakdown: [{sync 271ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:11:11.097817      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 316ms. Breakdown: [{sync 316ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:11:42.358102      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 328ms. Breakdown: [{sync 328ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:11:58.028158      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 264ms. Breakdown: [{sync 264ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:12:12.688198      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 246ms. Breakdown: [{sync 246ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:12:44.768439      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 415ms. Breakdown: [{sync 415ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:13:26.318782      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 238ms. Breakdown: [{sync 238ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:13:46.899080      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 226ms. Breakdown: [{sync 226ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:14:02.409172      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 349ms. Breakdown: [{sync 349ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:14:17.959260      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 286ms. Breakdown: [{sync 286ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:14:43.909274      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 250ms. Breakdown: [{sync 250ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
I0728 13:14:58.419555      13 draft.go:557] Creating snapshot at index: 100727. ReadTs: 268736.
W0728 13:15:20.129679      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 286ms. Breakdown: [{sync 286ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:16:07.070095      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 334ms. Breakdown: [{sync 333ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:16:22.540189      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 236ms. Breakdown: [{sync 236ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:16:53.580157      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 268ms. Breakdown: [{sync 268ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:17:16.110459      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 242ms. Breakdown: [{sync 242ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:17:16.410786      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 299ms. Breakdown: [{sync 298ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:17:24.530427      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 258ms. Breakdown: [{sync 258ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:17:30.020465      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 228ms. Breakdown: [{sync 228ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:18:06.440726      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 284ms. Breakdown: [{sync 284ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:18:27.080776      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 317ms. Breakdown: [{sync 316ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:18:58.221093      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 238ms. Breakdown: [{sync 238ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:19:29.231488      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 505ms. Breakdown: [{sync 504ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:20:00.161559      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 258ms. Breakdown: [{sync 258ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:20:10.781700      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 334ms. Breakdown: [{sync 334ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:21:02.661958      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 268ms. Breakdown: [{sync 268ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:21:28.622141      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 359ms. Breakdown: [{sync 359ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:21:34.002171      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 358ms. Breakdown: [{sync 358ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:22:04.782435      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 205ms. Breakdown: [{sync 205ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:22:10.082541      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 228ms. Breakdown: [{sync 228ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:22:16.262446      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 301ms. Breakdown: [{sync 300ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:22:35.812548      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 238ms. Breakdown: [{sync 238ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:22:46.442927      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 215ms. Breakdown: [{sync 215ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:23:07.042767      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 255ms. Breakdown: [{sync 255ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:23:38.343012      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 291ms. Breakdown: [{disk 151ms} {sync 140ms} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:24:46.023459      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 258ms. Breakdown: [{sync 258ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:25:11.753710      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 261ms. Breakdown: [{disk 151ms} {sync 110ms} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
W0728 13:25:27.223829      13 draft.go:1183] Raft.Ready took too long to process: Timer Total: 229ms. Breakdown: [{sync 229ms} {disk 0s} {proposals 0s} {advance 0s}] Num entries: 1. MustSync: true
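To check how much of each slow Raft.Ready cycle is spent in the sync step, the warnings can be parsed and summarized. This is a minimal sketch assuming the exact warning format shown above and single-unit Go duration strings such as 218ms or 1.5s; sync_latencies_ms and summarize are hypothetical helpers, not part of Dgraph.

```python
import re
from statistics import mean

# Single-unit Go-style durations as printed in the Breakdown, converted to ms.
_UNIT_MS = {"ns": 1e-6, "us": 1e-3, "µs": 1e-3, "ms": 1.0, "s": 1000.0}
_SYNC_RE = re.compile(r"\{sync ([0-9.]+)(ns|us|µs|ms|s)\}")


def sync_latencies_ms(lines):
    """Return the sync duration (ms) of every 'Raft.Ready took too long' warning."""
    latencies = []
    for line in lines:
        if "Raft.Ready took too long" not in line:
            continue  # skip other lines, e.g. "Creating snapshot at index"
        m = _SYNC_RE.search(line)
        if m:
            latencies.append(float(m.group(1)) * _UNIT_MS[m.group(2)])
    return latencies


def summarize(latencies):
    """Count, mean and max of the collected sync durations."""
    if not latencies:
        return {"count": 0}
    return {
        "count": len(latencies),
        "mean_ms": round(mean(latencies), 1),
        "max_ms": max(latencies),
    }
```

A saved log file can be passed directly (an open file object iterates line by line), e.g. summarize(sync_latencies_ms(open("alpha.log"))), where the file name is only an example.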

I understand these warnings appear when the machines running Dgraph have low provisioned IOPS. Would a bigger machine solve this? I have reproduced this issue 2-3 times.

Rebalancing runs on an interval. Have you changed it? Does your test take this into account?

dgraph zero -h | grep inter
--rebalance_interval duration   Interval for trying a predicate move. (default 8m0s)
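Besides the automatic rebalance that runs on this interval, Zero also accepts manual move requests over HTTP through its /moveTablet endpoint, which takes the tablet (predicate) name and the target group as query parameters. The sketch below only builds that URL; move_tablet_url is a hypothetical helper, and using Zero's default HTTP port 6080 is an assumption about this deployment, since the configs in this thread only set the 5080 address.

```python
from urllib.parse import urlencode


def move_tablet_url(zero_http: str, tablet: str, group: int) -> str:
    """Build a Zero /moveTablet URL for moving one predicate to a group.

    zero_http is host:port of Zero's HTTP endpoint (6080 by default,
    an assumption for this cluster).
    """
    if group < 1:
        # Alpha group IDs in /state start at 1.
        raise ValueError("group IDs start at 1")
    query = urlencode({"tablet": tablet, "group": group})
    return f"http://{zero_http}/moveTablet?{query}"
```

For example, move_tablet_url("ip1:6080", "predicate_1", 2) returns http://ip1:6080/moveTablet?tablet=predicate_1&group=2; sending a GET request to that URL (with curl, say) asks Zero to move predicate_1 to group 2.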

@MichelDiz No, I didn’t change the default rebalance interval. This is my config.yaml for Zero on machine A:

my: ip1:5080
replicas: 3
idx: 1
v: 2
expose_trace: true

And on machine B:

my: ip2:5080
idx: 2
peer: ip1:5080
replicas: 3
v: 2
expose_trace: true

I haven’t changed any other default values either.
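The two Zero configs can be sanity-checked side by side. This is a minimal sketch that parses the flat key: value files (no nesting, so no YAML library is needed) and applies a few consistency rules that we assume a multi-Zero setup should satisfy: the same replicas value everywhere, unique idx values, and every Zero after the first pointing its peer at another Zero's my address. parse_flat_config and check_zero_configs are hypothetical helpers.

```python
def parse_flat_config(text):
    """Parse simple 'key: value' lines (no nesting), as in the configs above."""
    cfg = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so "my: ip1:5080" keeps its port.
        key, _, value = line.partition(":")
        cfg[key.strip()] = value.strip()
    return cfg


def check_zero_configs(configs):
    """Return a list of inconsistencies; configs[0] is the first Zero."""
    problems = []
    replicas = {c.get("replicas") for c in configs}
    if len(replicas) != 1:
        problems.append(f"replicas differ: {sorted(map(str, replicas))}")
    idxs = [c.get("idx") for c in configs]
    if len(set(idxs)) != len(idxs):
        problems.append(f"duplicate idx values: {idxs}")
    addrs = {c.get("my") for c in configs}
    for c in configs[1:]:
        peer = c.get("peer")
        if peer is None:
            problems.append(f"zero {c.get('idx')} has no peer")
        elif peer not in addrs - {c.get("my")}:
            problems.append(f"zero {c.get('idx')} peers with unknown address {peer}")
    return problems
```

For the two configs posted above this returns an empty list, so under these rules nothing in them looks inconsistent.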