After bulk load, dgraph times out during rebalance

Report a Dgraph Bug

After bulk loading data into 3 shards on 3 servers, when we start Dgraph, Zero logs that it is trying to rebalance the Name predicate, which is associated with every node in the graph. However, that move times out every time with a "context deadline exceeded" error. The timeout appears to be hardcoded at 20 minutes, judging from the source in tablet.go:

predicateMoveTimeout = 20 * time.Minute

What version of Dgraph are you using?

v20.03.4

Have you tried reproducing the issue with the latest release?

Not tested, but the master branch has the same 20m timeout in the source.

What is the hardware spec (RAM, OS)?

CentOS Linux release 7.8.2003

Steps to reproduce the issue (command/config used to run Dgraph).

Bulk load the data, then start the Dgraph cluster (3 alpha nodes, 1 zero). The cluster cannot rebalance.

Expected behaviour and actual result.

Expected behavior is to rebalance successfully.

Actual behavior is a timeout after 20m, and the alpha nodes are unable to rebalance predicates.

Logs:

I0731 12:38:43.646333       1 tablet.go:108] Going to move predicate: [Name], size: [43 GB] from group 1 to 2
I0731 12:38:43.646608       1 tablet.go:135] Starting move: predicate:"Name" source_gid:1 dest_gid:2 txn_ts:82001
E0731 12:58:43.645868       1 tablet.go:70] while calling MovePredicate: rpc error: code = DeadlineExceeded desc = context deadline exceeded