I’m trying out the bulk loader.
During MAP it appears to be fast and scales well when increasing
REDUCE takes much longer than MAP, and during this phase CPU utilisation drops significantly. I’ve tried different values for reducers but don’t see much difference. What’s the key thing to do to make the REDUCE phase faster?
reduce_shards is stated as one of the main things to tweak. However, it is not purely a performance parameter for the bulk loader, because it must match the number of alpha instances used later. The options for reducers are in turn limited by what reduce_shards is set to. This limits “everything”.
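To make the constraint concrete, here is a minimal sketch in Python of how I understand these settings to relate. The function name check_bulk_flags and the parameter alpha_instances are made up for illustration, and the rule "reducers must not exceed reduce_shards" is my reading of the documentation, not something I have verified in the bulk loader source.

```python
from typing import List


def check_bulk_flags(reduce_shards: int, reducers: int, alpha_instances: int) -> List[str]:
    """Return a list of constraint violations; an empty list means the values are consistent.

    Hypothetical helper: it only encodes the two constraints described above
    and does not call Dgraph in any way.
    """
    problems = []
    # reduce_shards is fixed by the cluster layout, not chosen for speed.
    if reduce_shards != alpha_instances:
        problems.append(
            f"reduce_shards={reduce_shards} does not match alpha_instances={alpha_instances}"
        )
    # Assumption: reducers cannot be higher than reduce_shards.
    if reducers > reduce_shards:
        problems.append(
            f"reducers={reducers} exceeds reduce_shards={reduce_shards}"
        )
    return problems


if __name__ == "__main__":
    # With 3 alpha instances, reduce_shards is pinned to 3, so reducers is capped at 3.
    for issue in check_bulk_flags(reduce_shards=3, reducers=8, alpha_instances=3):
        print(issue)
```

So on a machine with 48 threads, a small number of alpha instances caps how much parallelism reducers can give, if my reading is right.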
Loading lots of data… The test machine has 24 cores / 48 threads and about 400 GB of RAM.
Running tests on 5% of the total data set takes about 10 hours. That’s too slow for us. Since the machine is far from fully loaded, I hope there’s something to tweak. What should I change?
Dgraph version : v20.11.0
Dgraph codename : tchalla
Dgraph SHA-256 : 8acb886b24556691d7d74929817a4ac7d9db76bb8b77de00f44650931a16b6ac
Commit SHA-1 : c4245ad55
Commit timestamp : 2020-12-16 15:55:40 +0530
Branch : HEAD
Go version : go1.15.5
jemalloc enabled : true