Hi, I’m having trouble with the Bulk Loader, in particular during the REDUCE phase. I simply can’t get it to perform.
The machine averages about 10% CPU load and 10 MB/s disk IO. That’s a machine with 48 processors and NVMe SSDs in a RAID configuration. The machine is not the limit; I just can’t get the Bulk Loader to make use of it. What could be holding it back?
I have tried many different combinations of --map_shards, --reduce_shards, --reducers and --badger.compression. Essentially, none of these settings makes any difference to speed. Is there some other magic switch?
For the current tests I have 100 rdf.gz files of a little more than 800 MB each. (This represents about 5% of the total dataset.)
The MAP phase takes about 2h, but it seems to scale well with the number of available cores and the --num_go_routines setting. A bigger machine should help here.
The REDUCE phase takes 6-7h, and nothing I have tried changes that.
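A rough back-of-envelope calculation from the figures above, as a small Python sketch. It assumes each file is exactly 800 MB (they are a little larger), that the 10% CPU figure is an average across all 48 processors, and that run time scales linearly with input size, which a bulk load may well not do:

```python
# Back-of-envelope figures for the bulk load described above.
# Assumptions (not measured): each file is exactly 800 MB, the 10% CPU load
# is an average across all 48 processors, and run time scales linearly
# with input size.

FILES = 100
FILE_SIZE_MB = 800          # "a little more than 800MB each"
SAMPLE_PERCENT = 5          # the sample is ~5% of the total dataset
PROCESSORS = 48
CPU_LOAD_PERCENT = 10
MAP_HOURS = 2
REDUCE_HOURS_RANGE = (6, 7)


def sample_size_mb() -> int:
    """Total size of the gzipped sample input."""
    return FILES * FILE_SIZE_MB


def full_dataset_mb() -> int:
    """Estimated size of the full dataset, extrapolated from the 5% sample."""
    return sample_size_mb() * 100 // SAMPLE_PERCENT


def busy_cores() -> float:
    """Average number of processors actually busy at 10% load."""
    return PROCESSORS * CPU_LOAD_PERCENT / 100


def full_run_hours(reduce_hours: int) -> int:
    """Naive linear extrapolation of MAP + REDUCE time to the full dataset."""
    scale = 100 // SAMPLE_PERCENT
    return (MAP_HOURS + reduce_hours) * scale


if __name__ == "__main__":
    print(f"sample input:  ~{sample_size_mb() / 1000:.0f} GB")
    print(f"full dataset:  ~{full_dataset_mb() / 1_000_000:.1f} TB")
    print(f"busy cores:    ~{busy_cores():.1f} of {PROCESSORS}")
    lo, hi = (full_run_hours(h) for h in REDUCE_HOURS_RANGE)
    print(f"full load:     ~{lo}-{hi} h")
```

Under those assumptions the sample is roughly 80 GB of gzipped RDF, the full dataset roughly 1.6 TB, and a full load would take on the order of 160-180 hours, with only about 5 of the 48 processors busy on average.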
/Anders
Dgraph version : v20.11.1
Dgraph codename : tchalla-1
Dgraph SHA-256 : cefdcc880c0607a92a1d8d3ba0beb015459ebe216e79fdad613eb0d00d09f134
Commit SHA-1 : 7153d13fe
Commit timestamp : 2021-01-28 15:59:35 +0530
Branch : HEAD
Go version : go1.15.5
jemalloc enabled : true