Bulk load to initial multi host cluster

Currently, the bulk loader runs only on a single machine. In the soon-to-be-released Dgraph v1.1, we optimized both the live loader and the bulk loader; in our own tests, we've seen the bulk loader peak at 4 million edges/sec.
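To put that peak figure in perspective, here is a rough back-of-the-envelope sketch. The function name and the 1-billion-edge dataset are hypothetical, and because 4 million edges/sec is a peak rate rather than a sustained one, the result is only a lower bound on load time, not a prediction.

```python
def min_load_seconds(edge_count: int, peak_edges_per_sec: float = 4_000_000) -> float:
    """Lower bound on bulk load time, assuming the peak rate held for the whole run.

    Real loads will take longer: the peak rate is not sustained, and the
    actual rate depends on hardware and data shape.
    """
    if edge_count < 0:
        raise ValueError("edge_count must be non-negative")
    if peak_edges_per_sec <= 0:
        raise ValueError("peak_edges_per_sec must be positive")
    return edge_count / peak_edges_per_sec


# Hypothetical dataset of 1 billion edges.
print(min_load_seconds(1_000_000_000))  # 250.0
```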

A multi-Zero setup does not make a difference for bulk loading. The bulk loader must connect to the Zero leader so that UIDs can be assigned to the nodes in the cluster. Adding more Zero instances doesn't make the loading process any faster, because most of the work is done by the bulk loader itself, not by Zero.
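As a minimal sketch of what this means in practice, the helper below assembles a `dgraph bulk` invocation that points at exactly one Zero address. The helper name, file names, and the `localhost:5080` address are assumptions for illustration; check `dgraph bulk --help` for the flags available in your version.

```python
import shlex


def bulk_command(rdf_file: str, schema_file: str, zero_addr: str = "localhost:5080") -> list[str]:
    """Build a `dgraph bulk` argument list targeting a single Zero (the leader).

    zero_addr is a placeholder; it should be the address of the Zero leader.
    """
    if "," in zero_addr or " " in zero_addr.strip():
        # Extra Zero instances do not speed up loading, so only one address is accepted.
        raise ValueError("pass a single Zero address (the leader)")
    return [
        "dgraph", "bulk",
        "-f", rdf_file,        # RDF data file(s) to load
        "-s", schema_file,     # schema file
        f"--zero={zero_addr.strip()}",
    ]


cmd = bulk_command("data.rdf.gz", "data.schema")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on the machine that runs the bulk loader.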

The bulk loader is highly concurrent, so more CPU cores will help speed up the bulk loading process.
