Bulk load to initial multi host cluster

dmai · June 12, 2019, 6:10am

Currently, the bulk loader runs only on a single machine. In the soon-to-be-released Dgraph v1.1, we optimized both the live loader and the bulk loader. In our own tests, we’ve seen the bulk loader peak at 4 million edges/sec.

A multi-Zero setup does not make a difference for bulk loading. The bulk loader must connect to the Zero leader to assign UIDs to the nodes in the cluster, but adding more Zero instances doesn’t make the loading process any faster. Most of the work is done by the bulk loader itself, not by Zero.

The bulk loader is highly concurrent, so giving it more CPU cores will definitely help speed up the bulk loading process.
