Bulk load to initial multi host cluster

explorer · June 10, 2019, 10:18am

Hi, I loaded data into a freshly initialized cluster (1 Zero and 3 Alphas, running natively on multiple hosts) using dgraph bulk. I can see the schema and data in Ratel, but Zero doesn't rebalance: all the data stays on the node where it was originally loaded. What should I do in this case?
Zero parameters:
./dgraph zero --idx 2 --my=IP:PORT --replicas 3 --telemetry
Bulk load parameters:
--reduce_shards 1 --map_shards 3
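For context on those last two flags: the bulk loader's map phase writes its output into `--map_shards` shards, which are then combined into `--reduce_shards` output directories, each intended for one Alpha group. The sketch below is a hypothetical, simplified model of that combining step. The greedy "largest map shard into the currently smallest reduce shard" rule and the function name `assign_to_reduce_shards` are assumptions for illustration, not the loader's confirmed code. With `reduce_shards=1`, every map shard necessarily ends up in the same output.

```python
import heapq


def assign_to_reduce_shards(map_shard_sizes: dict[str, int], reduce_shards: int) -> list[list[str]]:
    """Greedily place each map shard into the currently smallest reduce shard.

    Hypothetical simplification for illustration; the real bulk loader may differ.
    """
    if reduce_shards < 1:
        raise ValueError("reduce_shards must be >= 1")
    # Min-heap of (accumulated_size, reduce_shard_index): smallest shard pops first.
    heap = [(0, i) for i in range(reduce_shards)]
    heapq.heapify(heap)
    out: list[list[str]] = [[] for _ in range(reduce_shards)]
    # Placing the largest map shards first tends to give a more even split.
    for name, size in sorted(map_shard_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        out[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return out
```

For example, `assign_to_reduce_shards({"map0": 40, "map1": 35, "map2": 25}, 1)` puts all three map shards into a single reduce shard, while asking for 2 reduce shards splits them as `[["map0"], ["map1", "map2"]]`.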

MichelDiz · June 10, 2019, 3:01pm

Dgraph has its own balancing rules and will rebalance as soon as it is needed. In the meantime, check http://localhost:6080/state to see which groups are serving the predicates.
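A minimal sketch of that check in Python, assuming Zero's HTTP port is the default 6080 and that the `/state` JSON lists tablets under `groups.<id>.tablets.<predicate>` (this layout is an assumption; inspect your own output first). `fetch_state` and `tablets_by_group` are hypothetical helper names.

```python
import json
import urllib.request


def fetch_state(url: str = "http://localhost:6080/state") -> dict:
    """Fetch Zero's cluster state and parse it as JSON."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def tablets_by_group(state: dict) -> dict[str, list[str]]:
    """Map each group ID to the sorted list of predicates (tablets) it serves.

    Assumes the layout state["groups"][<group_id>]["tablets"][<predicate>].
    Groups with no tablets map to an empty list.
    """
    result: dict[str, list[str]] = {}
    for gid, group in state.get("groups", {}).items():
        tablets = group.get("tablets") or {}
        result[gid] = sorted(tablets)
    return result
```

Against a running Zero, printing `tablets_by_group(fetch_state())` should show at a glance whether every predicate sits in a single group.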

explorer · June 11, 2019, 5:21am

OK. Does that mean that if I bulk load data into a node with a large disk, and the dataset isn't large enough to trigger the rebalancing rules, all the data will stay on the original node no matter what I do?

MichelDiz · June 11, 2019, 1:59pm

Please check the “Get started with Dgraph” docs, in the section “Shard rebalancing”.

You can also try moving tablets to other groups manually:

/moveTablet?tablet=name&group=2: This endpoint can be used to move a tablet to a group. Zero already performs shard rebalancing every 8 minutes; this endpoint can be used to force a tablet move.
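A minimal sketch of calling that endpoint from Python, assuming it is served by Zero's HTTP port (default 6080) and accepts a plain GET as shown above. `move_tablet_url` and `move_tablet` are hypothetical helper names, and the timeout value is an arbitrary choice since moving a large tablet may take a while.

```python
import urllib.parse
import urllib.request


def move_tablet_url(predicate: str, group: int, zero: str = "http://localhost:6080") -> str:
    """Build the /moveTablet URL; urlencode escapes unusual predicate names."""
    query = urllib.parse.urlencode({"tablet": predicate, "group": group})
    return f"{zero.rstrip('/')}/moveTablet?{query}"


def move_tablet(predicate: str, group: int, zero: str = "http://localhost:6080",
                timeout: float = 600.0) -> str:
    """Ask Zero to move one tablet to the given group and return its reply as text."""
    with urllib.request.urlopen(move_tablet_url(predicate, group, zero), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `move_tablet("name", 2)` would request that predicate `name` move to group 2. This only makes sense if group 2 exists: with `--replicas 3` and three Alphas, Zero would normally place all three Alphas in a single group.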

explorer · June 12, 2019, 4:06am

Thanks for your reply.
One more thing: if I have a massive dataset (possibly more than 1 TB and about 1 billion edges) to bulk load into the cluster, do I have to load it all on one node? Can I split the dataset across other Zero nodes and bulk load the parts at the same time?
The only detailed write-up on the bulk loader I found is the blog post “Loading close to 1M edges/sec into Dgraph”, and the official documentation (deploy/#bulk-loader) doesn't mention the multi-Zero case.
Could you provide more detail on how the Zero instances interact during a bulk load when there are multiple Zeros?

dmai · June 12, 2019, 6:10am

Currently the bulk loader runs on a single machine only. In the soon-to-be-released Dgraph v1.1 we optimized both the live loader and the bulk loader; in our own tests we've seen the bulk loader peak at 4 million edges/sec.
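As a rough scale check for the ~1 billion edges mentioned above, treating a throughput $r$ as if it were sustained for all $E$ edges gives only an idealized lower bound on load time; real runs will be slower, since a peak rate is not held for the whole job.

```latex
t_{\min} = \frac{E}{r}
         = \frac{10^{9}\ \text{edges}}{4 \times 10^{6}\ \text{edges/s}}
         = 250\ \text{s},
\qquad
\frac{10^{9}\ \text{edges}}{1 \times 10^{6}\ \text{edges/s}} = 1000\ \text{s} \approx 17\ \text{min}.
```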

A multi-Zero setup makes no difference for bulk loading. The bulk loader must connect to the Zero leader to assign UIDs to the graph nodes. Adding more Zero instances doesn't make the loading process any faster, because most of the work is done by the bulk loader itself, not by Zero.
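Why one leader is enough: UIDs must be unique cluster-wide, so they come from a single authority, and a loader that requests them in blocks rarely has to wait on it. The toy model below illustrates that idea only; the class `UidLeaser` and its block-lease behavior are hypothetical and are not Dgraph's implementation or API.

```python
import threading


class UidLeaser:
    """Toy model of a single leader handing out contiguous, non-overlapping UID ranges.

    Hypothetical illustration only; not Dgraph's actual code or API.
    """

    def __init__(self, start: int = 1) -> None:
        self._next = start
        self._lock = threading.Lock()  # one authority: serialize all leases

    def lease(self, count: int) -> range:
        """Reserve `count` consecutive UIDs and return them as a range."""
        if count < 1:
            raise ValueError("count must be >= 1")
        with self._lock:
            first = self._next
            self._next += count
        return range(first, first + count)
```

Usage: a fresh `UidLeaser()` returns `range(1, 1001)` for `lease(1000)` and then `range(1001, 1501)` for `lease(500)`. Because each call costs one counter bump regardless of block size, adding more "leaders" would not speed anything up; the expensive work happens in the loader.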

The bulk loader is highly concurrent, so more CPU cores will definitely help speed up the bulk loading process.
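A hedged sketch of why extra cores help: if the input can be cut into independent chunks, each chunk can be processed by its own worker, one per core. Here `count_triples` is a hypothetical stand-in for real parsing work and the helper names are illustrative; the actual bulk loader is written in Go and uses its own concurrency, not this code.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import Optional


def split_into_chunks(lines: list[str], n: int) -> list[list[str]]:
    """Split lines into at most n contiguous, roughly equal chunks."""
    if not lines:
        return []
    n = max(1, n)
    size = -(-len(lines) // n)  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def count_triples(chunk: list[str]) -> int:
    """Stand-in for per-chunk work: count non-empty, non-comment N-Quad lines."""
    return sum(1 for line in chunk if line.strip() and not line.lstrip().startswith("#"))


def parallel_count(lines: list[str], workers: Optional[int] = None) -> int:
    """Process chunks in separate processes, one worker per CPU core by default."""
    workers = workers or os.cpu_count() or 1
    chunks = split_into_chunks(lines, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_triples, chunks))
```

Separate processes are used rather than threads because CPU-bound Python code does not run in parallel across threads.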

explorer · June 12, 2019, 7:23am

Very useful, this solves my problem. Thank you!

system · July 12, 2019, 7:23am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
