Piecewise loading using `dgraph bulk` loader

Matthias_Baetens · February 23, 2021, 12:18am

I Want to Do

Use Dgraph Bulk loader to:

Load data into a cluster whose Alphas have already been started
Run the bulk loader multiple times to generate separate out directories for different datasets

What I Did

I have a Dgraph HA cluster running on GKE and have used the Dgraph live loader to load some initial data. Would it work to stop the Alphas and copy additional *.sst files generated by the bulk loader into the p folder?

Also, once I have used the bulk loader to generate an initial dataset and then want to use it for another load, is the best practice simply to use a different output folder? Is there a naming convention the generated files need to follow?

MichelDiz · February 23, 2021, 12:47am

Not an ideal thing to do.

Not sure what you mean. The same data? For what?

In that case, no, you can’t do it. In theory, you could stop the Alphas, run a bulk load, create an extra node (an extra Alpha or group), and try to sync it with the cluster. But that was never tested; it is just a theory.

No, just live load.

That’s the problem. The files are managed by Badger, which keeps its own manifest file, so copying them in is the wrong approach. Introducing a temporary group would be the way out, but we never tested it.
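
As a rough illustration of the live-load route suggested in this reply for a cluster that is already running, the sketch below only assembles a `dgraph live` command line and does not execute it. The file names and the Alpha and Zero addresses are hypothetical placeholders, and joining several files with commas for `-f` reflects my understanding of the loader rather than anything confirmed in this thread.

```python
from typing import List


def live_load_argv(
    data_files: List[str],
    schema_file: str,
    alpha: str = "localhost:9080",  # hypothetical Alpha gRPC address
    zero: str = "localhost:5080",   # hypothetical Zero gRPC address
) -> List[str]:
    """Build the argument list for a `dgraph live` run; nothing is executed."""
    if not data_files:
        raise ValueError("at least one data file is required")
    return [
        "dgraph", "live",
        "-f", ",".join(data_files),  # assumption: comma-separated file list
        "-s", schema_file,
        "--alpha", alpha,
        "--zero", zero,
    ]


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the shape of the command.
    print(" ".join(live_load_argv(["dataset2.rdf.gz"], "schema.txt")))
```

The resulting list can be handed to `subprocess.run` once the addresses match the actual cluster.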

Matthias_Baetens · February 23, 2021, 8:34am

The scenario for multiple datasets and multiple bulk load runs is this: I have prepared a large dataset 1 that I want to bulk load, and a large dataset 2 that I want to bulk load as well. If I understand you correctly, I’d have to wait for both datasets to be ready and bulk load them together, instead of doing this sequentially (e.g. generating the out folder for dataset 1 and then for dataset 2)?

MichelDiz · February 23, 2021, 1:08pm

Yes.
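
To make that answer concrete: both datasets would go through a single bulk loader run before the Alphas are first started. The sketch below only builds such a `dgraph bulk` command line, without running it. The file names, Zero address, and shard counts are hypothetical, and passing both datasets as a comma-separated `-f` value is an assumption on my part (pointing `-f` at a directory holding both files may be an alternative).

```python
from typing import List


def bulk_load_argv(
    dataset_files: List[str],
    schema_file: str,
    zero: str = "localhost:5080",  # hypothetical Zero gRPC address
    out_dir: str = "./out",
    map_shards: int = 1,
    reduce_shards: int = 1,        # usually matched to the number of Alpha groups
) -> List[str]:
    """Build the argument list for one `dgraph bulk` run covering all datasets."""
    if not dataset_files:
        raise ValueError("at least one dataset file is required")
    return [
        "dgraph", "bulk",
        "-f", ",".join(dataset_files),  # assumption: comma-separated file list
        "-s", schema_file,
        "--zero", zero,
        "--out", out_dir,
        "--map_shards", str(map_shards),
        "--reduce_shards", str(reduce_shards),
    ]


if __name__ == "__main__":
    # Both datasets go into the same run, so only one out directory is produced.
    print(" ".join(bulk_load_argv(["dataset1.rdf.gz", "dataset2.rdf.gz"], "schema.txt")))
```

Anything that arrives after the cluster is up would then go through the live loader instead.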
