Piecewise loading using `dgraph bulk` loader

What I Want to Do

Use Dgraph Bulk loader to:

  • Load data into a cluster whose Alphas have already been started
  • Run the bulk loader several times to generate separate out directories for different datasets

What I Did

I have a Dgraph HA cluster running on GKE and have used the Dgraph live loader to load some initial data. Would it work to stop the Alphas and copy some more *.sst files, generated by the bulk loader, into the p folder?

Also, once I have used the bulk loader to generate an initial dataset and then want to use it to generate another load, is the best practice simply to use another output folder? Is there a naming convention that the generated files need to follow?

Not an ideal thing to do.

Not sure what you mean. The same data? For what?

For that case, nope. You can’t do it. In theory, you could stop the Alphas, run a bulk load, create an extra node (an extra Alpha or an extra group), and try to sync it up with the cluster. But that was never tested; it is just a theory.

No, just live load.
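
To make the live load suggestion concrete, here is a minimal sketch that only builds and prints a `dgraph live` command line for a cluster that is already running. The file names and the `localhost` addresses are placeholders I made up for illustration (9080 and 5080 are the usual default gRPC ports for Alpha and Zero), so adjust them to your GKE setup and check `dgraph live --help` for your version.

```python
import shlex


def live_load_command(data_file, schema_file,
                      alpha="localhost:9080", zero="localhost:5080"):
    """Build the argv for a `dgraph live` run against a running cluster.

    All arguments are illustrative placeholders, not values from this thread.
    """
    return [
        "dgraph", "live",
        "--files", data_file,     # RDF or JSON data to load
        "--schema", schema_file,  # schema file to apply before loading
        "--alpha", alpha,         # gRPC address of a running Alpha
        "--zero", zero,           # gRPC address of a running Zero
    ]


# Hypothetical file names; printing the command instead of executing it.
cmd = live_load_command("dataset2.rdf.gz", "dataset2.schema")
print(shlex.join(cmd))
```

Nothing is executed here. To actually run it, you could pass `cmd` to `subprocess.run` from a machine that can reach the cluster.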

That’s the problem. The files are managed by Badger, which keeps its own manifest file, so copying extra files into the folder is the wrong approach. Introducing a temporary group would be the way out, but we never tested it.

The scenario for multiple datasets and multiple bulk load runs is, for example, this: I have prepared a large dataset 1 that I want to bulk load, and a large dataset 2 that I want to bulk load as well. If I understand correctly what you are saying, I’d have to wait for both datasets to be ready and bulk load them together in one run, instead of being able to do this sequentially (e.g. generating the out folder for dataset 1 and then for dataset 2)?
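
Assuming that reading is right, a single combined run might look roughly like the sketch below. It only builds and prints a `dgraph bulk` command line. The file names, the Zero address, and the output folder are hypothetical, and I am assuming that `--files` accepts a comma-separated list of files, which is worth confirming with `dgraph bulk --help` for your version.

```python
import shlex


def bulk_load_command(data_files, schema_file,
                      zero="localhost:5080", out_dir="out"):
    """Build the argv for one `dgraph bulk` run covering several datasets.

    Assumption: --files takes a comma-separated list of data files.
    All values are illustrative placeholders, not taken from this thread.
    """
    return [
        "dgraph", "bulk",
        "--files", ",".join(data_files),  # every dataset goes into the same run
        "--schema", schema_file,
        "--zero", zero,                   # the bulk loader needs a running Zero
        "--out", out_dir,                 # folder where the p directories are written
    ]


# Hypothetical dataset names; printing the command instead of executing it.
cmd = bulk_load_command(["dataset1.rdf.gz", "dataset2.rdf.gz"], "combined.schema")
print(shlex.join(cmd))
```

The point of the sketch is only that both datasets feed one invocation and one out folder, rather than two separate out folders that would later have to be merged.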

