Can we run the bulk loader multiple times?

Question:

  • Can we run the bulk loader multiple times? We get the error below when the bulk loader is launched more than once (the two workarounds we considered are sketched after this list):
    Output directory exists and is not empty. Use --replace_out to overwrite it.
    
  • We launch the bulk loader as soon as our data pipeline finishes writing an .rdf.gz data file, and this continues until there are no more files
  • Each file is about 250 MB in size
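
For reference, these are the two ways we considered for getting past that error; --replace_out comes straight from the error message, while the per-run output directory is only an illustration we have not validated (remaining flags as in our full command further down):

    # Option 1: overwrite the previous output, as the error message suggests
    dgraph bulk -f ${files_in_ready_state} -s ${schemaFile} --format=rdf \
      --out /coldstart/out --replace_out --zero=dgraph-dgraph-zero:5080

    # Option 2: give every run its own output directory (illustrative naming)
    dgraph bulk -f ${files_in_ready_state} -s ${schemaFile} --format=rdf \
      --out /coldstart/out-$(date +%Y%m%d%H%M%S) --zero=dgraph-dgraph-zero:5080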

Below are the steps we follow:

  • Made sure at least one Zero was running
  • Brought up one Alpha that was blocked by an init container (thanks to the Helm chart)
  • Executed the bulk loader command from the /dgraph folder on the Zero
  • We run a cronjob that wakes up every minute and launches the bulk loader command if there are any files in the ${files_in_ready_state} folder (a rough sketch of the wrapper script follows this list). Below is our command snippet:
    dgraph bulk -f ${files_in_ready_state} -s ${schemaFile} --format=rdf --xidmap xid --store_xids --out /coldstart/out --map_shards=3 --reduce_shards=3 --zero=dgraph-dgraph-zero:5080
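
A rough sketch of the wrapper script the cronjob calls is below; the lock file, the folder for processed files, and the per-run output directory are simplified, illustrative names rather than our exact setup:

    #!/usr/bin/env bash
    # Sketch of the wrapper the cronjob invokes every minute.
    # LOCK and DONE_DIR are illustrative names, not our actual paths.
    set -euo pipefail

    LOCK=/coldstart/bulk.lock
    DONE_DIR=/coldstart/processed

    # Skip this run if a previous bulk load is still in progress.
    exec 9>"$LOCK"
    flock -n 9 || exit 0

    # Nothing to do if the ready folder has no data files.
    ls "${files_in_ready_state}"/*.rdf.gz >/dev/null 2>&1 || exit 0

    # Same command as above, with a per-run output directory to avoid the
    # "Output directory exists" error.
    dgraph bulk -f "${files_in_ready_state}" -s "${schemaFile}" --format=rdf \
      --xidmap xid --store_xids --out "/coldstart/out-$(date +%Y%m%d%H%M%S)" \
      --map_shards=3 --reduce_shards=3 --zero=dgraph-dgraph-zero:5080

    # Move the ingested files aside so the next run does not pick them up again.
    mv "${files_in_ready_state}"/*.rdf.gz "$DONE_DIR"/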
    

Unless something has changed recently, I believe the answer is no. The bulk loader and backup/restore both need a live Zero server, and both output to a local p directory which you then have to copy to the Alphas.

So the Alphas' p directories have to be replaced. The live loader is more incremental.
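
As a rough illustration: with --reduce_shards=3 the bulk loader should leave one p directory per reduce shard (out/0/p, out/1/p, out/2/p), and each of those goes into the data directory of the Alpha(s) serving that group before those Alphas start. The host names and paths below are assumptions, and in a Kubernetes/Helm setup this would more likely be a shared volume or kubectl cp than scp:

    # Copy each reduce shard's p directory into the matching Alpha's data
    # directory, then start the Alphas; hosts and paths are illustrative.
    for shard in 0 1 2; do
      scp -r /coldstart/out/${shard}/p alpha-${shard}:/dgraph/
    done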

If this is true, isn’t it a huge limitation?

  • Does bulk loading scale out?
  • What should be the process for ingesting a few trillion predicates? (A rough live loader sketch follows this list.)
  • Submit all .rdf.gz files in one go? That seems impractical. This calls for a serious Spark connector!
  • I've read multiple posts from members of this forum saying that:
    – The bulk loader chokes on memory since it reads all the files at the same time
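
If the answer above is right and the bulk loader is only meant for the initial import, I assume the ongoing ingestion would have to go through the live loader, roughly as sketched below; the flags and endpoints are my best guess, not something we have tried:

    # Feed each new file to the running cluster with the live loader instead of
    # re-running the bulk loader; endpoints and paths are illustrative.
    for f in "${files_in_ready_state}"/*.rdf.gz; do
      [ -e "$f" ] || continue   # folder is empty, nothing to do
      dgraph live -f "$f" -s "${schemaFile}" \
        --alpha dgraph-dgraph-alpha:9080 --zero dgraph-dgraph-zero:5080 \
        && mv "$f" /coldstart/processed/   # illustrative "processed" folder
    done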