Question:
- Can we run the bulk loader multiple times? We got the error below when the bulk loader was launched more than once:
Output directory exists and is not empty. Use --replace_out to overwrite it.
- We launch the bulk loader as soon as our data pipeline finishes writing an .rdf.gz data file. This continues until there are no more files
- Each file was about 250 MB in size
These are the steps we follow:
- Made sure at least one zero was running
- Brought up one alpha, which was blocked by an init container (thanks to the Helm chart)
- Executed the bulk loader command from the /dgraph folder on the zero
- Used a cronjob that wakes up every minute and launches the bulk loader command if there are any files in the ${files_in_ready_state} folder. Our command is below:
dgraph bulk -f ${files_in_ready_state} -s ${schemaFile} --format=rdf --xidmap xid --store_xids --out /coldstart/out --map_shards=3 --reduce_shards=3 --zero=dgraph-dgraph-zero:5080
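
For reference, the cron-driven step described above could be sketched roughly as follows. This is only an illustration, not our exact script: the function names (`has_ready_files`, `out_dir_is_clean`, `run_once`) and the variables `READY_DIR`, `SCHEMA_FILE`, `OUT_DIR`, `ZERO_ADDR` and `DGRAPH_BIN` are hypothetical, and only the `dgraph bulk` flags are copied from the command above. The sketch skips a run when the output directory already has content, which is the condition the error message reports. It does not settle whether repeated bulk loader runs are supported.

```shell
#!/bin/sh
# Sketch of a cron-driven wrapper. All function and variable names here are
# hypothetical; only the dgraph flags are copied from the command above.

# Succeeds if the directory holds at least one .rdf.gz file.
has_ready_files() {
    for f in "$1"/*.rdf.gz; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Succeeds if the output directory is missing or empty.
out_dir_is_clean() {
    [ -d "$1" ] || return 0
    [ -z "$(ls -A "$1")" ]
}

# One cron tick: arguments are ready dir, schema file, output dir.
run_once() {
    ready_dir="$1"; schema_file="$2"; out_dir="$3"
    if ! has_ready_files "$ready_dir"; then
        echo "no files ready"
        return 0
    fi
    if ! out_dir_is_clean "$out_dir"; then
        echo "output directory not empty, skipping"
        return 1
    fi
    # DGRAPH_BIN is an assumed override so the wrapper can be dry-run with echo.
    "${DGRAPH_BIN:-dgraph}" bulk -f "$ready_dir" -s "$schema_file" \
        --format=rdf --xidmap xid --store_xids --out "$out_dir" \
        --map_shards=3 --reduce_shards=3 \
        --zero="${ZERO_ADDR:-dgraph-dgraph-zero:5080}"
}

# Runs only when READY_DIR is set, for example from the cron entry.
if [ -n "${READY_DIR:-}" ]; then
    run_once "$READY_DIR" "$SCHEMA_FILE" "${OUT_DIR:-/coldstart/out}"
fi
```

Setting `DGRAPH_BIN=echo` prints the command that would be launched instead of running it, which is a cheap way to check the guard logic before pointing it at a real zero.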