Dgraph Bulk Loader - New schema and data weren't present initially

What I want to do

I want to load an initial set of data into a single-node cluster using the bulk loader tool, following the directions on
https://dgraph.io/docs/master/deploy/fast-data-loading/bulk-loader/.

What I did

I created four data files in .json format and the associated schema file. I copied the data and schema files to the Dgraph server and ran the command:

dgraph bulk -f file1.json,file2.json,file3.json,file4.json -s my.schema --map_shards=1 --reduce_shards=1 --http localhost:8000 --zero=localhost:5080

Everything appeared to run fine and the out directory was created. I ran the tree command and saw that the ./out/0/p structure had been created.

I then copied the contents of the generated “p” folder to the “p” directory that Dgraph is running out of.

However, when I tried looking for the schema or any data in Ratel, I didn’t see anything. There was also no new activity in the alpha log.

There are only two “p” directories under the mount point (the original and the one created by the bulk loader) so I know Dgraph was running out of the right location.

I ended up killing the alpha process and then restarting it. Initially, I got the message: Cannot acquire directory lock on "p". Another process is using this Badger database. error: resource temporarily unavailable. That alpha process disappeared after a minute or two.

At that point, I restarted alpha again, and then everything appeared to be fine - the new schema and data appear to be there.

Should I have stopped alpha after running the dgraph bulk command, but before copying the new “p” files in?

Dgraph metadata

dgraph version

Dgraph version : v21.03.1
Dgraph codename : rocket-1
Dgraph SHA-256 : a00b73d583a720aa787171e43b4cb4dbbf75b38e522f66c9943ab2f0263007fe
Commit SHA-1 : ea1cb5f35
Commit timestamp : 2021-06-17 20:38:11 +0530
Branch : HEAD
Go version : go1.16.2
jemalloc enabled : true

The dgraph bulk command pre-formats the p directories; it is not meant to be run while an alpha is up at all. Run only the Zero servers and the bulk loader, then copy the bulk loader's output into the correct locations, then start the alpha servers with that pre-formatted p directory. Anything else has probably put you in a very odd, disconnected state between the alpha servers and the zeros.

From the docs you listed:

Only one or more Dgraph Zeros should be running for bulk loading. Dgraph Alphas will be started later.
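
As a rough sketch of that order of operations, the script below reuses the bulk command from the question and assumes Zero is already listening on localhost:5080 with no alpha running. ALPHA_DIR and DRY_RUN are hypothetical names introduced here for illustration. With DRY_RUN=1 (the default) each command is only printed, so the sequence can be checked before anything is run. It also assumes "$ALPHA_DIR/p" does not exist yet; any old p directory would have to be moved aside first.

```shell
#!/bin/sh
# Sketch of the order of operations for an offline bulk load.
# Assumptions (not from the original post): ALPHA_DIR is a hypothetical
# directory that holds alpha's data, and DRY_RUN=1 (the default) only
# prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
ALPHA_DIR="${ALPHA_DIR:-/path/to/alpha}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

bulk_load_steps() {
  # 1. Only Zero is running at this point; alpha is stopped.
  run dgraph bulk -f file1.json,file2.json,file3.json,file4.json -s my.schema \
    --map_shards=1 --reduce_shards=1 --http localhost:8000 --zero=localhost:5080 || return 1

  # 2. Put the generated p directory where alpha will look for it.
  #    Assumes "$ALPHA_DIR/p" does not exist yet.
  run cp -r out/0/p "$ALPHA_DIR/p" || return 1

  # 3. Only now start alpha, pointing it at the pre-built p directory.
  run dgraph alpha --postings "$ALPHA_DIR/p" --zero localhost:5080
}

bulk_load_steps
```

Setting DRY_RUN=0 runs the commands for real; alpha then stays in the foreground in the last step.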


Thanks, that clears things up for me!