What I want to do
I want to load an initial set of data into a single-node cluster using the bulk loader tool, following the directions on
https://dgraph.io/docs/master/deploy/fast-data-loading/bulk-loader/.
What I did
I created four data files in .json format and the associated schema file. I copied the data and schema files to the Dgraph server and ran the command:
dgraph bulk -f file1.json,file2.json,file3.json,file4.json -s my.schema --map_shards=1 --reduce_shards=1 --http localhost:8000 --zero=localhost:5080
Everything appeared to run fine and the out directory was created. I ran the tree command and saw that the ./out/0/p structure had been created.
I then copied the contents of the generated “p” folder to the “p” directory that Dgraph is running out of.
However, when I tried looking for the schema or any data in Ratel, I didn’t see anything. There was also no new activity in the alpha log.
There are only two “p” directories under the mount point (the original and the one created by the bulk loader) so I know Dgraph was running out of the right location.
I ended up killing the alpha process and then restarting it. On the first restart attempt, I got the message 'Cannot acquire directory lock on "p". Another process is using this Badger database. error: resource temporarily unavailable'. That alpha process disappeared after a minute or two.
At that point, I restarted alpha again, and then everything appeared to be fine - the new schema and data appear to be there.
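My guess is that the lock error came from restarting before the killed alpha had fully exited and released the "p" directory. A minimal sketch of how I could wait for that, assuming I know the old alpha's PID; the helper name wait_for_exit and the 60-second default are my own choices, not anything from Dgraph:

```shell
# Hypothetical helper: poll until the process with the given PID is gone,
# or give up after a timeout (in seconds, default 60).
# Note: `kill -0` only tests whether a signal could be delivered; it also
# fails if the process belongs to another user, so run this as the same user.
wait_for_exit() {
  pid="$1"
  timeout="${2:-60}"
  waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "process $pid still running after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

I would call it as `wait_for_exit <alpha pid> 120` right after sending the kill signal, and only start alpha again once it returns successfully.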
Should I have stopped alpha after running the dgraph bulk command, but before copying the new “p” files in?
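To make the question concrete, here is a rough sketch of the sequence I suspect is the right one: stop alpha, swap in the bulk-loaded "p" directory, then start alpha. The helper name swap_p_dir and the backup naming are hypothetical, and the paths in the usage note are placeholders for my setup:

```shell
# Hypothetical helper: replace the p directory that alpha runs from with the
# one produced by the bulk loader. Intended to be run while alpha is stopped.
swap_p_dir() {
  new_p="$1"      # bulk loader output, e.g. ./out/0/p
  alpha_dir="$2"  # directory that alpha is started from

  if [ ! -d "$new_p" ]; then
    echo "no such directory: $new_p" >&2
    return 1
  fi

  # Keep the old p directory as a timestamped backup instead of overwriting it.
  if [ -e "$alpha_dir/p" ]; then
    mv "$alpha_dir/p" "$alpha_dir/p.bak.$(date +%s)" || return 1
  fi

  # Copy the bulk-loaded directory into place as the new p.
  cp -r "$new_p" "$alpha_dir/p"
}
```

With alpha stopped, I would run `swap_p_dir ./out/0/p /path/to/alpha/dir` and then start alpha again from that directory, pointing it at the same zero (localhost:5080 in my case).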
Dgraph metadata
dgraph version
Dgraph version : v21.03.1
Dgraph codename : rocket-1
Dgraph SHA-256 : a00b73d583a720aa787171e43b4cb4dbbf75b38e522f66c9943ab2f0263007fe
Commit SHA-1 : ea1cb5f35
Commit timestamp : 2021-06-17 20:38:11 +0530
Branch : HEAD
Go version : go1.16.2
jemalloc enabled : true