Bulk Load doesn't import / save the schema file

amc · August 15, 2022, 8:47am

Trying to stand up a replica copy of my production data:

Using the docker standalone image
download the rdf and schema files from S3 bucket and gunzip
run bulk load command specifying the --files and --schema arguments
bulk load completes with no errors
open Ratel and see the default schema, not the schema specified as a parameter
I have to bulk edit the schema by copy/pasting the data in the schema file I supplied as a parameter
queries using the updated types now work

Question:
Should the bulk import set the schema? If not how can I “bulk edit” the schema from a cli command?

Thank you.

matthewmcneely · August 15, 2022, 2:49pm

Hey @amc,

Yeah, the bulk loader doesn’t alter the schema. IIRC, it only uses the supplied schema during the load process.

The /admin endpoint is used to modify the schema. See https://dgraph.io/docs/graphql/admin/#using-adminschema

MichelDiz · August 15, 2022, 3:22pm

Odd, it should work. Check that the schema is complete and that there are no errors in the file. For example, the file might be corrupted in some way and have been skipped.

But it is odd in that Dgraph should generate a new schema based on the given dataset: it takes the predicates and gives them a “default” scalar type if a schema wasn’t provided.

Yes, maybe it is the file. Sometimes we give the wrong path to the schema, or some context path is wrong. I’m 100% sure that it should upload the schema if you gave one.

Which CLI?

It kind of does. When you give an incomplete schema, it infers the rest from the dataset and creates one.

That’s GraphQL only. You can provide a GraphQL schema via Bulk, although usually nobody does. I’m pretty sure he is talking about DQL.

If he is using GraphQL he should provide both schemas during the load.
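
For the DQL case, one way to apply a schema file from the command line is Alpha's HTTP /alter endpoint. The following is only a minimal sketch: `apply_dql_schema` is a hypothetical helper name, and the default address localhost:8080 is an assumption about where your Alpha's HTTP port is reachable.

```shell
#!/bin/bash
# apply_dql_schema SCHEMA_FILE [ALPHA_ADDR]
# Hypothetical helper: POSTs a DQL schema file to Alpha's /alter endpoint.
# ALPHA_ADDR defaults to localhost:8080, which is an assumption about your setup.
apply_dql_schema() {
  local schema="$1" alpha="${2:-localhost:8080}"
  # Refuse to send anything if the path is wrong or the file is empty.
  [ -s "$schema" ] || { echo "schema file missing or empty: $schema" >&2; return 1; }
  # --data-binary sends the file as-is, keeping its newlines intact.
  curl --silent --show-error -X POST "http://$alpha/alter" --data-binary "@$schema"
}
```

Usage would be something like `apply_dql_schema /scripts/g01.schema`. Check the JSON response for errors, since a syntax problem in the schema is reported in the response body.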

amc · August 16, 2022, 6:28am

How can I validate the schema file? If I open it in a text editor, select all, and copy/paste it into bulk edit, it pastes and updates with no problems. The file is generated by Dgraph's export-to-S3 functionality.

MichelDiz · August 16, 2022, 2:43pm

Not sure what kind of validation, but you can’t in DQL.

The issue lies somewhere in the steps of the load. Please provide more context.
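
Since a wrong path or a damaged file were raised as possible causes, a quick pre-flight check before running the bulk loader can rule those out. This is a sketch only: `check_bulk_inputs` is a hypothetical helper, it does not parse the schema, and it assumes the RDF files sit directly in the given directory.

```shell
#!/bin/bash
# check_bulk_inputs DATA_DIR SCHEMA_FILE
# Hypothetical helper: checks file presence and gzip integrity only.
check_bulk_inputs() {
  local data_dir="$1" schema="$2" f found=0
  # The schema path must point at an existing, non-empty file.
  if [ ! -s "$schema" ]; then
    echo "schema file missing or empty: $schema" >&2
    return 1
  fi
  for f in "$data_dir"/*.rdf "$data_dir"/*.rdf.gz; do
    [ -f "$f" ] || continue   # skip glob patterns that matched nothing
    found=1
    # gzip -t verifies a compressed file without extracting it.
    case "$f" in
      *.gz) gzip -t "$f" || { echo "corrupt gzip: $f" >&2; return 1; } ;;
    esac
  done
  [ "$found" -eq 1 ] || { echo "no .rdf or .rdf.gz files in $data_dir" >&2; return 1; }
}
```

For example, `check_bulk_inputs /scripts /scripts/g01.schema && dgraph bulk -f /scripts -s /scripts/g01.schema --zero=localhost:5080` only starts the load when both checks pass.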

amc · August 17, 2022, 2:39am

You had mentioned the file might be corrupted, so I thought perhaps Dgraph had a way to validate a file, or at least bail on the import if the file is corrupt. However, considering I can just open the file in vim and copy/paste the data into Ratel, I doubt the file is invalid.

RDF and schema are generated via the export to S3 bucket process from another Dgraph system
Launching the standalone docker image (dgraph/standalone)
Attaching two volumes to the docker image: /dgraph (for the Dgraph data) and /scripts, which holds the downloaded files g01.rdf, g01.schema and g01.gql_schema
I then run docker exec to execute the following command in the container: dgraph bulk --files /scripts --schema /scripts/g01.schema --http localhost:8000 --zero=localhost:5080
The process runs, and I can see output of the data being created and duration etc…
The process completes with no errors
I launch a Ratel docker and point it to the standalone docker
I open up schema and it’s the default as if no data exists
I then use bulk edit and copy/paste exactly what is in the schema file
The schema updates, I can click on the elements and see some info and example data
I can then go and run queries against the data I just imported

MichelDiz · August 17, 2022, 2:45am

You know, you can’t use the Bulk Loader with that image unless you kill the Alpha running there. Also, after the bulk load you need to point Alpha at the files created by the Bulk Loader (or cp them into place) and then start the Alpha. The only instance that you should keep running is the Zero. And keep that Zero afterwards, because of the UID leasing context.

Based on what you said, the issue is the whole process.

Because you didn’t move the output from the Bulk Loader to the right path.

amc · August 17, 2022, 1:22pm

Conveniently my AWS spot instance terminated and I lost all my work and had to start again…

Sorry, I left off an important step: I override the docker start command using the entrypoint option. I have a run.sh in an attached volume that looks like this (it doesn’t start the Alpha):

#!/bin/bash

# fail if any error occurs
set -e

echo -e "\033[0;33m
Warning: This standalone version is meant for quickstart purposes only.
         It is NOT RECOMMENDED for production environments.\033[0;0m"

# For Dgraph versions v20.11 and older
export DGRAPH_ALPHA_WHITELIST=0.0.0.0/0
# For Dgraph versions v21.03 and newer
export DGRAPH_ALPHA_SECURITY='whitelist=0.0.0.0/0'

# TODO properly handle SIGTERM for all three processes.
dgraph zero

I then docker exec this script:

#!/bin/bash
dgraph bulk -f /scripts -s /scripts/g01.schema --zero=localhost:5080

Once that is complete I remove my entrypoint option.

Ok, I wasn’t doing that, and to be honest I have no fkn’ clue how I got it to work on my previous tries!

This time around, in the directory I use as the /dgraph volume in docker, I moved out/0/p into the root of /dgraph (overwriting what was there). After restarting the docker image it now works as expected.

@MichelDiz you’d make a good psychologist; you didn’t give me the answer, but slowly led me to my own resolution. I can step outside and face the world again.

MichelDiz · August 17, 2022, 1:54pm

Don’t do this. Do an rm -fr ./* before the move, and only keep the Zero files.
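
As a rough sketch of that cleanup, assuming the Zero state lives in a directory named zw and the Bulk Loader wrote to out/ inside the data directory (both are assumptions about this particular setup, and `promote_bulk_output` is a hypothetical helper). Run it only while the Alpha is stopped:

```shell
#!/bin/bash
# promote_bulk_output DATA_DIR
# Hypothetical helper: wipes old Alpha state, keeps Zero's files,
# then moves the bulk loader output into place.
promote_bulk_output() {
  local data_dir="$1"
  [ -d "$data_dir/out/0/p" ] || { echo "no bulk output at $data_dir/out/0/p" >&2; return 1; }
  # Delete everything at the top level except zw (Zero, assumed name) and out (bulk output).
  find "$data_dir" -mindepth 1 -maxdepth 1 ! -name zw ! -name out -exec rm -rf {} +
  # Move the freshly built postings directory to where Alpha expects it.
  mv "$data_dir/out/0/p" "$data_dir/p"
  # The rest of the bulk output is no longer needed in this single-group case.
  rm -rf "$data_dir/out"
}
```

With the volume layout described earlier this would be called as `promote_bulk_output /dgraph`, followed by starting the Alpha.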

LOL - I heard that a few times in life.
