Bulk Load doesn't import / save the schema file

Trying to stand up a replica copy of my production data:

  • Using the Docker standalone image
  • Download the RDF and schema files from the S3 bucket and gunzip them
  • Run the bulk load command, specifying the --files and --schema arguments (sketched after this list)
  • Bulk load completes with no errors
  • Open Ratel and see the default schema, not the schema specified as a parameter
  • I have to bulk edit the schema by copy/pasting the contents of the schema file I supplied as a parameter
  • Queries using the updated types now work
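
A rough sketch of those steps; the bucket and file names here are my assumptions:

# download and unpack the export
aws s3 cp s3://my-bucket/export/g01.rdf.gz .
aws s3 cp s3://my-bucket/export/g01.schema.gz .
gunzip g01.rdf.gz g01.schema.gz

# run the bulk loader against a running Zero
dgraph bulk --files g01.rdf --schema g01.schema --zero=localhost:5080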

Question:
Should the bulk import set the schema? If not, how can I “bulk edit” the schema from a CLI command?

Thank you.


Hey @amc,

Yeah, the bulk loader doesn’t alter the live schema. IIRC it uses the supplied schema during the load process.

The /admin endpoint is used to modify the schema. See https://dgraph.io/docs/graphql/admin/#using-adminschema
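
For example, following that doc page (the schema file name here is assumed):

curl -X POST localhost:8080/admin/schema --data-binary '@schema.graphql'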

Odd, it should work. Check whether the schema is complete and whether there are any errors in the file. For example, the file might be corrupted in some way and have been skipped.

But it is odd in that Dgraph should generate a new schema based on the given dataset: it takes the predicates and gives them a “default” scalar type if the schema wasn’t provided.
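
To illustrate with a hypothetical predicate: a supplied DQL schema file might declare

name: string @index(term) .

while, without a schema entry for that predicate, the bulk loader would infer something like

name: default .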

Yes, maybe it’s the file. Sometimes we give the wrong path to the schema, or some context path is wrong. I’m 100% sure it should load the schema if you gave one.

Which CLI?

It kind of does. When you give an incomplete schema, it will infer one from the dataset.

That’s GraphQL only, although you can provide a GraphQL schema via the bulk loader; usually nobody does. I’m pretty sure he is talking about DQL.

If he is using GraphQL, he should provide both schemas during the load.
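
If memory serves, the bulk loader takes the GraphQL schema through its own flag, so supplying both would look roughly like this (file names are hypothetical):

dgraph bulk -f /scripts -s /scripts/g01.schema -g /scripts/g01.gql_schema --zero=localhost:5080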


How can I validate the schema file? If I open it in a text editor, select all, and copy/paste into bulk edit, it pastes/updates with no problems. The file is generated by Dgraph’s export-to-S3 functionality.

Not sure what kind of validation you mean, but you can’t do that in DQL.
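
That said, a couple of indirect checks are possible; these are workarounds I’d try, not an official validator:

# if the download is still gzipped, test the archive integrity
gunzip -t g01.schema.gz

# or POST the schema to /alter on a throwaway instance; Dgraph rejects a malformed DQL schema
curl -X POST localhost:8080/alter --data-binary '@g01.schema'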

The issue lies somewhere in your load steps. Provide more context.

You had mentioned the file might be corrupted, so I thought perhaps Dgraph had a way to validate a file, or at least bail on the import if the file is corrupt. However, considering I can just open the file in vim and copy/paste the data into Ratel, I doubt the file is invalid.

  • The RDF and schema files are generated via the export-to-S3-bucket process from another Dgraph system
  • I launch the standalone Docker image (dgraph/standalone)
  • I attach two volumes to the container: /dgraph (for the Dgraph data) and /scripts, which contains the downloaded files g01.rdf, g01.schema, and g01.gql_schema (see the sketch after this list)
  • I then use docker exec to run the following command in the container: dgraph bulk --files /scripts --schema /scripts/g01.schema --http localhost:8000 --zero=localhost:5080
  • The process runs, and I can see output showing the data being created, the duration, etc.
  • The process completes with no errors
  • I launch a Ratel container and point it at the standalone container
  • I open up the schema and it’s the default, as if no data exists
  • I then use bulk edit and copy/paste exactly what is in the schema file
  • The schema updates; I can click on the elements and see some info and example data
  • I can then run queries against the data I just imported
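
The sketch mentioned above; the container name, host paths, and port mappings are my assumptions, while the bulk command is as I ran it:

docker run -d --name dgraph \
  -v /data/dgraph:/dgraph \
  -v /data/scripts:/scripts \
  -p 5080:5080 -p 8080:8080 \
  dgraph/standalone:latest

docker exec dgraph dgraph bulk --files /scripts --schema /scripts/g01.schema --http localhost:8000 --zero=localhost:5080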

You know, you can’t use the bulk loader with that image unless you kill the Alpha running there. Also, after the bulk load you need to point the Alpha at the files the bulk loader created, or cp them into place, and then start the Alpha. The only instance you should keep running is the Zero. And keep that same Zero afterwards, due to the UID leasing context.

Based on what you said, the issue is the whole process.

Because you didn’t move the output from the bulk loader to the right path.
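
Something like this; I’m assuming the bulk loader wrote to its default --out directory (./out) under /dgraph, and that the Alpha reads its posting data from /dgraph/p:

rm -rf /dgraph/p                  # clear the old posting directory first
cp -r /dgraph/out/0/p /dgraph/p   # move the bulk output into place
dgraph alpha --zero=localhost:5080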

Conveniently, my AWS spot instance terminated, so I lost all my work and had to start again…

Sorry, I left off an important step: I override the Docker start command using the entrypoint option. I have a run.sh in an attached volume that looks like this (it doesn’t start the Alpha):

#!/bin/bash

# fail if any error occurs
set -e

echo -e "\033[0;33mWarning: This standalone version is meant for quickstart purposes only.
         It is NOT RECOMMENDED for production environments.\033[0;0m"

# For Dgraph versions v20.11 and older
export DGRAPH_ALPHA_WHITELIST=0.0.0.0/0
# For Dgraph versions v21.03 and newer
export DGRAPH_ALPHA_SECURITY='whitelist=0.0.0.0/0'

# TODO properly handle SIGTERM.
# Start only the Zero here; the Alpha is started later, after the bulk load.
dgraph zero
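
For context, the entrypoint override itself is something like this (host paths and container name here are placeholders):

docker run -d --name dgraph \
  -v /data/dgraph:/dgraph \
  -v /data/scripts:/scripts \
  --entrypoint /scripts/run.sh \
  dgraph/standalone:latest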

I then docker exec this script:

#!/bin/bash
dgraph bulk -f /scripts -s /scripts/g01.schema --zero=localhost:5080

Once that is complete, I remove my entrypoint option.
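
For reference, the exec step is just (container name a placeholder):

docker exec dgraph bash /scripts/bulk.sh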

Ok, I wasn’t doing that, and to be honest I have no fkn’ clue how I got it to work on my previous tries!

This time around, in the directory I use as the /dgraph volume in Docker, I moved out/0/p into the root of /dgraph (overwriting what was there). After restarting the Docker image, it now works as expected.

@MichelDiz you’d make a good psychologist; you didn’t give me the answer, but slowly led me to my own resolution. I can step outside and face the world again. :grin:

Don’t do this. Do a rm -fr ./* before the move, and only keep the Zero’s files.
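
Something like this, interpreting “keep the Zero’s files” as leaving the zw directory alone, and assuming everything lives under /dgraph:

cd /dgraph
rm -rf p w        # drop the Alpha's old posting and write-ahead-log dirs; zw (the Zero's data) stays
cp -r out/0/p .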

LOL - I’ve heard that a few times in life.