For the sake of completeness, here’s a guide, which will probably be available in a similar form in the docs.
Restoring and Cloning Data with LiveLoader
At the moment, restoring and/or cloning data for a specific namespace on Dgraph Cloud is only possible with Dgraph’s LiveLoader tool. It is part of the Dgraph CLI, so you need to download the Dgraph Docker image and run it locally.
Because binary backups currently only work in combination with the `restore` endpoint, and the `restore` endpoint is not yet working, you have to export your data in either RDF or JSON format. Obviously, this can result in large files, which slows down both the download and the restore process.
Prerequisites
- Install Docker on your local machine
- Download the Dgraph Docker image (optional, Docker also pulls it automatically on the first `docker run`). There are currently 2 versions: `latest`, which is v22 and discontinued, and `v21.03-slash`, which is the current version on Cloud and compatible with the new Dgraph version `v23-beta`. Both versions seem to work, but I’d recommend using `v21.03-slash` since it matches the Cloud version.
- For Windows users: install Windows Subsystem for Linux (WSL)
Step 1 - Request a Backup
First, we need to fetch a backup from the server. This is done by running the following mutation against the `/admin/slash` endpoint of your Cloud server. To perform this mutation, you also need an Admin Token for the namespace you want to back up. Admin tokens can be generated in your Dgraph Admin Panel under `Settings --> API Keys`.
Similar mutations exist on the `/admin` endpoint, but they currently result in a `Query not supported.` error.
Make sure you set the `DG-Auth: <your-admin-api-key>` header when running the mutation! This example uses the RDF format.
Request `taskId` and `exportId`
mutation Export {
  export(format: "rdf", namespace: <your-namespace-id>) {
    response {
      code
      message
    }
    taskId
    exportId
  }
}
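If you prefer scripting this step over a GraphQL client, the mutation can be sent as a plain GraphQL-over-HTTP POST. The sketch below is a minimal example, assuming that `/admin/slash` accepts a standard JSON body (`{"query": ...}`) and returns the usual `{"data": {"export": ...}}` shape. The helper names (`build_export_request`, `parse_export_response`, `request_export`) and the `ADMIN_SLASH_URL` placeholder are hypothetical.

```python
import json
import urllib.request

# Placeholder (assumption): replace with your own Cloud endpoint.
ADMIN_SLASH_URL = "https://<your-cluster-endpoint>/admin/slash"

# Same mutation as above; %d is filled with the numeric namespace ID.
EXPORT_TEMPLATE = """
mutation Export {
  export(format: "rdf", namespace: %d) {
    response {
      code
      message
    }
    taskId
    exportId
  }
}
"""


def build_export_request(url, admin_key, namespace_id):
    """Build (but do not send) a GraphQL POST request for the export mutation."""
    body = json.dumps({"query": EXPORT_TEMPLATE % namespace_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "DG-Auth": admin_key},
        method="POST",
    )


def parse_export_response(payload):
    """Return (taskId, exportId) from a decoded GraphQL response."""
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    result = payload["data"]["export"]
    return result["taskId"], result["exportId"]


def request_export(url, admin_key, namespace_id):
    """Send the export mutation and return (taskId, exportId)."""
    with urllib.request.urlopen(build_export_request(url, admin_key, namespace_id)) as resp:
        return parse_export_response(json.load(resp))
```

Call `request_export(ADMIN_SLASH_URL, "<your-admin-api-key>", namespace_id)` with your numeric namespace ID. Building the request separately from sending it makes it easy to inspect the payload before anything hits the server.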
Copy the `taskId` and the `exportId` and run the following query to retrieve the download links for your backup files.
Request the Download Links to your Backup Files
query export {
  exportStatus(
    taskId: "<taskId-from-export-mutation>"
    exportId: "<exportId-from-export-mutation>"
  ) {
    kind
    lastUpdated
    signedUrls
    status
  }
}
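If the export is still running when you first run the status query, `signedUrls` may not be populated yet. The following sketch builds the status query for a given `taskId`/`exportId` and polls until URLs appear. It assumes the response has the shape `{"data": {"exportStatus": {...}}}`; `fetch_status` is a hypothetical callable you supply to send the query to `/admin/slash` with the `DG-Auth` header, and the polling interval and retry limit are arbitrary.

```python
import json
import time

# Same query as above; the two %s slots receive quoted string literals.
STATUS_TEMPLATE = """
query export {
  exportStatus(
    taskId: %s
    exportId: %s
  ) {
    kind
    lastUpdated
    signedUrls
    status
  }
}
"""


def build_status_query(task_id, export_id):
    # json.dumps yields a double-quoted, escaped literal, which is also a
    # valid GraphQL string for ordinary IDs.
    return STATUS_TEMPLATE % (json.dumps(task_id), json.dumps(export_id))


def wait_for_signed_urls(fetch_status, interval=10.0, max_tries=30):
    """Poll exportStatus until it returns signed URLs.

    fetch_status is a hypothetical zero-argument callable that POSTs the
    status query to /admin/slash (with the DG-Auth header) and returns the
    decoded JSON response.
    """
    for _ in range(max_tries):
        status = fetch_status()["data"]["exportStatus"]
        urls = status.get("signedUrls") or []
        if urls:
            return urls
        time.sleep(interval)
    raise TimeoutError("export did not produce signed URLs in time")
```

Passing the transport in as a callable keeps the polling logic independent of which HTTP client you use.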
The following set of files will be available for download (signed URLs to backup files are only valid for 48h!):
- `g01.gql_schema.gz` → gzipped GraphQL schema file
- `g01.rdf.gz` → gzipped data file in RDF format
- `g01.schema.gz` → gzipped DQL schema file
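Since the signed URLs expire after 48h, it can help to download all three files in one go and check that none are missing. This sketch assumes that the last path segment of each signed URL is the file name (the signature lives in the query string); the helper names and `EXPECTED_FILES` are hypothetical.

```python
import os
import urllib.parse
import urllib.request

# The three files listed above (assumption: names as in a single-group export).
EXPECTED_FILES = {"g01.gql_schema.gz", "g01.rdf.gz", "g01.schema.gz"}


def filename_from_signed_url(url):
    """Return the file name of a signed URL, ignoring its query string."""
    path = urllib.parse.urlsplit(url).path
    return os.path.basename(urllib.parse.unquote(path))


def plan_downloads(signed_urls, target_dir):
    """Map local target paths to signed URLs; fail if an expected file is missing."""
    by_name = {filename_from_signed_url(u): u for u in signed_urls}
    missing = EXPECTED_FILES - set(by_name)
    if missing:
        raise ValueError("missing export files: %s" % sorted(missing))
    return {os.path.join(target_dir, name): url for name, url in by_name.items()}


def download_all(signed_urls, target_dir):
    """Download every file into target_dir before the signed URLs expire."""
    os.makedirs(target_dir, exist_ok=True)
    for local_path, url in plan_downloads(signed_urls, target_dir).items():
        urllib.request.urlretrieve(url, local_path)
```

The directory you pass to `download_all` is the one you later mount as `<path-to-backup-files>` in the Docker command.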
Step 2 - Get the Slash API Key
To connect to your Slash gRPC endpoint, you need a Slash API Key for your namespace. You can get it by logging into Ratel via the Dgraph Admin Interface under `Ratel`.
Log in with your groot user, then go to the `Extra Settings` tab and copy the Slash API Key.
Step 3/A - Restore Data on Namespace
Before we can restore data on a namespace, we first have to manually check that the data dump does not contain predicates that are missing from the DQL schema. Such dead predicates occur when you changed the schema in the past and a “removed” predicate still holds data because you did not manually drop it, e.g. via Ratel. If your data contains predicates that are not defined in your schema, the restore will fail.
The easiest way to remove this data from your `g01.rdf` is to run some regular expressions over it and compare its predicates against `g01.schema`.
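The regex comparison can be scripted. The sketch below collects the predicates defined in `g01.schema`, then streams `g01.rdf.gz` and drops every N-Quad whose predicate is not defined, counting what it removed. It assumes schema lines look like `[0x0] <name>:string @index(exact) .` (namespace prefix optional) and that each RDF line starts with `<subject> <predicate>`. Internal `dgraph.*` predicates are kept by default on the assumption that Dgraph defines them itself. All function names are hypothetical.

```python
import gzip
import re
from collections import Counter

# Assumption: predicate definitions in the exported g01.schema look like
#   [0x0] <name>:string @index(exact) .
# The "[0x...]" namespace prefix is treated as optional.
SCHEMA_PRED_RE = re.compile(r"^\s*(?:\[0x[0-9a-fA-F]+\]\s*)?<([^>]+)>\s*:")
# Subject (IRI or blank node) followed by the predicate IRI of an N-Quad line.
RDF_PRED_RE = re.compile(r"^\s*(?:<[^>]*>|_:\S+)\s+<([^>]+)>")


def schema_predicates(schema_lines):
    """Collect every predicate name that has a definition in the DQL schema."""
    defined = set()
    for line in schema_lines:
        m = SCHEMA_PRED_RE.match(line)
        if m:
            defined.add(m.group(1))
    return defined


def filter_rdf(rdf_lines, defined, dropped, keep_internal=True):
    """Yield RDF lines whose predicate is defined; count the others in `dropped`."""
    for line in rdf_lines:
        m = RDF_PRED_RE.match(line)
        if m is None:
            yield line  # blank or unrecognised lines pass through unchanged
            continue
        pred = m.group(1)
        if pred in defined or (keep_internal and pred.startswith("dgraph.")):
            yield line
        else:
            dropped[pred] += 1


def clean_export(schema_path, rdf_path, out_path):
    """Write a filtered copy of the gzipped RDF dump and return the drop counts."""
    with gzip.open(schema_path, "rt", encoding="utf-8") as f:
        defined = schema_predicates(f)
    dropped = Counter()
    with gzip.open(rdf_path, "rt", encoding="utf-8") as src, \
            gzip.open(out_path, "wt", encoding="utf-8") as dst:
        dst.writelines(filter_rdf(src, defined, dropped))
    return dropped
```

For example, `clean_export("g01.schema.gz", "g01.rdf.gz", "g01.clean.rdf.gz")` returns a `Counter` of removed predicates, which is worth reviewing before you load anything; pass the cleaned file to `-f` instead of `g01.rdf.gz`. Streaming line by line keeps memory use flat even for large dumps.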
Once this is done, you can run the following command in a terminal (or PowerShell, Ubuntu Bash, etc. when running Windows):
docker run -it --rm -v <path-to-backup-files>:/tmp/ dgraph/dgraph:v21.03-slash \
  dgraph live --slash_grpc_endpoint <your-cluster-endpoint>:443 \
  -f /tmp/g01.rdf.gz -s /tmp/g01.schema.gz \
  -t <your-slash-api-key> \
  --creds="user=groot;password=<your-groot-password>;namespace=<your-namespace-id>"
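When you script the restore, the same command can be assembled as an argument list, which also enforces the endpoint format (no `https://`, port 443). This is a sketch only: it reuses exactly the flags from the command above, and `grpc_endpoint` and `live_loader_argv` are hypothetical helper names.

```python
def grpc_endpoint(cluster_url):
    """Turn a cluster URL into host:port form (no https:// prefix, port 443)."""
    host = cluster_url.strip()
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    host = host.rstrip("/")
    if ":" not in host:
        host += ":443"
    return host


def live_loader_argv(backup_dir, cluster_url, slash_api_key, groot_password,
                     namespace_id, image="dgraph/dgraph:v21.03-slash",
                     rdf_file="g01.rdf.gz", schema_file="g01.schema.gz"):
    """Build the docker / dgraph live command as an argument list."""
    creds = "user=groot;password=%s;namespace=%d" % (groot_password, namespace_id)
    return [
        "docker", "run", "-it", "--rm",
        "-v", "%s:/tmp/" % backup_dir,
        image,
        "dgraph", "live",
        "--slash_grpc_endpoint", grpc_endpoint(cluster_url),
        "-f", "/tmp/" + rdf_file,
        "-s", "/tmp/" + schema_file,
        "-t", slash_api_key,
        # No shell is involved, so the ';' separators need no extra quoting.
        "--creds=" + creds,
    ]
```

`subprocess.run(argv, check=True)` then executes it, and `shlex.join(argv)` prints a copy-pasteable shell version. Note that `-it` expects an interactive terminal, so drop it if you run this from a non-interactive job.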
A few things to mention here:
- Do not drop all data before you restore the namespace! This deletes the groot user, and you will no longer have access to the namespace. If you need to delete all data, delete the namespace, create a new one and clone the data into it.
- Make sure that you don’t use `https://<your-cluster-endpoint>`; this does not work! Use port 443 and the URL without the `https://` prefix.
- The above process ONLY restores the data, including all ACL settings, but does NOT restore the GraphQL schema! To do that, either copy the backed-up GraphQL schema and paste it into the Schema section of the Cloud interface, or run the `updateGQLSchema` mutation.
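For the `updateGQLSchema` route, the schema from `g01.gql_schema.gz` can be sent as a GraphQL variable so that it needs no manual escaping. This sketch assumes the mutation is served on your cluster's `/admin` endpoint with the same `DG-Auth` header, and that the exported file contains either plain SDL or a JSON list of `{"namespace": ..., "schema": ...}` entries; it handles both cases because the exact file format is an assumption. Function names are hypothetical.

```python
import gzip
import json
import urllib.request

UPDATE_GQL_SCHEMA = """
mutation UpdateSchema($sch: String!) {
  updateGQLSchema(input: { set: { schema: $sch } }) {
    gqlSchema { schema }
  }
}
"""


def extract_gql_schema(text, namespace_id=None):
    """Return the SDL stored in an exported GraphQL schema file (format assumed)."""
    try:
        entries = json.loads(text)
    except json.JSONDecodeError:
        return text  # not JSON: treat the content as plain SDL
    if isinstance(entries, dict):
        entries = [entries]
    for entry in entries:
        if namespace_id is None or entry.get("namespace") == namespace_id:
            return entry["schema"]
    raise ValueError("no GraphQL schema for namespace %r" % namespace_id)


def build_schema_request(admin_url, admin_key, sdl):
    """Build a POST request that passes the SDL as the $sch variable."""
    body = json.dumps({"query": UPDATE_GQL_SCHEMA, "variables": {"sch": sdl}})
    return urllib.request.Request(
        admin_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "DG-Auth": admin_key},
        method="POST",
    )


def restore_gql_schema(admin_url, admin_key, gql_schema_path, namespace_id=None):
    """Read the exported schema file and apply it via updateGQLSchema."""
    with gzip.open(gql_schema_path, "rt", encoding="utf-8") as f:
        sdl = extract_gql_schema(f.read(), namespace_id)
    with urllib.request.urlopen(build_schema_request(admin_url, admin_key, sdl)) as resp:
        payload = json.load(resp)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]["updateGQLSchema"]["gqlSchema"]["schema"]
```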
Step 3/B - Cloning Data from One Namespace into Another
In some cases you will have to delete one of your namespaces because the restore does not work as intended. When cloning data from one namespace into another, you have to alter the RDF data file a bit further.
In addition to deleting predicates that are not part of the schema, you need to remove all references to the groot user AND its guardians group. If you don’t, you will compromise your ACL because you will end up with two groot users!
Once your data file is cleaned up, continue with the Docker command from Step 3/A.
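Removing the groot and guardians references can be done in two passes: first find their UIDs, then drop every line whose subject is one of those UIDs or whose object points to one. The sketch assumes, without confirmation from the docs, that ACL users and groups appear in the export as nodes whose `<dgraph.xid>` holds the name (e.g. `"groot"`, `"guardians"`) and that memberships are UID edges such as `<dgraph.user.group>`. Other users’ memberships in guardians are removed as well. The function names are hypothetical.

```python
import gzip
import re

# Assumption: ACL entities carry their name in <dgraph.xid>.
XID_RE = re.compile(r'^\s*<(0x[0-9a-fA-F]+)>\s+<dgraph\.xid>\s+"([^"]*)"')
# Subject UID, predicate, and (if present) a UID object of an N-Quad line.
TRIPLE_RE = re.compile(r"^\s*<([^>]+)>\s+<[^>]+>\s+(<[^>]+>)?")


def acl_uids(rdf_lines, names=("groot", "guardians")):
    """First pass: collect the UIDs of the groot user and the guardians group."""
    uids = set()
    for line in rdf_lines:
        m = XID_RE.match(line)
        if m and m.group(2) in names:
            uids.add(m.group(1))
    return uids


def strip_uids(rdf_lines, uids):
    """Second pass: drop lines whose subject or UID object is in `uids`."""
    for line in rdf_lines:
        m = TRIPLE_RE.match(line)
        if m:
            subject, obj = m.group(1), m.group(2)
            if subject in uids or (obj is not None and obj[1:-1] in uids):
                continue
        yield line


def clean_for_clone(rdf_path, out_path):
    """Write a gzipped copy of the dump without groot/guardians references."""
    with gzip.open(rdf_path, "rt", encoding="utf-8") as f:
        uids = acl_uids(f)
    with gzip.open(rdf_path, "rt", encoding="utf-8") as src, \
            gzip.open(out_path, "wt", encoding="utf-8") as dst:
        dst.writelines(strip_uids(src, uids))
    return uids
```

Run it on the file that already had its undefined predicates removed, check that the returned UID set contains exactly the entries you expect, and pass the output to `-f`.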