How to restore data on cloud?

I’ve been trying to figure out how to restore data on cloud but I can’t get my head around it. I’ve managed to pull the backup files (schema, data in RDF, data in JSON). I’ve then checked the docs, which state that I should be able to restore the data online. However, the mutation

mutation{
  restore(input:{
    location: "/path/to/backup/directory",
    backupId: "id_of_backup_to_restore"
  }){
    message
    code
  }
}

always results in an error

Query not supported

Can anyone help out here? Any tips appreciated. :raised_hands:

Hi Florian,
I’ve replied to you with options on the support ticket open for the issue.
Let’s continue our conversation there.

Thanks!


For the sake of completeness, here’s a guide which will probably be available in a similar form in the docs.

Restoring and Cloning Data with LiveLoader

Up to this point, restoring and/or cloning data for a specific namespace on Dgraph Cloud is only possible via Dgraph’s LiveLoader tool. It is part of the Dgraph CLI and thus requires downloading the Dgraph Docker image and running it locally.

Since binary backups only work in combination with the restore endpoint, and the restore endpoint is not yet working, you have to download your data in either RDF or JSON format. Obviously this can result in large files, slowing down both the download and the restore process.

Prerequisites
  • Install Docker on your local machine
  • Download the Dgraph Docker image (optional, it can also be fetched remotely; see the pull command after this list). There are currently two relevant tags: latest, which is v22 and discontinued, and v21.03-slash, which is the current version on Cloud and compatible with the new Dgraph version v23-beta. Both versions seem to work, but I’d recommend v21.03-slash since it matches the Cloud version.
  • For Windows users: install the Windows Subsystem for Linux (WSL)
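
If you want to pull the image up front, this single command is enough (v21.03-slash is the same tag used by the restore command later in this guide):

docker pull dgraph/dgraph:v21.03-slash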

Step 1 - Request a Backup

First, we need to fetch a backup from the server. This can be done by running the following mutation against the /admin/slash endpoint of your Cloud server. To perform this mutation you also need an Admin Token for the namespace you want to fetch the backup from. Admin tokens can be generated in your Dgraph Admin Panel under Settings --> API Keys.

Similar mutations do exist on the /admin endpoint, but they currently result in a Query not supported. error.

Make sure you set the DG-Auth: <your-admin-api-key> header when running the mutation! This example uses the RDF format.

Request taskId and exportId
mutation Export {
  export(format: "rdf", namespace: <your-namespace-id>) {
    response {
      code
      message
    }
    taskId
    exportId
  }
}
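
If you prefer the command line over a GraphQL client, the same mutation can be sent with curl. This is only a sketch; <your-cluster-endpoint>, <your-admin-api-key> and <your-namespace-id> are placeholders you need to replace with your own values.

Run the Export Mutation with curl
curl -s "https://<your-cluster-endpoint>/admin/slash" \
  -H "Content-Type: application/json" \
  -H "DG-Auth: <your-admin-api-key>" \
  -d '{"query": "mutation { export(format: \"rdf\", namespace: <your-namespace-id>) { response { code message } taskId exportId } }"}'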

Copy the taskId and the exportId and run this query to retrieve the download links for your backup files.

Request the Download Links to your Backup Files
query export {
  exportStatus(
    taskId: "<taskId-from-export-mutation>"
    exportId: "<exportId-from-export-mutation>"
  ) {
    kind
    lastUpdated
    signedUrls
    status
  }
}

The following set of files will be available for download (signed URLs to backup files are only valid for 48h!):

  • g01.gql_schema.gz → gzipped GraphQL schema file
  • g01.rdf.gz → gzipped data file in RDF format
  • g01.schema.gz → gzipped DQL schema file
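
Downloading is a plain HTTP GET against the signed URLs, e.g. with curl. The placeholder URLs below stand for the signedUrls returned by exportStatus.

Download and Unpack the Backup Files
curl -o g01.gql_schema.gz "<signed-url-for-gql-schema>"
curl -o g01.rdf.gz "<signed-url-for-rdf-data>"
curl -o g01.schema.gz "<signed-url-for-dql-schema>"
# keep the .gz originals, the live loader reads them directly
gunzip -k *.gz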

Step 2 - Get the Slash API Key

In order to connect to your Slash gRPC endpoint, you need a Slash API Key for your namespace. You can get this by logging in to Ratel via the Dgraph Admin Interface under Ratel.

Log in with your groot user then go to the Extra Settings tab and copy the Slash API Key.

Step 3/A - Restore Data on Namespace

Before we can restore data on a namespace, we first have to manually check that the data dump does not contain predicates which are not defined in the DQL schema. Such dead predicates occur if you changed the schema in the past, the “removed” predicate still holds data, and you did not manually remove the predicate, e.g. via Ratel. If your data contains predicates which are not defined in your schema, the restore will fail.

The easiest way to remove the offending data from your g01.rdf is with some regex, comparing it against g01.schema.
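
As a starting point, the sketch below extracts the predicate names from both files and diffs the two sets. It assumes you have gunzipped g01.rdf and g01.schema; type definitions and internal dgraph.* predicates can show up as false positives, so review the output by hand before deleting anything.

Find Predicates Missing from the Schema
# predicate names declared in the DQL schema (text before the first colon)
grep -oE '^[^:]+' g01.schema | tr -d '<> ' | sort -u > schema_preds.txt
# predicate names actually used in the triples (second angle-bracketed token)
sed -nE 's/^<[^>]+> <([^>]+)>.*/\1/p' g01.rdf | sort -u > data_preds.txt
# predicates present in the data but missing from the schema: candidates for removal
comm -23 data_preds.txt schema_preds.txt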

Once this is done, you can run the following command in the terminal (or PowerShell, Ubuntu Bash, etc. when running Windows):

docker run -it --rm -v <path-to-backup-files>:/tmp/ dgraph/dgraph:v21.03-slash \
  dgraph live --slash_grpc_endpoint <your-cluster-endpoint>:443 \
  -f /tmp/g01.rdf.gz -s /tmp/g01.schema.gz \
  -t <your-slash-api-key> \
  --creds="user=groot;password=<your-groot-password>;namespace=<your-namespace-id>"

A few things to mention here:

  • do not drop all data before you restore the namespace! You would delete the groot user and thus lose access to the namespace! If you need to delete all data, delete the namespace, create a new one and clone the data.
  • make sure that you don’t use https://<your-cluster-endpoint>; this does not work! Use port 443 and the URL without the https:// prefix.
  • the above process ONLY restores the data, including all ACL settings, but does NOT restore the GraphQL schema! To do that, either copy the backed-up GraphQL schema and paste it into the Schema section of the Cloud Interface, or run the updateGQLSchema mutation (sketched after this list).
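
For the mutation route, the sketch below pushes the backed-up GraphQL schema to the /admin endpoint via updateGQLSchema. It assumes jq is installed and g01.gql_schema has been gunzipped; given that /admin has been flaky on Cloud (see Step 1), pasting the schema in the Cloud Interface is the safe fallback.

Re-apply the GraphQL Schema
jq -n --rawfile sch g01.gql_schema \
  '{query: "mutation($sch: String!) { updateGQLSchema(input: { set: { schema: $sch } }) { gqlSchema { schema } } }", variables: {sch: $sch}}' \
  | curl -s "https://<your-cluster-endpoint>/admin" \
      -H "Content-Type: application/json" \
      -H "DG-Auth: <your-admin-api-key>" \
      -d @-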

Step 3/B - Cloning Data from One Namespace into Another

In some cases you will have to delete one of your namespaces because restore does not work as intended. When cloning data from one namespace into another, you have to alter the RDF data file a bit further.

In addition to deleting predicates which are not part of the schema, you need to remove all references to the groot user AND its guardians group. If you don’t do this, you will compromise your ACL because you will end up with two groot users!
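
A rough first pass is sketched below. Note that grepping for the literal strings only catches triples that contain "groot" or "guardians" as values; triples referencing those nodes by UID still need manual inspection.

Strip groot and guardians References
# inspect the matches first, then write a cleaned copy
grep -nE '"groot"|"guardians"' g01.rdf
grep -vE '"groot"|"guardians"' g01.rdf > g01.cleaned.rdf
# re-compress before pointing the live loader at it
gzip g01.cleaned.rdf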

Once your data file is cleaned up, continue with the docker command from Step 3/A.
