Dgraph v21.03.0: Online Restore Fails

Report a Dgraph Bug

The online restore fails with an error saying "cannot write backup" and reporting that Stat failed on a file that does exist in the S3 bucket.

What version of Dgraph are you using?

  • commit 32f1f5893 (release/v21.03)

Have you tried reproducing the issue with the latest release?

n/a

What is the hardware spec (RAM, OS)?

  • Ubuntu 20.04 (Docker image)

Steps to reproduce the issue (command/config used to run Dgraph)

  1. Start from an existing 6-node cluster (3 Alphas, 3 Zeros) that has access to an S3 bucket.
    Example: helm install backup -n backup --values backup-cluster.yaml ~/charts/charts/dgraph/ with the following values:
    # backup-cluster.yaml
    image:
      repository: darknerd/dgraph
      tag: v21.03
    ratel:
      enabled: false
    alpha:
      acl:
        enabled: true
        file:
          hmac_secret_file: MTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0NTY3ODkwMTI=
      extraEnvs:
        - name: DGRAPH_ALPHA_SECURITY
          value: whitelist=10.0.0.0/8,172.0.0.0/8,192.168.0.0/16
        - name: DGRAPH_ALPHA_ACL
          value: secret-file=/dgraph/acl/hmac_secret_file
    backups:
      full:
        enabled: true
        schedule: "0 0 * * *"
      incremental:
        enabled: true
        schedule: "0 1-23 * * *"
      admin:
        user: groot
        password: password
      destination: s3://s3.us-east-2.amazonaws.com/dgraph-dev-backups/bugbuster/backups
      keys:
        s3:
          access: REDACTED
          secret: REDACTED
    
  2. Back up the existing cluster (3 tenants had been created previously).
    Example backup mutation:
    mutation {
      backup(input: {
        destination: "s3://s3.us-east-2.amazonaws.com/dgraph-dev-backups/jira"
        forceFull: true
      }) {
        response {
          message
          code
        }
      }
    }
    
  3. Create a similar cluster to restore into.
    Example: helm install restore -n restore --values restore-cluster.yaml ~/charts/charts/dgraph/ with the following values:
    # restore-cluster.yaml
    image:
      repository: darknerd/dgraph
      tag: v21.03
    ratel:
      enabled: false
    alpha:
      acl:
        enabled: true
        file:
          hmac_secret_file: MTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0NTY3ODkwMTI=
      extraEnvs:
        - name: DGRAPH_ALPHA_SECURITY
          value: whitelist=10.0.0.0/8,172.0.0.0/8,192.168.0.0/16
        - name: DGRAPH_ALPHA_ACL
          value: secret-file=/dgraph/acl/hmac_secret_file
    
  4. Perform a restore:
    mutation {
      restore(input:{
        location:"s3://s3.us-east-2.amazonaws.com/dgraph-dev-backups/jira"
        accessKey: "REDACTED"
        secretKey: "REDACTED"
      }) {
        message
        code
      }
    }
    

Expected behaviour and actual result.

Expected

There are two expectations:

  • That a restore is attempted, not a backup (the error message in the actual results below says "cannot write backup").
  • Given that the file does exist (see the actual results below), the restore would succeed.

The file in question does actually exist:

$ aws s3 ls s3://dgraph-dev-backups/jira/dgraph.20210403.043719.140/
2021-04-02 21:37:20  490166521 r42587-g1.backup

Actual

The restore operation produces the following errors in the logs:

E0403 04:55:58.897630      19 draft.go:720] Applying proposal. Error: cannot write backup: cannot write backup: Stat failed "dgraph.20210403.043719.140/r42587-g1.backup": The specified key does not exist.. Proposal: {"<nil>" [] "<nil>" "" "<nil>" "<nil>" '$' '\x00' "group_id:1 restore_ts:17 location:\"s3://s3.us-east-2.amazonaws.com/dgraph-dev-backups/jira\" access_key:\"REDACTED\" secret_key:\"REDACTED\" " "<nil>" "<nil>"}.
I0403 04:55:58.897745      19 draft.go:124] Operation completed with id: opRestore
E0403 04:55:58.897790      19 online_restore_ee.go:111] Error while restoring cannot propose restore request: cannot write backup: cannot write backup: Stat failed "dgraph.20210403.043719.140/r42587-g1.backup": The specified key does not exist.

The listBackups query works with the same credentials:

query {
  listBackups(input: {
    location: "s3://s3.us-east-2.amazonaws.com/dgraph-dev-backups/joaquin/backup"
    accessKey: "REDACTED"
    secretKey: "REDACTED"
  }) {
    backupId
    backupNum
    encrypted
    path
    since
    type
  }
}

This has been resolved by dgraph-io/dgraph Pull Request #7686, "fix(restore): append the object path prefix while reading backup" by NamanJain8.
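Comparing the `aws s3 ls` output with the log, the backup object lives under the `jira/` prefix, while the key that was Stat'ed (`dgraph.20210403.043719.140/r42587-g1.backup`) omits it. The Go sketch below illustrates that difference. It assumes a location of the form `s3://<endpoint>/<bucket>/<prefix>`, and `s3Location`, `parseS3Location`, and `objectKey` are hypothetical helpers written for this report; they are not Dgraph's actual code or the change made in PR #7686.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// s3Location is a hypothetical type (not Dgraph's) holding the parts of an
// s3:// backup location.
type s3Location struct {
	Endpoint string // e.g. s3.us-east-2.amazonaws.com
	Bucket   string // e.g. dgraph-dev-backups
	Prefix   string // object path inside the bucket, e.g. jira (may be empty)
}

// parseS3Location splits s3://<endpoint>/<bucket>/<optional/prefix>.
// Assumption: this layout matches the locations used in this report.
func parseS3Location(loc string) (s3Location, error) {
	u, err := url.Parse(loc)
	if err != nil {
		return s3Location{}, err
	}
	if u.Scheme != "s3" {
		return s3Location{}, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	// "/dgraph-dev-backups/jira" -> ["dgraph-dev-backups", "jira"]
	parts := strings.SplitN(strings.Trim(u.Path, "/"), "/", 2)
	if parts[0] == "" {
		return s3Location{}, fmt.Errorf("missing bucket in %q", loc)
	}
	l := s3Location{Endpoint: u.Host, Bucket: parts[0]}
	if len(parts) == 2 {
		l.Prefix = parts[1]
	}
	return l, nil
}

// objectKey prepends the location's prefix to a path that is relative to
// the backup location, giving the full key inside the bucket.
func (l s3Location) objectKey(rel string) string {
	return path.Join(l.Prefix, rel) // path.Join skips an empty prefix
}

func main() {
	loc, err := parseS3Location("s3://s3.us-east-2.amazonaws.com/dgraph-dev-backups/jira")
	if err != nil {
		panic(err)
	}
	rel := "dgraph.20210403.043719.140/r42587-g1.backup"
	fmt.Println("key without prefix:", rel)               // the key from the Stat error
	fmt.Println("key with prefix:   ", loc.objectKey(rel)) // jira/dgraph.20210403.043719.140/r42587-g1.backup
}
```

With an empty prefix (a location pointing at the bucket root), objectKey returns the relative path unchanged, so in this sketch the two keys only differ when the location includes a sub-path such as `jira/`.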