How to: Auto Backup of Dgraph on Kubernetes

This is a how-to on setting up an automated backup/export system for a multi-node Dgraph deployment on Kubernetes.

The setup looks like this:

  • A Google Cloud Storage bucket is used to store the backup files.
  • The GCS bucket is mounted into the Dgraph server pods with gcsfuse.
  • A Kubernetes CronJob sends an HTTP request to a Dgraph server to trigger the export.

Custom Docker image

We need to include gcsfuse in the Dgraph server Docker image, so create a new Dockerfile, build the image, and push it to a registry available to your cluster.

Dockerfile

FROM dgraph/dgraph:v1.0.9

# apt repository for gcsfuse on Ubuntu 18.04 (bionic)
ENV GCSFUSE_REPO gcsfuse-bionic

# Install gcsfuse from Google's apt repository
RUN apt-get update && apt-get install -y --no-install-recommends \
      ca-certificates \
      curl \
      gnupg \
  && echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" \
     | tee /etc/apt/sources.list.d/gcsfuse.list \
  && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - \
  && apt-get update \
  && apt-get install -y gcsfuse \
  && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
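
Then build and push it, for example (gcr.io/my-project is a placeholder registry path; use your own):

docker build -t gcr.io/my-project/dgraph-gcsfuse:v1.0.9 .
docker push gcr.io/my-project/dgraph-gcsfuse:v1.0.9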

Create a Google service account and secret

In your Google Cloud project, create a service account that has access to the storage bucket, and save its key as a JSON file named credentials.json.
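
If you prefer the CLI, this can be done along these lines (a sketch; my-project and dgraph-backup-sa are placeholders, and roles/storage.objectAdmin is one way to grant write access to the bucket):

gcloud iam service-accounts create dgraph-backup-sa --project my-project
gsutil iam ch serviceAccount:dgraph-backup-sa@my-project.iam.gserviceaccount.com:roles/storage.objectAdmin gs://dgraph_exports
gcloud iam service-accounts keys create credentials.json \
  --iam-account dgraph-backup-sa@my-project.iam.gserviceaccount.com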
Then create the secret in your cluster:

kubectl create secret generic dgraph-storage-creds --from-file=credentials.json

Modify the Dgraph server StatefulSet

The following modifications are needed in your Dgraph server StatefulSet workload.

Under volumes: add

      - name: googleservice
        secret:
          defaultMode: 420
          secretName: dgraph-storage-creds

Under volumeMounts: add

        - mountPath: /etc/google
          name: googleservice
          readOnly: true

Add the following to the container spec so gcsfuse can find the mounted credentials:

        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /etc/google/credentials.json

Add lifecycle hooks that mount the bucket with gcsfuse when the container starts and unmount it before the container stops:

        lifecycle:
          postStart:
            exec:
              command:
              - gcsfuse
              - -o
              - nonempty
              - dgraph_exports
              - /dgraph/export
          preStop:
            exec:
              command:
              - fusermount
              - -u
              - /dgraph/export

Remember to change the image to your custom Docker image.
Also add the following argument to the dgraph server command, changing the IP range to match your cluster's address range, so the CronJob pod is allowed to call the admin endpoint:
--whitelist 10.4.0.1:10.4.255.254
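
For reference, the full server command in the StatefulSet would then look something like this (a sketch; the --my and --zero values are assumptions that depend on your deployment):

dgraph server --my=$(hostname -f):7080 --zero=dgraph-zero-0.dgraph-zero.default.svc.cluster.local:5080 --whitelist 10.4.0.1:10.4.255.254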

Create cronjob

Create a new CronJob based on the following YAML.
backup_cron.yaml

apiVersion: batch/v1beta1  # use batch/v1 on Kubernetes >= 1.21; v1beta1 was removed in 1.25
kind: CronJob
metadata:
  name: dgraph-backup
spec:
  schedule: "15 02 * * *"  # daily at 02:15
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: dgraph-backup
            image: artooro/curl
            args:
            - /bin/sh
            - -c
            - curl http://dgraph-server.default.svc.cluster.local:8080/admin/export
          restartPolicy: OnFailure

Deploy it via:
kubectl create -f backup_cron.yaml
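
To trigger a run immediately instead of waiting for the schedule, you can create a one-off job from the CronJob (the job name is arbitrary):

kubectl create job --from=cronjob/dgraph-backup dgraph-backup-manual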

Wrap up

Test it, verify the export files show up in the storage bucket, and you're set.
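
For example, you can check that the bucket is mounted in a pod and list its contents from your workstation (the pod name dgraph-server-0 is an assumption; gsutil must be configured for your project):

kubectl exec dgraph-server-0 -- df -h /dgraph/export
gsutil ls gs://dgraph_exports/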

This post assumes your Dgraph server service is named dgraph-server.

This is working for us, and I thought I’d share in case it’s a help to others.

Useful. But does it work with v21.12.0?

I'm currently running Dgraph v21.03.2, so I haven't tested it with v21.12.
What I'd like to do is simplify this mechanism to use the new GCS CSI driver described here: Access Cloud Storage buckets with the Cloud Storage FUSE CSI driver (Google Kubernetes Engine docs)
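
Roughly, that would replace the custom image and lifecycle hooks with a CSI ephemeral volume. An untested sketch based on the GKE docs (the annotation and driver name come from Google's documentation; the volume name is arbitrary):

  # on the pod template metadata:
  annotations:
    gke-gcsfuse/volumes: "true"

  # under volumes:, replacing the gcsfuse lifecycle hooks:
  - name: gcs-export
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeAttributes:
        bucketName: dgraph_exports

Then mount gcs-export at /dgraph/export under volumeMounts:.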

Thanks.
One more question: I'm not sure how to use a backup to restore to an arbitrary point in time when the Dgraph cluster is completely unavailable. In MySQL, data can be restored to any point in time through xtrabackup + binlog, but in Dgraph there doesn't seem to be a way to do that?
Please see this: How to backup dgraph
Thank you.

You would just have to keep multiple snapshots, or in this case exports. So whatever mechanism or script you end up using to automatically back it up, make sure it rotates them or keeps the last X files, etc.
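
For GCS specifically, one low-effort option is a bucket lifecycle rule that deletes objects older than N days. A sketch (the 30-day age is an arbitrary choice; tune it to how much history you want to keep):

lifecycle.json

{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"age": 30}
    }
  ]
}

Apply it with:

gsutil lifecycle set lifecycle.json gs://dgraph_exports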