How to: Auto Backup of Dgraph on Kubernetes


(Arthur Wiebe) #1

This is a how-to on setting up an automated backup/export system for a multi-node Dgraph deployment on Kubernetes.

The setup looks like this:

  • A Google Cloud Storage (GCS) bucket stores the backup files.
  • The GCS bucket is mounted into the Dgraph server container with gcsfuse.
  • A Kubernetes CronJob sends an HTTP request to a Dgraph server to trigger the export.

Custom Docker image

We need to include gcsfuse in the Dgraph server Docker image, so create a new Dockerfile, build the image, and push it to a registry your cluster can pull from.

Dockerfile

FROM dgraph/dgraph:v1.0.9

# apt repository name for gcsfuse; it should match the Ubuntu release of the base image
ENV GCSFUSE_REPO=gcsfuse-bionic

# Install prerequisites, register Google's apt repository and key, then install gcsfuse
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        gnupg \
    && echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" \
        | tee /etc/apt/sources.list.d/gcsfuse.list \
    && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - \
    && apt-get update \
    && apt-get install -y gcsfuse \
    && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
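
Building and pushing the image might look like the sketch below. The registry path gcr.io/my-project/dgraph-gcsfuse is a placeholder, not something from this setup, so substitute a registry your cluster can pull from. It assumes the Dockerfile above sits in the current directory.

```shell
# Placeholder image reference (an assumption): point it at your own registry.
IMAGE="${IMAGE:-gcr.io/my-project/dgraph-gcsfuse:v1.0.9}"

# Build the image from the Dockerfile in the current directory, then push it.
build_and_push() {
  docker build -t "$IMAGE" . && docker push "$IMAGE"
}
```

Run build_and_push from the directory that contains the Dockerfile.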

Create Google service account and secret

In your Google Cloud project, create a service account that has access to the storage bucket, and save its key as a JSON file named credentials.json.
Then create the secret in your cluster:

kubectl create secret generic dgraph-storage-creds --from-file=credentials.json
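
For the service account step that precedes the secret, a rough sketch with the gcloud and gsutil CLIs could look like this. The project ID my-project and the account name dgraph-backup are hypothetical, and granting objectAdmin on the bucket is one reasonable choice rather than the only one. The bucket name dgraph_exports matches the one used in the mount command later in this post.

```shell
# Hypothetical names (assumptions): replace with your own project and account.
PROJECT_ID="${PROJECT_ID:-my-project}"
SA_NAME="${SA_NAME:-dgraph-backup}"
BUCKET="${BUCKET:-dgraph_exports}"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Create the service account, grant it object access on the bucket,
# and write its key to credentials.json in the current directory.
create_backup_credentials() {
  gcloud iam service-accounts create "$SA_NAME" --project "$PROJECT_ID" &&
  gsutil iam ch "serviceAccount:${SA_EMAIL}:objectAdmin" "gs://${BUCKET}" &&
  gcloud iam service-accounts keys create credentials.json --iam-account "$SA_EMAIL"
}
```

After create_backup_credentials succeeds, credentials.json is the file passed to kubectl create secret above.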

Modify Dgraph server statefulset

Make the following modifications to your Dgraph server StatefulSet.

Under volumes: add

      - name: googleservice
        secret:
          defaultMode: 420
          secretName: dgraph-storage-creds

Under volumeMounts: add

        - mountPath: /etc/google
          name: googleservice
          readOnly: true

Add this environment variable to the Dgraph server container so that gcsfuse can find the credentials:

        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /etc/google/credentials.json

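Add lifecycle hooks to the same container so the bucket is mounted when the container starts and unmounted before it stops. Here dgraph_exports is the bucket name; replace it with your own.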

        lifecycle:
          postStart:
            exec:
              command:
              - gcsfuse
              - -o
              - nonempty
              - dgraph_exports
              - /dgraph/export
          preStop:
            exec:
              command:
              - fusermount
              - -u
              - /dgraph/export
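
Once the pod has restarted, one way to confirm that the postStart hook actually mounted the bucket is to look for a FUSE entry in the container's mount table. This is a sketch: the pod name dgraph-server-0 is an assumption based on a StatefulSet called dgraph-server, so adjust it to match yours.

```shell
# Hypothetical pod name (assumption): first replica of a StatefulSet named dgraph-server.
POD="${POD:-dgraph-server-0}"

# Succeeds only if /dgraph/export appears as a FUSE mount inside the container.
check_export_mount() {
  kubectl exec "$POD" -- grep -q ' /dgraph/export fuse' /proc/mounts
}
```

For example: check_export_mount && echo "bucket is mounted".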

Remember to change the image to your custom Docker image.
Also add the following argument to the Dgraph server command, changing the IP range to match your cluster's range, so that the server accepts admin requests such as the export from other pods:
--whitelist 10.4.0.1:10.4.255.254

Create cronjob

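Create a new CronJob from the following YAML, saved as backup_cron.yaml. It runs daily at 02:15 and requests an export from the Dgraph server. On Kubernetes 1.21 or later, use apiVersion: batch/v1 instead, since batch/v1beta1 was removed in 1.25.
backup_cron.yaml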

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: dgraph-backup
spec:
  schedule: "15 02 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: dgraph-backup
            image: artooro/curl
            args:
            - /bin/sh
            - -c
            - curl http://dgraph-server.default.svc.cluster.local:8080/admin/export
          restartPolicy: OnFailure

Deploy it with:
kubectl create -f backup_cron.yaml
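
Rather than waiting for the 02:15 schedule, a one-off job can be started from the CronJob to test it. The sketch below assumes a kubectl version that supports create job --from, and the job name dgraph-backup-manual is made up for illustration.

```shell
# Hypothetical one-off job name (assumption).
JOB_NAME="${JOB_NAME:-dgraph-backup-manual}"

# Start a job from the dgraph-backup CronJob, wait for it to finish, and print the curl output.
run_backup_now() {
  kubectl create job "$JOB_NAME" --from=cronjob/dgraph-backup &&
  kubectl wait --for=condition=complete --timeout=300s "job/$JOB_NAME" &&
  kubectl logs "job/$JOB_NAME"
}
```

Delete the job afterwards with kubectl delete job dgraph-backup-manual if you want to reuse the name.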

Wrap up

Test, verify the export files show up in the storage bucket, and you’re set.
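
A quick way to do that check from a workstation is to list the bucket with gsutil. This is a minimal sketch that assumes the bucket is named dgraph_exports, as in the mount command, and that gsutil is authenticated against the right project.

```shell
# Bucket name taken from the gcsfuse mount command; override if yours differs.
BUCKET="${BUCKET:-dgraph_exports}"

# List everything in the bucket and fail if the listing errors out or is empty.
verify_exports() {
  listing="$(gsutil ls -r "gs://${BUCKET}")" || return 1
  if [ -z "$listing" ]; then
    echo "no export files found in gs://${BUCKET}" >&2
    return 1
  fi
  printf '%s\n' "$listing"
}
```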

This post assumes your Dgraph server service is named dgraph-server and runs in the default namespace.

This is working for us, and I thought I’d share in case it’s a help to others.