This is a how-to on setting up an automated backup/export system for a multi-node Dgraph deployment on Kubernetes.
The setup looks like this:
- A Google Cloud Storage (GCS) bucket stores the export files.
- The GCS bucket is mounted into the Dgraph server container with gcsfuse.
- A Kubernetes CronJob sends an HTTP request to a Dgraph server to trigger the export.
Custom Docker image
We need to include gcsfuse in the Dgraph server Docker image, so create a new Dockerfile, build the image, and push it to a registry your cluster can pull from.
Dockerfile
FROM dgraph/dgraph:v1.0.9
ENV GCSFUSE_REPO gcsfuse-bionic
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    curl \
    gnupg \
    && echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" \
    | tee /etc/apt/sources.list.d/gcsfuse.list \
    && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - \
    && apt-get update \
    && apt-get install -y gcsfuse \
    && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
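For example, build and push it like this (the registry path and tag here are placeholders; use whatever registry your cluster pulls from):
docker build -t gcr.io/your-project/dgraph-gcsfuse:v1.0.9 .
docker push gcr.io/your-project/dgraph-gcsfuse:v1.0.9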
Create Google service account and secret
In your Google Cloud project, create a service account that has access to the storage bucket, and download its key as a JSON file named credentials.json.
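If you prefer doing this from the CLI, something along these lines works (the service account name and project are placeholders; the bucket matches the dgraph_exports bucket mounted later):
gcloud iam service-accounts create dgraph-backup --display-name "Dgraph backup"
gsutil iam ch serviceAccount:dgraph-backup@your-project.iam.gserviceaccount.com:objectAdmin gs://dgraph_exports
gcloud iam service-accounts keys create credentials.json --iam-account dgraph-backup@your-project.iam.gserviceaccount.com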
Create the secret in your cluster.
kubectl create secret generic dgraph-storage-creds --from-file=credentials.json
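You can confirm the key made it into the secret (assuming the default namespace):
kubectl describe secret dgraph-storage-creds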
Modify Dgraph server statefulset
The following modifications are needed in your Dgraph server StatefulSet workload.
Under volumes: add
- name: googleservice
  secret:
    defaultMode: 420
    secretName: dgraph-storage-creds
Under volumeMounts: add
- mountPath: /etc/google
  name: googleservice
  readOnly: true
Add an environment variable pointing gcsfuse at the mounted credentials:
env:
- name: GOOGLE_APPLICATION_CREDENTIALS
  value: /etc/google/credentials.json
Add lifecycle hooks that mount the dgraph_exports bucket at /dgraph/export when the container starts and unmount it on shutdown:
lifecycle:
  postStart:
    exec:
      command:
      - gcsfuse
      - -o
      - nonempty
      - dgraph_exports
      - /dgraph/export
  preStop:
    exec:
      command:
      - fusermount
      - -u
      - /dgraph/export
Remember to change the image to your custom Docker image.
Also add the following argument to the Dgraph server command, changing the IP range to match your cluster's internal range, so that the CronJob's request to the /admin/export endpoint is allowed:
--whitelist 10.4.0.1:10.4.255.254
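Putting the pieces together, the relevant part of the server container spec might look like the sketch below. The image path is a placeholder, the dgraph_exports bucket and IP range come from this setup, and you should keep your existing dgraph server command, args, and data volume mount; depending on your cluster, the FUSE mount may also need the SYS_ADMIN capability or a privileged security context.
containers:
- name: server
  image: gcr.io/your-project/dgraph-gcsfuse:v1.0.9  # your custom image
  # keep your existing dgraph server command/args and append --whitelist 10.4.0.1:10.4.255.254
  env:
  - name: GOOGLE_APPLICATION_CREDENTIALS
    value: /etc/google/credentials.json
  volumeMounts:
  # ... plus your existing data volume mount
  - mountPath: /etc/google
    name: googleservice
    readOnly: true
  lifecycle:
    postStart:
      exec:
        command: ["gcsfuse", "-o", "nonempty", "dgraph_exports", "/dgraph/export"]
    preStop:
      exec:
        command: ["fusermount", "-u", "/dgraph/export"]
  # securityContext:            # may be required for FUSE mounts, depending on your cluster
  #   capabilities:
  #     add: ["SYS_ADMIN"]
volumes:
- name: googleservice
  secret:
    defaultMode: 420
    secretName: dgraph-storage-creds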
Create cronjob
Create a new CronJob based on the following YAML.
backup_cron.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: dgraph-backup
spec:
  schedule: "15 02 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: dgraph-backup
            image: artooro/curl
            args:
            - /bin/sh
            - -c
            - curl http://dgraph-server.default.svc.cluster.local:8080/admin/export
          restartPolicy: OnFailure
Deploy it via
kubectl create -f backup_cron.yaml
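To test it without waiting for the schedule, you can trigger a one-off run from the CronJob and check the bucket (the job name is arbitrary, and kubectl create job --from requires a reasonably recent kubectl):
kubectl create job dgraph-backup-manual --from=cronjob/dgraph-backup
kubectl logs job/dgraph-backup-manual
gsutil ls gs://dgraph_exports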
Wrap up
Test it, verify that the export files show up in the storage bucket, and you're set.
This post assumes your Dgraph server service is named dgraph-server.
This is working for us, and I thought I'd share it in case it helps others.