Enhancement for FilePath Backups to Create Missing Directory to Avoid Data Loss

Experience Report for Feature Request

When a file path is specified as the backup destination, missing directories (at minimum the final leaf) are not created automatically. Creating them is needed to keep each series of backups contained in its own directory path, and to avoid scenarios where incremental backups exist without a corresponding full backup, which results in data loss.

If this enhancement were implemented, backups would be easier to organize: a full backup and its dependent incremental backups would sit together within a single customer-specified directory.

As an example, a common scheme among customers is a daily full backup with hourly incremental backups. These could be segregated by day under dgraph_$(date +%Y%m%d).
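To make the naming scheme concrete, here is a minimal sketch of how a job might compute the per-day destination. The BACKUP_ROOT value is the mount point used as an example later in this report, and the backup_dest helper is a hypothetical name introduced only for illustration.

```shell
#!/bin/sh
# Assumed mount point on the alpha server (taken from the example in this report).
BACKUP_ROOT="/dgraph/backups"

# Hypothetical helper: build the destination for a day stamp (YYYYMMDD).
# With no argument it uses today's date, so the daily full backup and every
# hourly incremental backup taken that day resolve to the same directory.
backup_dest() {
  day="${1:-$(date +%Y%m%d)}"
  printf '%s/dgraph_%s\n' "$BACKUP_ROOT" "$day"
}

backup_dest            # today's directory
backup_dest 20240101   # prints /dgraph/backups/dgraph_20240101
```

Because the path changes once per day, the first backup of each day lands in a directory that does not exist yet, which is exactly where the missing-directory problem shows up.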

Without this, there is no easy way for customers to create the directory, because the cronjob may not have access to the disk mounted on the alpha server. This is common in Kubernetes and other immutable infrastructure, where cronjobs run on a separate system.
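The following sketch illustrates why the cronjob cannot simply create the directory itself: all it does is send a request to the alpha server, and the destination is just a string in that request. The endpoint URL, the shape of the backup mutation, and the build_payload helper are assumptions made for illustration; check them against the admin API of your Dgraph version.

```shell
#!/bin/sh
# Assumptions: the admin endpoint and the mutation shape below are illustrative
# and may differ between Dgraph versions.
ALPHA_ADMIN="${ALPHA_ADMIN:-http://localhost:8080/admin}"
DEST="/dgraph/backups/dgraph_$(date +%Y%m%d)"

# Hypothetical helper: build the JSON request body for a given destination.
# The cronjob only knows the path as text; it never touches the filesystem
# on which that path lives.
build_payload() {
  printf '{"query":"mutation { backup(input: {destination: \\"%s\\"}) { response { message code } } }"}' "$1"
}

PAYLOAD="$(build_payload "$DEST")"
echo "$PAYLOAD"

# To send the request (not executed in this sketch):
#   curl -s -H 'Content-Type: application/json' -X POST "$ALPHA_ADMIN" -d "$PAYLOAD"
```

Nothing in this script runs on the host that owns the disk, so a mkdir here would create the directory in the wrong place unless the same volume is also mounted into the cronjob.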

What you wanted to do

As an operator, I want the final leaf of the directory path to be created so that backups can be segregated. Thus, if I have a mount point of /dgraph/backups, and the specified destination is /dgraph/backups/dgraph_$(date +%Y%m%d) (similar to minio or s3 functionality), I would want the dgraph_$(date +%Y%m%d) directory to be created automatically.
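The requested behavior can be sketched in shell as follows. This is not how the alpha server is implemented; it only illustrates the semantics being asked for, namely creating the final path component when its parent already exists. The ensure_leaf_dir name is hypothetical, and a scratch directory stands in for the /dgraph/backups mount point.

```shell
#!/bin/sh
# Hypothetical helper: create only the final path component of a destination.
# It refuses to act when the parent (the mount point) is missing.
ensure_leaf_dir() {
  dest="$1"
  parent="$(dirname "$dest")"
  if [ ! -d "$parent" ]; then
    echo "parent directory does not exist: $parent" >&2
    return 1
  fi
  # If the leaf already exists as a directory, do nothing, so repeated
  # hourly runs against the same day's directory are safe.
  [ -d "$dest" ] || mkdir "$dest"
}

# Demonstration in a scratch directory standing in for the mount point.
root="$(mktemp -d)"
ensure_leaf_dir "$root/dgraph_$(date +%Y%m%d)" && echo "leaf is present"
ensure_leaf_dir "$root/missing/dgraph_$(date +%Y%m%d)" || echo "refused: parent missing"
rm -rf "$root"
```

Limiting creation to the last leaf keeps a mistyped mount point from silently producing a new directory tree on the wrong disk, which is one reason to prefer it over creating every missing parent.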

What you actually did

Because the alpha server does not do this, the customer has to mount the disk on both the alpha server and the server running the Kubernetes cronjob. With that in place, the script run by the cronjob can create the directory on the disk that the alpha server requires before it performs the binary backup.

In Kubernetes, this can be configured using a PV backed by NFS. With other types of persistent volumes, or with NFS created through a CSI driver (e.g. AWS EFS or Google Cloud Filestore), this may not be possible.

Why that wasn’t great, with examples

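This is an advanced and complex configuration and orchestration scenario: the same volume has to be shared between the alpha server and the cronjob solely so that a directory can be created. None of it would be needed with this enhancement.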

Any external references to support your case

In either minio or s3, if the path does not exist on the object store, it is created:

  • minio://<server>/<bucket>/<path>/dgraph_$(date +%Y%m%d)
  • s3://s3.<region>.amazonaws.com/<bucket>/<path>/dgraph_$(date +%Y%m%d)