Having issues with binary backups


(Simdi Jinkins) #1

I start up a clean new Dgraph cluster and load about 60,000 nodes into it, then I take a binary backup for the first time, but it seems to be creating folders only a few bytes in size, which can't be right:


(Shekar Mantha) #2

Hi,

Would you please tell me how you did the backup? Which commands did you run, etc.? Also, what system are you running on?

Thanks
Shekar


(Simdi Jinkins) #3

I’m running Dgraph on a single-node, single-replica Kubernetes cluster, so I’ve got this:
(screenshot)

To create a binary backup i do the following steps:

  1. Create an interactive shell in the alpha pod with the following command:
 kubectl exec -it dgraph-alpha-0 -- /bin/bash
  2. Then I run the backup command as stated in the docs:
curl -XPOST localhost:8080/admin/backup -d "destination=s3://s3.us-east-2.amazonaws.com/<bucket-name>"

This command returns successfully, and when I check the Amazon bucket, a folder is created as seen in my initial comment.

  3. Then, to test whether the backup capabilities are working, I delete all the indexes using the Ratel dashboard and run the following command to restore:
dgraph restore -p /var/db/dgraph -l s3://s3.us-east-2.amazonaws.com/<bucket-name>

but this command fails with the following error:

I try fixing the error by creating the missing directories, and on retrying the restore command, it succeeds:

Please note: before I ran the backup there were over 80,000 nodes.


(Simdi Jinkins) #4

I’m running this Kubernetes cluster on DigitalOcean; here’s my .yaml config :slight_smile:

# This highly available config creates 3 Dgraph Zeros, 3 Dgraph
# Alphas with 3 replicas, and 1 Ratel UI client. The Dgraph cluster
# will still be available to service requests even when one Zero
# and/or one Alpha are down.
#
# There are 4 public services exposed, users can use:
#       dgraph-zero-public - To load data using Live & Bulk Loaders
#       dgraph-alpha-public - To connect clients and for HTTP APIs
#       dgraph-ratel-public - For Dgraph UI
#       dgraph-alpha-x-http-public - Use for debugging & profiling
# apiVersion: v1
# kind: Service
# metadata:
#   name: dgraph-zero-public
#   labels:
#     app: dgraph-zero
# spec:
#   type: LoadBalancer
#   ports:
#   - port: 5080
#     targetPort: 5080
#     name: zero-grpc
#   - port: 6080
#     targetPort: 6080
#     name: zero-http
#   selector:
#     app: dgraph-zero
# ---
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha-public
  labels:
    app: dgraph-alpha
spec:
  type: LoadBalancer
  ports:
  - port: 8080
    targetPort: 8080
    name: alpha-http
  - port: 9080
    targetPort: 9080
    name: alpha-grpc
  selector:
    app: dgraph-alpha
---
# This service is created in order to debug & profile a specific alpha.
# You can create one for each alpha that you need to profile.
# For a more general HTTP APIs use the above service instead.
# apiVersion: v1
# kind: Service
# metadata:
#   name: dgraph-alpha-0-http-public
#   labels:
#     app: dgraph-alpha
# spec:
#   type: LoadBalancer
#   ports:
#   - port: 8080
#     targetPort: 8080
#     name: alpha-http
#   selector:
#     statefulset.kubernetes.io/pod-name: dgraph-alpha-0
# ---
# apiVersion: v1
# kind: Service
# metadata:
#   name: dgraph-ratel-public
#   labels:
#     app: dgraph-ratel
# spec:
#   type: LoadBalancer
#   ports:
#   - port: 8000
#     targetPort: 8000
#     name: ratel-http
#   selector:
#     app: dgraph-ratel
# ---
# This is a headless service which is necessary for discovery for a dgraph-zero StatefulSet.
# https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
apiVersion: v1
kind: Service
metadata:
  name: dgraph-zero
  labels:
    app: dgraph-zero
spec:
  ports:
  - port: 5080
    targetPort: 5080
    name: zero-grpc
  clusterIP: None
  selector:
    app: dgraph-zero
---
# This is a headless service which is necessary for discovery for a dgraph-alpha StatefulSet.
# https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha
  labels:
    app: dgraph-alpha
spec:
  ports:
  - port: 7080
    targetPort: 7080
    name: alpha-grpc-int
  clusterIP: None
  selector:
    app: dgraph-alpha
---
# This StatefulSet runs 3 Dgraph Zero.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph-zero
spec:
  serviceName: "dgraph-zero"
  replicas: 1
  selector:
    matchLabels:
      app: dgraph-zero
  template:
    metadata:
      labels:
        app: dgraph-zero
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - dgraph-zero
              topologyKey: kubernetes.io/hostname
      containers:
      - name: zero
        image: dgraph/dgraph:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5080
          name: zero-grpc
        - containerPort: 6080
          name: zero-http
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        env:
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        envFrom:
        - configMapRef:
            name: env-values
        command:
          - bash
          - "-c"
          - |
            set -ex
            [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
            ordinal=${BASH_REMATCH[1]}
            idx=$(($ordinal + 1))
            if [[ $ordinal -eq 0 ]]; then
              exec dgraph zero --my=$(hostname -f):5080 --idx $idx --replicas 1
            else
              exec dgraph zero --my=$(hostname -f):5080 --peer dgraph-zero-0.dgraph-zero.${POD_NAMESPACE}.svc.cluster.local:5080 --idx $idx --replicas 1
            fi
      terminationGracePeriodSeconds: 60
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: 10Gi
---
# This StatefulSet runs 3 replicas of Dgraph Alpha.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph-alpha
spec:
  serviceName: "dgraph-alpha"
  replicas: 1
  selector:
    matchLabels:
      app: dgraph-alpha
  template:
    metadata:
      labels:
        app: dgraph-alpha
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - dgraph-alpha
              topologyKey: kubernetes.io/hostname
      # Initializing the Alphas:
      #
      # You may want to initialize the Alphas with data before starting, e.g.
      # with data from the Dgraph Bulk Loader: https://docs.dgraph.io/deploy/#bulk-loader.
      # You can accomplish by uncommenting this initContainers config. This
      # starts a container with the same /dgraph volume used by Alpha and runs
      # before Alpha starts.
      #
      # You can copy your local p directory to the pod's /dgraph/p directory
      # with this command:
      #
      #    kubectl cp path/to/p dgraph-alpha-0:/dgraph/ -c init-alpha
      #    (repeat for each alpha pod)
      #
      # When you're finished initializing each Alpha data directory, you can signal
      # it to terminate successfully by creating a /dgraph/doneinit file:
      #
      #    kubectl exec dgraph-alpha-0 -c init-alpha touch /dgraph/doneinit
      #
      # Note that pod restarts cause re-execution of Init Containers. Since
      # /dgraph is persisted across pod restarts, the Init Container will exit
      # automatically when /dgraph/doneinit is present and proceed with starting
      # the Alpha process.
      #
      # Tip: StatefulSet pods can start in parallel by configuring
      # .spec.podManagementPolicy to Parallel:
      #
      #     https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#deployment-and-scaling-guarantees
      #
      # initContainers:
      #   - name: init-alpha
      #     image: dgraph/dgraph:latest
      #     command:
      #       - bash
      #       - "-c"
      #       - |
      #         echo "Write to /dgraph/doneinit when ready."
      #         until [ -f /dgraph/doneinit ]; do sleep 2; done
      #     volumeMounts:
      #       - name: datadir
      #         mountPath: /dgraph
      containers:
      - name: alpha
        image: dgraph/dgraph:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 7080
          name: alpha-grpc-int
        - containerPort: 8080
          name: alpha-http
        - containerPort: 9080
          name: alpha-grpc
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        env:
          # This should be the same namespace as the dgraph-zero
          # StatefulSet to resolve a Dgraph Zero's DNS name for
          # Alpha's --zero flag.
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        envFrom:
        - configMapRef:
            name: env-values
        command:
          - bash
          - "-c"
          - |
            set -ex
            dgraph alpha --my=$(hostname -f):7080 --lru_mb 1340 --zero dgraph-zero-0.dgraph-zero.${POD_NAMESPACE}.svc.cluster.local:5080
      terminationGracePeriodSeconds: 600
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dgraph-ratel
  labels:
    app: dgraph-ratel
spec:
  selector:
    matchLabels:
      app: dgraph-ratel
  template:
    metadata:
      labels:
        app: dgraph-ratel
    spec:
      containers:
      - name: ratel
        image: dgraph/dgraph:latest
        ports:
        - containerPort: 8000
        command:
          - dgraph-ratel

(Shekar Mantha) #5

Ok, thanks. I will have one of our engineers take a look.


(Simdi Jinkins) #6

Hey Shekar, has there been any progress on this? I need to go into production in a week and I can’t get this working.


(Shekar Mantha) #7

Hi, sorry about the delay. I will get you some response by Monday.


(Martin Martinez Rivera) #8

Taking a look right now. At first glance, this looks like an issue causing some or all of the backup files not to be written properly to the S3 bucket (so issues during the restore are expected). We have customers that have successfully backed up to S3, and our system tests are passing, so hopefully this is just some kind of configuration issue that can be resolved without having to release a new version of Dgraph.

In the meantime, I just want to point out that we support exporting Dgraph data to RDF or JSON, so there’s an alternative way to back up data. It has the disadvantage of no incremental-update support, but hopefully it’s good enough to unblock your production release if it comes to that.

I’ll take a deeper look and reply again with more info and/or requests.


(Martin Martinez Rivera) #9

Just for reference, here’s a short summary of how the backup feature works, so you have a better idea of what could go wrong and what info would help diagnose this.

  1. When you send a backup request, you are sending it to a particular alpha. That process does not have access to all of the data, so it sends requests to all the groups to execute a backup of their data.
  2. The leader of each group receives the request, processes the arguments, connects to S3/MinIO or to a network drive, and calls the backup API in Badger. Because most of the actual backup work happens inside Badger and is common to S3 and filesystem backups (which are working as expected), it’s very unlikely that the issue is happening here.
  3. Each group writes its data to the destination independently, in a folder of its own. The folder contains a manifest.json file, which holds information used to create an incremental backup the next time a backup is taken. It doesn’t contain any personal or private information, so it can be shared with us without any privacy concerns.
  4. Each group reports back to the alpha that received the original backup request. I think there’s an issue with this part, since your backup is clearly wrong but Dgraph reported that it succeeded. Hopefully, this is the only issue.

If you could do the following, it would help me further debug the issue. Sharing text files is better than sharing images, since text is searchable:

  1. As stated above, each group leader connects to S3 independently. In practice, this means each Dgraph alpha needs to have the proper credentials. The backup is writing to S3, so at least one of the servers has the proper credentials. However, if not all the alphas have them, it could explain the issues you are seeing.
  2. Go into each alpha’s environment (physical machine or container/jail, whatever you are using to run Dgraph) and try to ping the S3 endpoint to rule out any connectivity issues. If possible, you could also try to write a dummy file to the S3 endpoint using Amazon’s tools to rule out any issues with AWS itself.
  3. Create a backup in an empty bucket. Share the logs for each alpha around the time the backup is created (no need to share the entire log for now). EDIT: Also important to share are the structure of the S3 bucket after the backup completes (names of the folders and the files inside them) and the contents of the manifest.json file. The contents of the backups themselves are not needed at this moment. The same applies to the item below.
  4. Make a few dummy changes to your database that you can easily revert. Using the same bucket, create another backup. This should create an incremental backup that contains only the changes you made. Share the logs for all the alphas around the time of this backup as well.
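A rough sketch of the credential and connectivity checks from steps 1 and 2, run inside each alpha container (this assumes credentials are passed via the standard AWS environment variables and that the AWS CLI is available; adjust to your setup):

```shell
# Open a shell in each alpha, e.g.:
#   kubectl exec -it dgraph-alpha-0 -- /bin/bash

# 1. Check that S3 credentials are present in this alpha's environment
env | grep -E 'AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY'

# 2. Rule out connectivity issues to the S3 endpoint
ping -c 3 s3.us-east-2.amazonaws.com

# Optionally, write a dummy object to rule out problems on the AWS side
echo "connectivity test" > /tmp/s3-test.txt
aws s3 cp /tmp/s3-test.txt s3://<bucket-name>/s3-test.txt
```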

In the meantime, I will try to get an S3 bucket of my own and test this feature on my end. I believe this was done by another engineer in the company before the 1.1 release, but it doesn’t hurt to check again. Also, writing a test that tries to back up a large dataset should help spot issues earlier, so I’ll try to get it prioritized.


(Simdi Jinkins) #10

Okay, will do this and get back to you. Thanks a lot.


(Simdi Jinkins) #11

So I retried the backup using the steps and precautions you outlined; sadly, this didn’t work.
The backup attempt was done on a single-node, single-replica cluster, so there was only one alpha to check for credentials. The data was added to Dgraph in 3 steps (20k, 40k, and 60k nodes), with each step being backed up with the binary backup commands. At the end of this I got three different folders on AWS:


And here is the structure of each folder:


Here are the corresponding manifest.json files
manifest (2).json (850 Bytes)
manifest (1).json (850 Bytes)
manifest.json (843 Bytes)

Here are my alpha logs around the backup time:

dgraph_logs.txt (69.1 KB)
dgraph_logs2.txt (18.8 KB)


(Martin Martinez Rivera) #12

Thanks. I am assuming the images are in the same order as the folders. Let me know if that is not correct.

First, the restore problem you were seeing earlier is because the restore folder doesn’t exist; for now, you should create it first. The next release of Dgraph should contain a fix to create this folder automatically.

Second, the first and second backups look correct, but the third one is fairly small. That is consistent with the timestamps of the last two backups (19567 vs 19572). However, if you made a lot of changes, the timestamps should have changed by far more than 5.

The problem then is that the backup might not be backing up the most recent version. I think I might have seen something like this before. I’ll look into this. Thankfully, this doesn’t seem to be related to AWS at all so I can do the testing on my own.

Just a couple more requests.

  1. How long after the last mutation did you run the third backup?
  2. What happens if you run another backup sometime after the last mutation (let’s say 10 min)? If that ends up backing up the rest of the data, backups should be usable right now, as long as you run them periodically and often enough that not much data is missed due to this issue.
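If periodic backups become the stopgap, a naive scheduling sketch (the 10-minute interval is arbitrary; a Kubernetes CronJob or a crontab entry running the same curl command would be a more robust way to schedule this):

```shell
# Illustrative stopgap: trigger a binary backup every 10 minutes in a loop
while true; do
  curl -XPOST localhost:8080/admin/backup \
    -d "destination=s3://s3.us-east-2.amazonaws.com/<bucket-name>"
  sleep 600  # 10 minutes
done
```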

On my end I’ll look into why the backup timestamp seems to be delayed.


(Simdi Jinkins) #13

Thanks, Martin.
Yes, the images are in the same order as the folders.

I ran the backup during mutations. I’ve got a service performing API calls to an external service for data, which is then stored in Dgraph. It’s a continuous process; for context, the data being stored is for search suggestions.
Also, I can’t accurately say how long I waited before the last backup, but it was probably around 5 or so minutes.

I suspect this issue pops up because I’m using Kubernetes, but I don’t have much knowledge of k8s.


(Martin Martinez Rivera) #14

I don’t think it’s a Kubernetes issue. The backups are small because Dgraph is instructed to back up only to a timestamp that is barely greater than the timestamp of the last backup.

I talked to Manish (our CEO) about this problem. He said it’s indeed weird that the timestamps are this way.

Could you try running some backups again (in an empty bucket)? If you run into the same issue, run a query (any valid query should work) right after the backup, and share the manifest of that backup (and of the backup before it, if it exists) along with the result of the query (we only really need the timestamp included with the query results).

If binary backups are working fine and you ran the query close enough to the end of the backup, then the timestamp of the query and of the latest backup should be about the same (unless you have a lot of mutations coming in between the two points in time).
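One way to script that comparison (a sketch: the Content-Type header is what Dgraph v1.1 expects for HTTP queries, the predicate in the example query is a placeholder, and python3 is used only for portable JSON extraction):

```shell
# Run any valid query right after the backup and pull the transaction
# start timestamp out of the response
curl -s localhost:8080/query \
  -H 'Content-Type: application/graphql+-' \
  -d '{ node(func: has(name), first: 1) { uid } }' \
  | python3 -c "import json,sys; print(json.load(sys.stdin)['extensions']['txn']['start_ts'])"

# Compare the printed value against the latest backup's manifest:
# if backups are healthy, the two should be roughly equal.
```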

Thanks.


(Simdi Jinkins) #15

Okay, I’ll do this and get back to you.


(Simdi Jinkins) #16

So backups are still having issues; same problem. I deleted all the backup files and started with an empty bucket.

Manifest of first backup and query timestamp:

manifest (3).json (842 Bytes)


{
  "data": {
    "node": [
      {
        "uid": "0x21acb"
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 29002,
      "processing_ns": 8152545,
      "encoding_ns": 9523,
      "assign_timestamp_ns": 1265348
    },
    "txn": {
      "start_ts": 21603
    }
  }
}

Manifest of second backup and query timestamp:

manifest (4).json (849 Bytes)


{
  "data": {
    "node": [
      {
        "uid": "0x21acb"
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 16821,
      "processing_ns": 15343449,
      "encoding_ns": 9767,
      "assign_timestamp_ns": 1453494
    },
    "txn": {
      "start_ts": 21637
    }
  }
}


(Martin Martinez Rivera) #17

There doesn’t seem to be anything wrong with the timestamps. Here’s the timeline:

  • The first backup backs up data up to timestamp 21602.
  • The query afterwards has a timestamp of 21603. This makes perfect sense.
  • The second backup backs up the data changed between timestamps 21602 and 21636. Since the difference is pretty small, the resulting backup is expected to be small. The only exception would be if your transactions were huge, but that doesn’t seem to be the case.
  • The strange part happens here. After you run the query, the timestamp is 21637, which is the next timestamp, so the timestamp the backup received appears correct. Yet you said a lot of changes were made between the first and second backups, and neither the backup nor Dgraph itself agrees with this.

Just a few more questions to try to get to the bottom of this.

  • How big are the transactions sent to Dgraph? Are you sending each change in a separate transaction, or batching multiple changes and then writing them to Dgraph?
  • Is your code successfully committing the transactions?
  • Are you correctly handling aborted transactions that must be retried? If there are conflicts and a transaction cannot be committed, the timestamp will not increase; your code should retry the mutation(s).
  • What’s the rate at which you send data to Dgraph? If it’s variable, this could just mean there weren’t many changes between the two points. Do you have a mechanism to check how much data was supposed to be added between the two backup points?
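To illustrate the retry point: a minimal sketch over the HTTP API (the endpoint, query parameter, and headers assume Dgraph v1.1, the triple is a dummy, and the error check relies on failures appearing under an "errors" key in the JSON response):

```shell
# Retry a commitNow mutation a few times if the server reports an error
# (aborted transactions surface as entries under "errors")
for attempt in 1 2 3; do
  resp=$(curl -s -XPOST 'localhost:8080/mutate?commitNow=true' \
    -H 'Content-Type: application/rdf' \
    -d '{ set { _:n <name> "test" . } }')
  echo "$resp" | grep -q '"errors"' || break
  sleep 1
done
```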

After looking at this, I don’t think the issue is with backups at all, because the timestamps the backup received seem correct based on the timestamps of the queries done right after. Either there weren’t any changes made between the two backups, or the mutations are not succeeding for some reason.


(Simdi Jinkins) #18

Apologies if I didn’t communicate properly: for my latest backup attempt, I only ran a very small mutation between the first and second backups. I made a lot of changes in my earlier backup attempts, but not this one.


(Simdi Jinkins) #19

I add about 500 nodes per transaction.

Yes, my code is successfully committing the transactions, because I’m able to see the data from the Ratel interface.

Not really; my transactions are more like fire-and-forget.


(Simdi Jinkins) #20

Still, though: even if there aren’t any changes between the two backups, I’m not able to recover a single node. The backups simply don’t work; even restoring just the data from the first backup doesn’t work.