Dgraph Helm Data Migration

Hi,

I am running Dgraph v20.07.3 on bare metal. Now I want to migrate to Helm, keeping the same version.
I started a cluster with the v0.13 Helm chart. The cluster is working fine.

Is there any requirement to use a particular chart version for a particular Dgraph version?

Now my question is: how do I move my existing data from bare metal into the Helm deployment?

Thanks

Do you mean start the cluster with 20.07.3 instead of latest?

You can do

helm install my-release dgraph/dgraph --set image.tag="v20.07.3"

Export it and use live load to migrate.

https://dgraph.io/docs/deploy/dgraph-administration/#exporting-database
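For illustration, in v20.07 an export can be triggered through the /admin GraphQL endpoint, roughly like this (the host/port and output location are assumptions for a default bare-metal setup):

```shell
# Hedged sketch: trigger an export on the existing bare-metal Alpha.
# Assumes the Alpha's HTTP port is the default 8080; adjust as needed.
curl -s -X POST http://localhost:8080/admin \
  -H 'Content-Type: application/json' \
  -d '{"query":"mutation { export(input: {format: \"rdf\"}) { response { message code } } }"}'

# The exported files land in the Alpha's export directory, typically a
# subdirectory of ./export/ containing g01.rdf.gz and g01.schema.gz.
ls export/
```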

Hi Michel,

I have one doubt. On bare metal I am running v20.07.3, and in Helm it is the same version - why is a live load needed again? Won’t copying the data directories work in Helm?

It should.

But in my case it shows an error and the Alpha pod keeps crashing.
The log says failed to create badger…

Doing migrations with the files themselves is a bit different and not safe - that’s why we don’t recommend this approach in the docs. But it is the same as the bulk-load approach (which is copying the output into place). In general, if the versions are the same, you should start the Helm cluster without any other Dgraph instance running - that’s the challenge. Also clean up any leftover artifacts (files like locks, postings, and so on). After you copy the files to the volume, you can start the cluster.
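For illustration, the cleanup on the bare-metal side might look roughly like this (the data path and systemd unit name are assumptions for this example; adjust to your setup):

```shell
# Hedged sketch: prepare the bare-metal data directories before copying them.
# /var/lib/dgraph and the unit name dgraph-alpha are assumptions.

# Stop the Alpha first so Badger flushes everything cleanly to disk.
sudo systemctl stop dgraph-alpha

# Remove leftover lock artifacts, keeping the actual data
# (.sst/.vlog files and MANIFEST) intact.
sudo rm -f /var/lib/dgraph/p/LOCK /var/lib/dgraph/w/LOCK

# Pack the posting (p) and write-ahead (w) directories for transfer
# into the Helm volume.
sudo tar -czf dgraph-data.tar.gz -C /var/lib/dgraph p w
```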

Thanks Michel.
So we can do a bulk load with data and schema files in the Helm chart?
If so, how do we do it?

I don’t know. In a usual k8s cluster, you can just cp to the volume, like I did here: GitHub - MichelDiz/Dgraph-Bulk-Script: Just a simple Sh to use Dgraph's Bulk Loader. But the Helm chart is different and I never tried to do the same there. @joaquin can help.

In my repo, what I did was just add a script to “wait” for the files. And then start the cluster.

Yeah, it works fine with plain Kubernetes. When I repeat the same copy/paste with Helm, I am facing an error.

Thanks Michel for continuous support.

Bulk Loader is an offline process that needs to happen before Dgraph Alpha starts up (and thus where there is no existing p directory). In Kubernetes, you achieve this with an init container. I documented the process with bulk loader with Kubernetes using helm chart for deployment below:

Note that the process I documented is an easy approach, where the bulk loader operation is run redundantly inside each Dgraph Alpha pod’s init container. The bulk load really only needs to be done once to create the p directory. So, if you did the bulk load somewhere else - let’s call this the bulk-load workstation - you could instead copy the resulting p directory from the workstation into each Dgraph Alpha pod’s init container as they start up sequentially.

Another approach is to use the live loader, which is an online process where the Dgraph Alpha container is already running with an existing p directory. For this, you can use the kubectl port-forward command to map the necessary ports to localhost. Let me know if you want help with the live loader process.
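A hedged sketch of that live-loader path, assuming the default chart service names for a release called my-release and export files named g01.rdf.gz / g01.schema.gz:

```shell
# Map the in-cluster Alpha gRPC port and the Zero port to localhost.
# Service names are assumptions from the default chart; check with `kubectl get svc`.
kubectl port-forward svc/my-release-dgraph-alpha 9080:9080 &
kubectl port-forward svc/my-release-dgraph-zero 5080:5080 &

# Run the live loader against the forwarded ports (Dgraph v20.07 flags).
dgraph live \
  --files  g01.rdf.gz \
  --schema g01.schema.gz \
  --alpha  localhost:9080 \
  --zero   localhost:5080
```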

Thanks a lot, I will get back to you if I have any queries.

I just realized that I dropped these awesome instructions in a private thread, so pasting it below in case others may have similar questions on this topic.


Generally speaking, for deploying Dgraph with Helm, you would use an init container where you can populate the p directory before the Dgraph Alpha pods start.

NOTE: The process below assumes a single shard (group 1) cluster, using the default helm chart values that create alpha-0, alpha-1, alpha-2.

Overview of Helm Chart is here:

Process

Do this process for every alpha pod, e.g. alpha-0, alpha-1, alpha-2:

  1. Deploy Dgraph with initContainers enabled (the other init container default settings are unchanged):
    helm repo add dgraph https://charts.dgraph.io
    helm install "my-release" \
      --set alpha.initContainers.init.enabled=true dgraph/dgraph
    
  2. Optionally, copy files to the Dgraph Alpha pod’s init container with kubectl cp if needed, e.g.
    kubectl cp /path/to/files <name-of-pod-for-alpha>:/dgraph/ -c <name-of-init-container>
    
  3. Login into the desired Dgraph Alpha pod’s init container:
    kubectl exec -ti <name-of-pod-for-alpha> -c <name-of-init-container> -- bash
    
  4. Inside the Dgraph Alpha pod’s init container, if you didn’t copy required files into the container already, you can do that now or optionally curl down needed files. Once ready, run dgraph bulk and then move the resulting directory to the appropriate path, e.g. mv /path/to/p /dgraph.
  5. Inside the Dgraph Alpha pod’s init container, run touch /dgraph/doneinit to signal that we’re ready to start Dgraph Alpha container and no longer need the init container.
  6. Repeat this process for all other alpha pods within this group. You may have to wait a few seconds for each init container to become available.
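The steps above could be scripted roughly like this for one pod (the release name, file names, and the in-cluster Zero address are assumptions; the Zero address depends on your chart’s headless service):

```shell
POD=my-release-dgraph-alpha-0
INIT=my-release-dgraph-alpha-init
# Assumed in-cluster Zero address via the chart's headless service; verify yours.
ZERO=my-release-dgraph-zero-0.my-release-dgraph-zero-headless:5080

# 1. Copy the bulk-loader inputs into the waiting init container.
kubectl cp ./g01.rdf.gz    "$POD":/dgraph/ -c "$INIT"
kubectl cp ./g01.schema.gz "$POD":/dgraph/ -c "$INIT"

# 2. Run the bulk load inside the init container and move the resulting
#    p directory into place.
kubectl exec -ti "$POD" -c "$INIT" -- bash -c \
  "dgraph bulk -f /dgraph/g01.rdf.gz -s /dgraph/g01.schema.gz --zero $ZERO \
   && mv out/0/p /dgraph/"

# 3. Signal that init is done so the Alpha container can start.
kubectl exec -ti "$POD" -c "$INIT" -- touch /dgraph/doneinit
```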

Notes

The name of the pods will follow this format below depending on the release name, such as my-release:

  • my-release-dgraph-alpha-0
  • my-release-dgraph-alpha-1
  • my-release-dgraph-alpha-2

You can always list the pod names with kubectl get pods. For the init container name, it will also be based on the release name, e.g. my-release:

  • my-release-dgraph-alpha-init

Thus putting these together you could exec into the init container on alpha-0 with:

kubectl exec -ti my-release-dgraph-alpha-0 -c my-release-dgraph-alpha-init -- bash

You can always get the names of the containers and init containers running in a pod with kubectl describe pod <name-of-pod>.

Let me know if you need further help.

One important piece of information related to this matter: you should always use the same Zero group in all these steps, because if you start the Zero group from scratch, it will lead to conflicting uid leases and timestamps. Note that the Alphas don’t hold any information that belongs to the Zero group.

There are two things you can choose to do:

  1. Start the Zero group (without the Alphas) in k8s, do the bulk load locally if needed, and then you’re good to go by just copying the output to the pods.
  2. Start both Zero and the bulk load locally, and copy the Zero files to k8s right after.
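A sketch of the second option, with assumed names throughout (the Zero pod’s container name and /dgraph data path follow the default chart, but verify them in your deployment):

```shell
# Run a local Zero and the bulk load against it.
dgraph zero --my localhost:5080 &
dgraph bulk -f g01.rdf.gz -s g01.schema.gz --zero localhost:5080

# Stop the local Zero, then copy its state (the zw directory, which holds
# uid leases and timestamps) into the k8s Zero pod, ideally before the
# cluster takes any traffic. Container name "zero" is an assumption.
kubectl cp ./zw my-release-dgraph-zero-0:/dgraph/ -c zero
```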