Load Data using Dgraph w/ Kubernetes

Hi Everyone,
I am new to Dgraph and am trying to use the bulk loader in Kubernetes. I found this YAML file on GitHub: https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml. The comments in that file say that, to do a bulk load, we can create an init container and copy the data from our local directory into the pod’s dgraph directory. So I downloaded the 1million.rdf.gz file from GitHub and tried loading it into Dgraph, but I am unable to do so. I had to change the StatefulSet to a Deployment. My alpha Deployment looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    io.kompose.service: alpha
  name: alpha
spec:
  replicas: 1
  selector:
    matchLabels:
      io.kompose.service: alpha
  template:
    metadata:
      labels:
        io.kompose.service: alpha
    spec:
      initContainers:
        - name: init-alpha
          image: dgraph/dgraph:latest
          command:
            - bash
            - "-c"
            - |
              trap "exit" SIGINT SIGTERM
              echo "Write to /dgraph/doneinit when ready."
              until [ -f /dgraph/doneinit ]; do sleep 2; done
          volumeMounts:
            - name: alpha-claim0
              mountPath: /dgraph
      containers:
      - args:
        - dgraph
        - alpha
        - --my=alpha:7080
        - --lru_mb=2048
        - --zero=zero:5080
        image: dgraph/dgraph:latest
        name: alpha
        ports:
        - containerPort: 8080
        - containerPort: 9080
        volumeMounts:
        - mountPath: /dgraph
          name: alpha-claim0
      volumes:
      - name: alpha-claim0
        persistentVolumeClaim:
          claimName: alpha-claim0
status: {}

After deploying with the file above, I use the following commands to copy my data file. I made a p directory that contains my 100million.rdf.gz file.

kubectl cp /root/dgraph/docker-compose/p myalphapodname:/dgraph/ -c init-alpha

kubectl exec myalphapodname -c init-alpha -- touch /dgraph/doneinit

But I am unable to see my data in the Ratel UI. Am I missing something? If this is a silly question, please forgive me. I am trying Kubernetes and Dgraph for the first time and I need to deliver this ASAP. I did all the research I could before posting here. If anyone could help me with this, that would be great.

Thank you

Hey @saugat

No question is too silly. We all learn through asking questions. Welcome to the community, by the way.

Quick one: is your Ratel connected to the correct IP/port? Have you done the appropriate whitelisting?

Also, @joaquin, I had a quick look through things and couldn’t spot any mistakes. Ideas?

@chewxy
Thank you for saying there are no silly questions. I had literally just put the 100million.rdf.gz file in my p directory. On further research I learned that I need to run some bulk loader commands:

dgraph bulk -r goldendata.rdf -s goldendata.schema --http localhost:8090 --zero localhost:5080

and this will create out/0/p, and I need to copy this p folder into the alpha … If this is the right process, where do I need to run the above command? In my zero?

I think you need the live loader, not the bulk loader. The bulk loader is for before you have the cluster up and running. Once it’s up and running, you should use the live loader.

Yeah, the live loader is the right choice for this task. You can also use the bulk loader, but in that case you have to start the Alphas after the bulk load finishes. You can’t run the bulk loader with Alphas running, only with the Zero. You also have to preserve the Zero instance (never delete the Zero volume).
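To make that ordering concrete, here is a rough sketch that strings together the commands already posted in this thread. It assumes the init container from the Deployment above is still waiting for /dgraph/doneinit (so the Alpha has not started yet) and that Zero is reachable at the address you pass in. The function names, the pod name and the DRY_RUN switch are made up for illustration, and the bulk loader flags are copied from the earlier post, so check `dgraph bulk --help` for your version.

```shell
#!/usr/bin/env bash
# Sketch only: command lines come from this thread; names and paths are placeholders.

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# bulk_then_handoff POD [ZERO_ADDR]
# Hypothetical helper: bulk load first, then hand the result to the waiting Alpha pod.
bulk_then_handoff() {
  local pod="$1"
  local zero_addr="${2:-localhost:5080}"

  # 1. Run the bulk loader while only Zero is up; it writes out/0/p.
  run dgraph bulk -r goldendata.rdf -s goldendata.schema \
      --http localhost:8090 --zero "$zero_addr" || return 1

  # 2. Copy the generated p directory into the Alpha volume via the init container.
  run kubectl cp out/0/p "$pod:/dgraph/" -c init-alpha || return 1

  # 3. Create the marker file so the init container exits and the Alpha starts.
  run kubectl exec "$pod" -c init-alpha -- touch /dgraph/doneinit
}

# Usage (prints the commands without running them):
#   DRY_RUN=1 bulk_then_handoff myalphapodname zero:5080
```

The point is only the order: bulk load, copy p, then release the init container. Copying the raw .rdf.gz file into p does not load anything.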

I believe that if you are running inside a pod, you should use zero:5080 or a similar address that comes from the Service (SVC).

One more thing: if you are going to use the live loader, check your provider. On AWS, GCP or similar you have to expose the Alpha and Zero gRPC ports, unless you run the live loader inside a pod.
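Running the live loader inside a pod could look roughly like the sketch below. It assumes you exec into the Alpha pod itself (so the Alpha is at localhost:9080), that the Zero Service is reachable as zero:5080, and that the data and schema files are named as in the 1million example. The function names, the pod name and the /tmp paths are placeholders, not anything Dgraph prescribes.

```shell
#!/usr/bin/env bash
# Sketch only: pod name, file names and /tmp paths are placeholders.

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# live_load_in_pod POD [ZERO_ADDR]
# Hypothetical helper: copy the files into the pod, then run the live loader there.
live_load_in_pod() {
  local pod="$1"
  local zero_addr="${2:-zero:5080}"

  # Copy the data and schema files into the running Alpha pod.
  run kubectl cp 1million.rdf.gz "$pod:/tmp/1million.rdf.gz" || return 1
  run kubectl cp 1million.schema "$pod:/tmp/1million.schema" || return 1

  # Run the live loader inside the pod, so no gRPC ports need to be exposed.
  run kubectl exec "$pod" -- dgraph live \
      --files /tmp/1million.rdf.gz \
      --schema /tmp/1million.schema \
      --alpha localhost:9080 \
      --zero "$zero_addr"
}

# Usage (prints the commands without running them):
#   DRY_RUN=1 live_load_in_pod myalphapodname
```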

Hi @saugat

I wouldn’t recommend using a Deployment controller, as the pods are not sticky and get randomly generated names. StatefulSets give each pod a stable name and its own storage, which is what stateful apps like zero and alpha need.

A quick way to get started with K8s is to use the Helm chart and then the live loader:

RELEASE="my-release"
helm repo add dgraph https://charts.dgraph.io
helm install $RELEASE dgraph/dgraph

After all the pods are up, you can use port forward to make them available on localhost.

RELEASE="my-release"
export RATEL_POD=$(kubectl get pods \
  --selector "component=ratel,release=$RELEASE" \
  --output jsonpath="{.items[0].metadata.name}"
)
export ALPHA_POD=$(kubectl get pods \
  --selector "statefulset.kubernetes.io/pod-name=alpha-0,release=$RELEASE" \
  --output jsonpath="{.items[0].metadata.name}"
) 
export ZERO_POD=$(kubectl get pods \
  --selector "statefulset.kubernetes.io/pod-name=zero-0,release=$RELEASE" \
  --output jsonpath="{.items[0].metadata.name}"
)

## For Ratel (HTTP)
kubectl port-forward $ALPHA_POD 8080:8080 &
kubectl port-forward $RATEL_POD 8000:8000 &

## For LiveLoader (GRPC)
kubectl port-forward $ALPHA_POD 9080:9080 &
kubectl port-forward $ZERO_POD 5080:5080 &
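Since the port-forwards run in the background, it can help to confirm the Alpha is actually answering before loading data or pointing Ratel at it. A minimal sketch, assuming curl is installed locally: it polls the Alpha's HTTP /health endpoint on the forwarded port 8080. The function name and retry count are arbitrary, and this only checks the HTTP port, not the gRPC ports (9080 and 5080).

```shell
#!/usr/bin/env bash
# wait_for_alpha [URL] [TRIES]
# Hypothetical helper: poll the Alpha /health endpoint once per second until it
# answers with a success status, or give up after TRIES attempts.
wait_for_alpha() {
  local url="${1:-http://localhost:8080}"
  local tries="${2:-30}"
  local i=0

  while [ "$i" -lt "$tries" ]; do
    # --fail makes curl return non-zero on HTTP error statuses as well.
    if curl --silent --fail --output /dev/null "$url/health"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_for_alpha http://localhost:8080 30 && echo "Alpha is reachable"
```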

With these available at localhost, you can use live loader:

dgraph live \
 --files 1million.rdf.gz \
 --schema 1million.schema \
 --alpha localhost:9080 \
 --zero localhost:5080

You can also visit http://localhost:8000 for Ratel. When configuring which alpha to use, select http://localhost:8080.

Hope that helps.

Joaquin