How to: distributed Live Load with Kubernetes, Docker, or a plain binary

dataset
liveload

(Michel Conrado) #1

The live loader can perform a load in a distributed way. This avoids OOMs caused by concentrating the whole load on a single Alpha node (which causes congestion). In this text I will demonstrate how you can use this feature via Kubernetes.

This same process works for Docker, Docker Swarm, bare metal, and so on. Of course, each one is slightly different.

The reference YAML is https://github.com/dgraph-io/dgraph/blob/master/contrib/config/kubernetes/dgraph-ha.yaml

In this small tutorial I’m assuming that you already have a k8s context running, be it GKE, Docker’s built-in Kubernetes, or even Minikube.

First, create the setup:

kubectl create -f dgraph-ha.yaml
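If you don’t have the manifest locally yet, a quick way to fetch and apply it is sketched below (the raw URL is an assumption derived from the blob URL above; verify it matches your Dgraph version):

```shell
# Download the reference manifest and create the cluster resources.
# The raw GitHub path is assumed to mirror the blob URL referenced earlier.
curl -fsSL -o dgraph-ha.yaml \
  https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha.yaml
kubectl create -f dgraph-ha.yaml
kubectl get pods --watch   # wait until all Zero and Alpha pods are Running
```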

After that you need to create Services exposing each Alpha’s gRPC port, because a distributed load is not possible through the load balancer as it is. So you need to expose each Alpha temporarily.

Create an expose_dgraph.yaml:

apiVersion: v1
kind: Service
metadata:
  name: dgraph-zero-0-grpc-public
  labels:
    app: dgraph-zero
spec:
  type: LoadBalancer
  ports:
  - port: 5080
    targetPort: 5080
    name: zero-grpc
  selector:
    statefulset.kubernetes.io/pod-name: dgraph-zero-0
---
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha-0-grpc-public
  labels:
    app: dgraph-alpha
spec:
  type: LoadBalancer
  ports:
  - port: 9080
    targetPort: 9080
    name: alpha-grpc
  selector:
    statefulset.kubernetes.io/pod-name: dgraph-alpha-0
---
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha-1-grpc-public
  labels:
    app: dgraph-alpha
spec:
  type: LoadBalancer
  ports:
  - port: 9080
    targetPort: 9080
    name: alpha-grpc
  selector:
    statefulset.kubernetes.io/pod-name: dgraph-alpha-1
---
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha-2-grpc-public
  labels:
    app: dgraph-alpha
spec:
  type: LoadBalancer
  ports:
  - port: 9080
    targetPort: 9080
    name: alpha-grpc
  selector:
    statefulset.kubernetes.io/pod-name: dgraph-alpha-2
---

Run:

kubectl create -f expose_dgraph.yaml

This will create public gRPC endpoints. Once your cloud provider finishes creating the endpoints/IPs, you can use them to start your load.

./dgraph live -f your.rdf -s your.schema -z 38.239.155.115:5080 \
  -a "33.63.128.148:9080,35.138.12.15:9080,35.225.60.18:9080"

Each URI/IP must be comma-separated and cannot contain spaces.
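Since that address list is easy to get wrong by hand, here is a tiny shell sketch that joins the Alpha endpoints with commas and no spaces, the format `dgraph live -a` expects. The `join_alphas` helper is hypothetical, shown only for illustration:

```shell
# Hypothetical helper: join Alpha endpoints into the comma-separated,
# space-free list expected by `dgraph live -a`.
join_alphas() {
  local IFS=','
  echo "$*"
}

ALPHA_LIST=$(join_alphas 33.63.128.148:9080 35.138.12.15:9080 35.225.60.18:9080)
echo "$ALPHA_LIST"
# ./dgraph live -f your.rdf -s your.schema -z 38.239.155.115:5080 -a "$ALPHA_LIST"
```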

After the load you need to remove the Services that expose Dgraph:

kubectl delete -f expose_dgraph.yaml

By the way, it is recommended that you modify dgraph-ha.yaml according to your needs and not expose Dgraph directly to the public. By default, dgraph-ha.yaml exposes Dgraph via a load balancer. Keep that in mind.

PS: After the publication of this topic, we are going to work on a deep, technical blog post about this.

Cheers.


(Daniel Mai) #2

Kubernetes and other load balancers should have a way to simply connect to a single address and get the expected load-balanced behavior. I did a quick search for resources but haven’t dug deep into reading them.

Alternatively, we can use a Kubernetes Job to run the live loader. The Job avoids the need to set up an external IP address for each Alpha temporarily, which has a number of drawbacks: publicly exposing the cluster, extra cost for public IPs from the cloud provider, and even having to provision the public IPs in the first place. (For example, AWS limits the number of public IP addresses you can have at any given time; the limit can be bumped up by filing a support ticket, but that requires extra time and is thus another roadblock.)

Here’s a Job config I’ve used before to load the 21million data set into a cluster. The live loader connects via the internal cluster DNS rules:

apiVersion: batch/v1
kind: Job
metadata:
  name: dgraph-live-load-21million
spec:
  template:
    spec:
      containers:
      - name: dgraph-live-load-21million
        image: dgraph/dgraph:latest
        command:
          - bash
          - "-c"
          - |
            set -x
            apt-get update
            apt-get install -y wget file
            wget -q -O 21million.rdf.gz  "https://github.com/dgraph-io/benchmarks/raw/release/v1.0/data/release/21million.rdf.gz"
            wget -q -O facets.rdf.gz     "https://github.com/dgraph-io/benchmarks/raw/release/v1.0/data/release/facets.rdf.gz"
            wget -q -O sf-tourism.rdf.gz "https://github.com/dgraph-io/benchmarks/raw/release/v1.0/data/release/sf-tourism.rdf.gz"
            wget -q -O release.schema    "https://raw.githubusercontent.com/dgraph-io/benchmarks/release/v1.0/data/release/release.schema"
            dgraph live \
              -r 21million.rdf.gz,facets.rdf.gz,sf-tourism.rdf.gz \
              -s release.schema \
              --zero dgraph-zero-0.dgraph-zero:5080 \
              --dgraph dgraph-alpha-0.dgraph-alpha:9080,dgraph-alpha-1.dgraph-alpha:9080,dgraph-alpha-2.dgraph-alpha:9080
      restartPolicy: Never
  backoffLimit: 4
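To run it, save the manifest (the file name dgraph-live-load-job.yaml below is just my choice for illustration) and watch the Job until the loader finishes. This assumes the dgraph-ha cluster from the first post is already up:

```shell
# Create the Job and follow the live loader's progress.
kubectl create -f dgraph-live-load-job.yaml
kubectl wait --for=condition=complete --timeout=60m job/dgraph-live-load-21million
kubectl logs job/dgraph-live-load-21million --tail=20   # final live loader summary
```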

(Michel Conrado) #3

Hmm, I do something similar here, with no need to use wget, and I don’t use a Kubernetes Job. It’s interesting though. But “my way” works fine.

That’s something to care about. But I guess GKE is friendlier for this.

My proposal is to use the live loader locally, pointing to a cluster. But that is not the main proposal here; what matters is to share the distributed-load information, so all possibilities are valid. In the “Dgraph-Bulk-Script” repository I already anticipate Live or Bulk usage within the cluster context (GKE), but using the kubectl copy command and “sleep” (sh). Not elegant, but functional.

Not sure; as I said in the last topic where I tagged you, the live load gets stuck with only one Alpha. Maybe there is another solution, but everything indicates that only exposing the Alphas’ gRPC ports works.

I’ll check the other links and see if I can dig up some goodies.


(Daniel Mai) #4

The “gRPC Load Balancing on Kubernetes Without Tears” (:sob:) blog post describes exactly the issue we’re trying to solve.


(Daniel Mai) #5

Hooking up Linkerd2 to a Dgraph cluster was fairly painless, and clients connecting via the single ClusterIP or LoadBalancer endpoint would end up talking to different Alphas based on an exponentially weighted moving average (https://linkerd.io/2/features/load-balancing/).

The client needed to talk to Linkerd too. For example, after setting up a Dgraph cluster in K8s with Linkerd (I followed their Getting Started guide), I ran a Dgraph pod for the dgraph increment tool and connected it to Linkerd via linkerd inject:

kubectl run dgraph-increment --image=dgraph/dgraph:v1.0.14 --restart=Never --dry-run -o yaml --command -- sleep 3600 | linkerd inject - | kubectl apply -f -

Then I could run the increment tool against the K8s LoadBalancer named dgraph-alpha-public:

dgraph increment --addr=dgraph-alpha-public:9080 --num=1000

and, with Alpha query logs enabled, I’d see different Alphas responding to queries (shown at the beginning of each log line):

dgraph-alpha-0 alpha I0416 17:22:23.477168       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-1 alpha I0416 17:22:23.571169       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-1 alpha I0416 17:22:23.668939       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-1 alpha I0416 17:22:23.761325       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-0 alpha I0416 17:22:23.859834       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-1 alpha I0416 17:22:23.949468       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-1 alpha I0416 17:22:24.035094       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-0 alpha I0416 17:22:24.196147       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-2 alpha I0416 17:22:24.289699       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-2 alpha I0416 17:22:24.384236       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-2 alpha I0416 17:22:24.456025       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"

This also works with the headless ClusterIP Service address that’s not publicly exposed (dgraph-alpha), after configuring it to expose port 9080 for clients:

dgraph increment --addr=dgraph-alpha:9080 --num=1000
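For reference, that headless Service change might look like the sketch below. The exact dgraph-alpha Service in dgraph-ha.yaml may differ; the port names and the internal 7080 port here are assumptions to verify against your copy of the manifest:

```yaml
# Hedged sketch: headless dgraph-alpha Service with the client gRPC port added.
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha
  labels:
    app: dgraph-alpha
spec:
  clusterIP: None          # headless: DNS resolves directly to the pod IPs
  ports:
  - port: 7080             # internal Alpha-to-Alpha gRPC (assumed already present)
    targetPort: 7080
    name: alpha-grpc-int
  - port: 9080             # client gRPC port added for tools like dgraph increment
    targetPort: 9080
    name: alpha-grpc
  selector:
    app: dgraph-alpha
```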

There’s an open issue with Linkerd about not being able to connect to a particular StatefulSet pod via its stable DNS name: https://github.com/linkerd/linkerd2/issues/2266. I asked about its progress on the Linkerd Slack channel; it’s not fixed yet, and Oliver Gould said “it’s probably best to avoid injecting clients that talk to stable network ids”. So for now, when proxying Dgraph through Linkerd, the internal ports 5080 and 7080 must not go through Linkerd, so that e.g. dgraph-alpha-0.dgraph-alpha:5080 can talk to dgraph-alpha-1.dgraph-alpha:5080 and vice versa, and the same for the Zero pods:

cat dgraph-ha.yaml | linkerd inject --skip-inbound-ports 5080,7080 --skip-outbound-ports 5080,7080 - | kubectl apply -f -

(Michel Conrado) #6

up +
