How to: distributed Live Load with Kubernetes, Docker, or a plain binary

dataset
liveload

(Michel Conrado) #1

The live loader can perform a load in a distributed way. This avoids OOMs caused by concentrating the whole load on a single Alpha node (which causes congestion). In this text I will demonstrate how you can use this feature via Kubernetes.

This same process works for Docker, Docker Swarm, bare metal, and so on. Of course, each one is slightly different.

The reference YAML is https://github.com/dgraph-io/dgraph/blob/master/contrib/config/kubernetes/dgraph-ha.yaml

In this small tutorial I’m assuming that you already have a k8s context running, be it GKE, Docker’s built-in Kubernetes, or even Minikube.

First, create the setup:

kubectl create -f dgraph-ha.yaml
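If you don’t have the manifest locally yet, a quick way to fetch and apply it is sketched below (the raw URL is an assumption derived from the blob URL above; verify it matches your Dgraph version):

```shell
# Download the reference manifest and create the cluster resources.
# The raw GitHub path is assumed to mirror the blob URL referenced earlier.
curl -fsSL -o dgraph-ha.yaml \
  https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha.yaml
kubectl create -f dgraph-ha.yaml
kubectl get pods --watch   # wait until all Zero and Alpha pods are Running
```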

After that you need to create Services exposing each Alpha’s gRPC port, because a distributed load is not possible through the load balancer as it is. So you need to expose each Alpha temporarily.

Create an expose_dgraph.yaml:

apiVersion: v1
kind: Service
metadata:
  name: dgraph-zero-0-grpc-public
  labels:
    app: dgraph-zero
spec:
  type: LoadBalancer
  ports:
  - port: 5080
    targetPort: 5080
    name: zero-grpc
  selector:
    statefulset.kubernetes.io/pod-name: dgraph-zero-0
---
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha-0-grpc-public
  labels:
    app: dgraph-alpha
spec:
  type: LoadBalancer
  ports:
  - port: 9080
    targetPort: 9080
    name: alpha-grpc
  selector:
    statefulset.kubernetes.io/pod-name: dgraph-alpha-0
---
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha-1-grpc-public
  labels:
    app: dgraph-alpha
spec:
  type: LoadBalancer
  ports:
  - port: 9080
    targetPort: 9080
    name: alpha-grpc
  selector:
    statefulset.kubernetes.io/pod-name: dgraph-alpha-1
---
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha-2-grpc-public
  labels:
    app: dgraph-alpha
spec:
  type: LoadBalancer
  ports:
  - port: 9080
    targetPort: 9080
    name: alpha-grpc
  selector:
    statefulset.kubernetes.io/pod-name: dgraph-alpha-2
---

Run:

kubectl create -f expose_dgraph.yaml

This will create public gRPC endpoints. Once your cloud provider finishes creating the endpoints/IPs, you can use them to start your load.

./dgraph live -f your.rdf -s your.schema -z 38.239.155.115:5080 \
  -a "33.63.128.148:9080,35.138.12.15:9080,35.225.60.18:9080"

Each URI/IP must be comma-separated and cannot contain spaces.
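Since that address list is easy to get wrong by hand, here is a tiny shell sketch that joins the Alpha endpoints with commas and no spaces, the format `dgraph live -a` expects. The `join_alphas` helper is hypothetical, shown only for illustration:

```shell
# Hypothetical helper: join Alpha endpoints into the comma-separated,
# space-free list expected by `dgraph live -a`.
join_alphas() {
  local IFS=','
  echo "$*"
}

ALPHA_LIST=$(join_alphas 33.63.128.148:9080 35.138.12.15:9080 35.225.60.18:9080)
echo "$ALPHA_LIST"
# ./dgraph live -f your.rdf -s your.schema -z 38.239.155.115:5080 -a "$ALPHA_LIST"
```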

After the load you need to remove the Services that expose Dgraph:

kubectl delete -f expose_dgraph.yaml

By the way, it is recommended that you modify dgraph-ha.yaml according to your needs and not expose Dgraph directly to the public. By default, dgraph-ha.yaml exposes Dgraph via a load balancer. Keep that in mind.

PS: After the publication of this topic, we are going to work on a deep, technical blog post about this.

Cheers.


(Daniel Mai) #2

Kubernetes and other load balancers should have a way to simply connect to a single address and get the expected load-balanced behavior. I did a quick search for resources but haven’t dug deep into reading them.

Alternatively, we can use a Kubernetes Job to run the live loader. The Job avoids the need to set up an external IP address for each Alpha temporarily, which has a number of drawbacks: publicly exposing the cluster, extra cost for public IPs from the cloud provider, and even having to provision the public IPs in the first place. (For example, AWS limits the number of public IP addresses you can have at any given time; the limit can be bumped up by filing a support ticket, but that requires extra time and is thus another roadblock.)

Here’s a Job config I’ve used before to load the 21million data set into a cluster. The live loader connects via the internal cluster DNS rules:

apiVersion: batch/v1
kind: Job
metadata:
  name: dgraph-live-load-21million
spec:
  template:
    spec:
      containers:
      - name: dgraph-live-load-21million
        image: dgraph/dgraph:latest
        command:
          - bash
          - "-c"
          - |
            set -x
            apt-get update
            apt-get install -y wget file
            wget -q -O 21million.rdf.gz  "https://github.com/dgraph-io/benchmarks/raw/release/v1.0/data/release/21million.rdf.gz"
            wget -q -O facets.rdf.gz     "https://github.com/dgraph-io/benchmarks/raw/release/v1.0/data/release/facets.rdf.gz"
            wget -q -O sf-tourism.rdf.gz "https://github.com/dgraph-io/benchmarks/raw/release/v1.0/data/release/sf-tourism.rdf.gz"
            wget -q -O release.schema    "https://raw.githubusercontent.com/dgraph-io/benchmarks/release/v1.0/data/release/release.schema"
            dgraph live \
              -r 21million.rdf.gz,facets.rdf.gz,sf-tourism.rdf.gz \
              -s release.schema \
              --zero dgraph-zero-0.dgraph-zero:5080 \
              --dgraph dgraph-alpha-0.dgraph-alpha:9080,dgraph-alpha-1.dgraph-alpha:9080,dgraph-alpha-2.dgraph-alpha:9080
      restartPolicy: Never
  backoffLimit: 4
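To run it, save the manifest (the file name dgraph-live-load-job.yaml below is just my choice for illustration) and watch the Job until the loader finishes. This assumes the dgraph-ha cluster from the first post is already up:

```shell
# Create the Job and follow the live loader's progress.
kubectl create -f dgraph-live-load-job.yaml
kubectl wait --for=condition=complete --timeout=60m job/dgraph-live-load-21million
kubectl logs job/dgraph-live-load-21million --tail=20   # final live loader summary
```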

(Michel Conrado) #3

Hmm, I do something similar here, with no need to use wget, and I don’t use a Kubernetes Job. It’s interesting though. But “my way” works fine.

That’s something to care about. But I guess GKE is friendlier for this.

My proposal is to use the live loader locally, pointing to a cluster. But that is not the main proposal here; what matters is to share the distributed-load information, so all possibilities are valid. In the “Dgraph-Bulk-Script” repository I already anticipate Live or Bulk usage within the cluster context (GKE), but using the kubectl copy command and “sleep” (sh). Not elegant, but functional.

Not sure; as I said in the last topic where I tagged you, the live load gets stuck with only one Alpha. Maybe there is another solution, but everything indicates that only exposing the Alphas’ gRPC ports works.

I’ll check the other links and see if I can dig up some goodies.


(Daniel Mai) #4

The “gRPC Load Balancing on Kubernetes Without Tears” (:sob:) blog post describes exactly the issue we’re trying to solve.


(Daniel Mai) #5

Hooking up Linkerd2 to a Dgraph cluster was fairly painless, and clients connecting via the single ClusterIP or LoadBalancer endpoint would end up talking to different Alphas based on an exponentially weighted moving average (https://linkerd.io/2/features/load-balancing/).

The client needed to talk to Linkerd too. For example, after setting up a Dgraph cluster in K8s with Linkerd (I followed their Getting Started guide), I ran a Dgraph pod for the dgraph increment tool and connected it to Linkerd via linkerd inject:

kubectl run dgraph-increment --image=dgraph/dgraph:v1.0.14 --restart=Never --dry-run -o yaml --command -- sleep 3600 | linkerd inject - | kubectl apply -f -

Then I could run the increment tool against the K8s LoadBalancer named dgraph-alpha-public:

dgraph increment --addr=dgraph-alpha-public:9080 --num=1000

and, with Alpha query logs enabled, I’d see different Alphas responding to queries (shown at the beginning of each log line):

dgraph-alpha-0 alpha I0416 17:22:23.477168       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-1 alpha I0416 17:22:23.571169       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-1 alpha I0416 17:22:23.668939       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-1 alpha I0416 17:22:23.761325       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-0 alpha I0416 17:22:23.859834       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-1 alpha I0416 17:22:23.949468       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-1 alpha I0416 17:22:24.035094       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-0 alpha I0416 17:22:24.196147       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-2 alpha I0416 17:22:24.289699       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-2 alpha I0416 17:22:24.384236       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"
dgraph-alpha-2 alpha I0416 17:22:24.456025       1 server.go:446] Got a query: query:"{ q(func: has(counter.val)) { uid, val: counter.val }}"

This also works with the headless ClusterIP Service address that’s not publicly exposed (dgraph-alpha), after configuring it to expose port 9080 for clients:

dgraph increment --addr=dgraph-alpha:9080 --num=1000
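For reference, that headless Service change might look like the sketch below. The exact dgraph-alpha Service in dgraph-ha.yaml may differ; the port names and the internal 7080 port here are assumptions to verify against your copy of the manifest:

```yaml
# Hedged sketch: headless dgraph-alpha Service with the client gRPC port added.
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha
  labels:
    app: dgraph-alpha
spec:
  clusterIP: None          # headless: DNS resolves directly to the pod IPs
  ports:
  - port: 7080             # internal Alpha-to-Alpha gRPC (assumed already present)
    targetPort: 7080
    name: alpha-grpc-int
  - port: 9080             # client gRPC port added for tools like dgraph increment
    targetPort: 9080
    name: alpha-grpc
  selector:
    app: dgraph-alpha
```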

There’s an open issue with Linkerd about not being able to connect to a particular StatefulSet pod via its stable DNS name: https://github.com/linkerd/linkerd2/issues/2266. I asked about its progress on the Linkerd Slack channel; it’s not fixed yet, and Oliver Gould said “it’s probably best to avoid injecting clients that talk to stable network ids”. So for now, when proxying Dgraph through Linkerd, the internal ports 5080 and 7080 must not go through Linkerd, so that e.g. dgraph-alpha-0.dgraph-alpha:5080 can talk to dgraph-alpha-1.dgraph-alpha:5080 and vice versa, and the same for the Zero pods:

cat dgraph-ha.yaml | linkerd inject --skip-inbound-ports 5080,7080 --skip-outbound-ports 5080,7080 - | kubectl apply -f -

(Michel Conrado) #6

up +
