[K8s / DevOps] Need to investigate OOMs of pods in the Helm chart setup

Moved from GitHub dgraph/4384

Posted by prashant-shahi:

What version of Dgraph are you using?

master and v1.1.0

What is the hardware spec (RAM, OS)?

3-node GKE cluster
45 GB memory

Steps to reproduce the issue (command/config used to run Dgraph).

Go to contrib/config/kubernetes/helm.

cd contrib/config/kubernetes/helm
helm install my-release ./ --set alpha.service.type="LoadBalancer"

Run JS Flock and point its endpoint at the Dgraph Alpha LoadBalancer IP.
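For reference, a minimal sketch of fetching the Alpha endpoint for Flock. The service name my-release-dgraph-alpha is an assumption; check kubectl get svc for the actual name generated by the release, and pass the IP to Flock however its README specifies.

# Hypothetical service name; the real one comes from the Helm release.
ALPHA_IP=$(kubectl get svc my-release-dgraph-alpha \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "Point Flock at ${ALPHA_IP}:9080"   # 9080 is Alpha's default gRPC port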

Expected behaviour and actual result.

Pods get OOM-killed repeatedly after the JS Flock app has been running for a while.
The chart does not set any CPU or memory limits on the pods. It is worth investigating what is causing the memory growth.
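As a starting point for the investigation, a rough sketch of confirming the OOM kills and seeing which pods are consuming the memory. Pod and node names below are placeholders, and kubectl top requires metrics-server or GKE monitoring to be enabled.

kubectl get pods                          # high restart counts hint at repeated OOM kills
kubectl describe pod my-release-dgraph-alpha-0 | grep -A 3 "Last State"   # look for Reason: OOMKilled
kubectl top pods                          # current CPU and memory per pod
kubectl describe node <node-name> | grep -A 10 "Allocated resources"      # node-level pressure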

hackintoshrao commented :

@prashant-shahi : Can you add the monitoring metrics too?

prashant-shahi commented :

@hackintoshrao Sure.

Here are some of the metrics from the GKE dashboard.

I have also been collecting memory profiles periodically. Ping me if anyone needs them.
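For anyone who wants to collect the same profiles, a minimal sketch, assuming the Alpha pod is named my-release-dgraph-alpha-0 and that Alpha exposes the Go pprof handlers on its HTTP port (8080 by default):

# Forward the Alpha HTTP port locally.
kubectl port-forward pod/my-release-dgraph-alpha-0 8080:8080 &
# Save a heap snapshot with a timestamp so profiles can be compared over time.
curl -s http://localhost:8080/debug/pprof/heap -o heap-$(date +%s).pprof
# Inspect the biggest in-use allocations.
go tool pprof -top heap-<timestamp>.pprof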

For the Helm charts, we definitely want to set resource limits on the pods; otherwise a pod can consume enough memory to disrupt node services like the kubelet and taint the node so that other pods cannot be scheduled on it.
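As an illustration, one way limits could be applied at install or upgrade time, assuming the chart exposes a standard resources block under the alpha values; the exact value paths and sensible numbers depend on the chart's values.yaml and the cluster:

helm upgrade my-release ./ \
  --set alpha.service.type="LoadBalancer" \
  --set alpha.resources.requests.memory=8Gi \
  --set alpha.resources.limits.memory=32Gi \
  --set alpha.resources.limits.cpu=4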

I also wonder about the behavior when using NEGs (container-native load balancing) instead of the LoadBalancer service type. This can be configured through service annotations.
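A rough sketch of what that annotation could look like on GKE, assuming the same hypothetical service name as above and a standalone NEG on the Alpha HTTP port; whether the chart lets you pass service annotations through its values is something to check:

# cloud.google.com/neg is GKE's annotation for container-native load balancing.
kubectl annotate service my-release-dgraph-alpha \
  cloud.google.com/neg='{"exposed_ports": {"8080": {}}}'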