Resource limits cause cluster OOM-kill lock

Moved from GitHub charts/17

Posted by seanlaff:

I spin up a cluster with a memory request/limit of 6 GB. After about 10 minutes of heavy load, the Dgraph Alphas get OOM-killed by Kubernetes. When the Alpha pods restart, they get OOM-killed straight away, and the whole cluster stays in a broken state.

I’m guessing there’s some sort of write-ahead log that Dgraph is trying to replay from the attached persistent volumes, and that it is larger than the memory limit given, causing the pods to be OOM-killed instantly on restart?
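
For reference, here is a minimal sketch of the kind of resources block described above, applied to the Alpha container spec. Only the ~6 GB request/limit comes from the setup described; the surrounding StatefulSet structure and any chart-specific keys are assumed and omitted:

```yaml
# Illustrative only: memory request/limit for the Alpha container,
# matching the ~6 GB figure mentioned above. The surrounding
# StatefulSet / chart structure is assumed, not taken from this thread.
resources:
  requests:
    memory: 6Gi
  limits:
    memory: 6Gi
```

With the request equal to the limit the pod gets Guaranteed QoS, but the container is still OOM-killed by the kernel as soon as its usage exceeds the 6Gi limit, which matches the behaviour described above.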

seanlaff commented:

Discussion was continued here: Dgraph can't idle without being oomkilled after large data ingestion

Improvements have been made to both Badger and Dgraph since then. I will run another large-scale test soon.