Resource limits cause cluster OOM-kill lock

Moved from GitHub charts/17

Posted by seanlaff:

I spin up a cluster with a memory request/limit of 6 GB. After about 10 minutes of heavy load, the Dgraph Alphas get OOM-killed by Kubernetes. When the Alpha pods restart, they get OOM-killed straight away, and the whole cluster stays in a broken state.
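For context, the request/limit is just the standard Kubernetes `resources` stanza on the Alpha container; a minimal sketch of the setup looks roughly like this (container name, image tag, and replica count are placeholders, not the exact chart output):

```yaml
# Sketch of an Alpha StatefulSet with request == limit at 6Gi (placeholder names)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph-alpha
spec:
  serviceName: dgraph-alpha
  replicas: 3
  selector:
    matchLabels:
      app: dgraph-alpha
  template:
    metadata:
      labels:
        app: dgraph-alpha
    spec:
      containers:
        - name: alpha
          image: dgraph/dgraph:latest   # placeholder tag
          resources:
            requests:
              memory: 6Gi   # request == limit, as described above
            limits:
              memory: 6Gi   # Kubernetes OOM-kills the container once it exceeds this
```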

I’m guessing there’s some sort of write-ahead log on the attached persistent volumes that Dgraph is trying to resume from, and that replaying it takes more memory than the limit given, causing the pod to instantly get OOM-killed again?
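If that’s what is happening, one workaround I could try (just a sketch, and the 8Gi number is an arbitrary example, not something recommended in this thread) is to keep the request at 6Gi but raise the limit, so the Alpha has some headroom while it replays the WAL on startup:

```yaml
# Same resources stanza as above, but with extra headroom for WAL replay on restart.
# The 8Gi limit is an arbitrary example value.
          resources:
            requests:
              memory: 6Gi   # steady-state sizing stays the same
            limits:
              memory: 8Gi   # room for the startup replay spike before the OOM killer fires
```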

seanlaff commented:

Discussion was continued here: Dgraph can't idle without being oomkilled after large data ingestion - #60 by JimWen

Improvements have been made to both Badger and Dgraph since. I will run another large-scale test soon.