Resource limits cause cluster oom kill lock

Posted by seanlaff:

I spin up a cluster with a req/limit of 6gb mem. After about 10 mins of heavy load, the dgraph alphas get oomkilled by kubernetes. When the alpha pods restart, they get oomkilled straight away, and the whole cluster stays in a broken state.
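
For illustration, a memory request and limit of 6 GB on an alpha container would look roughly like the fragment below. This is only a sketch and not the actual manifest from this cluster: the container name `alpha` is an assumption, and `6Gi` is one possible reading of "6gb" (it could equally be `6G`).

```yaml
# Hypothetical fragment of a pod spec; only the memory settings come from the post.
containers:
  - name: alpha            # assumed container name
    resources:
      requests:
        memory: "6Gi"      # request equal to the limit, as described above
      limits:
        memory: "6Gi"      # exceeding this gets the container OOM-killed
```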

I’m guessing there’s some sort of write-ahead log that dgraph is trying to resume from (on the attached persistent volumes) that is larger than the given mem limit, causing it to get oom killed instantly?

seanlaff commented:

Discussion was continued here: "Dgraph can't idle without being oomkilled after large data ingestion"

Improvements have been made to both badger and dgraph since then. Will run another large-scale test soon.