[disaster recovery] Cluster unable to recover after crash during intensive writing operations

seanlaff commented:

Hey @christian-roggia, I’ve been running into a problem with symptoms similar to those outlined in "Active Mutations stuck forever".

Do you see your mutations stuck like this?
[Screenshot: Screen Shot 2020-06-11 at 10 20 38 AM]

If I overload a node with writes for long enough, I get into this state and am unable to recover, even if I cut all load. I’m not running on preemptible nodes, but I am running the official Helm chart on GKE.