I’ve upserted ~100 million nodes into Dgraph (each node is small… a few fields each, but they all have one relationship back to the same “root” node… which is how I’m trying to achieve namespacing in the cluster). They also each have an
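To make the data model concrete, here is a rough sketch of what each upsert looks like, written against the dgo v2 client. The predicate names (`xid`, `belongs_to`) and the root UID are placeholders, not my actual schema; the real code differs, but the shape (find-or-create by external id, then point the node at the shared root) is the same:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

// upsertItem finds (or creates) a node keyed by an external id and links it
// back to the shared "root" node. xid and belongs_to are placeholder predicates.
func upsertItem(ctx context.Context, dg *dgo.Dgraph, rootUID, xid string) error {
	// Upsert block: the query binds any existing node with this xid to "item";
	// the mutation then reuses that binding (or allocates a new uid).
	query := fmt.Sprintf(`query { item as var(func: eq(xid, %q)) }`, xid)
	mu := &api.Mutation{
		SetNquads: []byte(fmt.Sprintf(`
			uid(item) <xid> %q .
			uid(item) <belongs_to> <%s> .
		`, xid, rootUID)),
	}
	req := &api.Request{
		Query:     query,
		Mutations: []*api.Mutation{mu},
		CommitNow: true,
	}
	_, err := dg.NewTxn().Do(ctx, req)
	return err
}

func main() {
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))
	// "0x1" stands in for the uid of the root node; "item-42" for one record.
	if err := upsertItem(context.Background(), dg, "0x1", "item-42"); err != nil {
		log.Fatal(err)
	}
}
```

So every one of the ~100M nodes ends up with an outgoing edge to that single root, which gives it a very high fan-in.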
I have 3 Zeros and 3 Alphas running on GKE with the v20.03.0 release. The Alphas each have their own n1-highcpu-16 node, with the Alpha LRU cache set to 2048 MB (via the Dgraph Helm chart).
With this setup, I could consistently write at around 10k upserts per second; however, once I reach a certain scale, one of the Dgraph nodes gets OOM-killed by Kubernetes.
Once that node gets shut down, the cluster never recovers: the node gets OOM-killed repeatedly, even though the cluster is idle and ingestion is paused.
I’ve attached the logs for the 3 Zeros and 3 Alphas, as well as another set of logs from alpha1 (the bad node) after it restarted and was OOM-killed again.
It feels like the bad node is trying to read files from disk that are too large to fit in memory. What can I do to mitigate this, and is this expected behavior?