Hi all, following up with progress from the previous thread, "Dgraph can't idle without being oomkilled after large data ingestion".
We’ve been getting better performance with the changes being pushed to master, which has been great (and also with the WIP changes in https://github.com/dgraph-io/dgraph/pull/5535). Also worth noting: L0OnMemory=false makes a major difference for us; without it, the cluster falls over from OOM almost immediately. Any thoughts on exposing that variable as a runtime config?
However, we’re now running into a situation where, if we push our ingestion rate slightly above what Dgraph can comfortably handle, we get stuck mutations.
We cut all load on the cluster, but those mutations never clear on alpha-0, and we can’t ingest further. Here are some logs from after I tried restarting the node to fix it. Throughout this time, memory and CPU were only at 25% utilization. What should I try next?
logout_restart_rightimg (295.4 KB)
This image is built on current master with the L0 flag set to disk.