How to prevent RAM usage of Alpha node from growing?

Hey @peter-hartmann-emrsn, we had a long thread going with similar memory issues under high-throughput ingestion: "Dgraph can't idle without being oomkilled after large data ingestion". Have you been able to take a heap snapshot of the memory? In that thread @JimWen found that the root cause was actually etcd not reading large messages in chunks, which he outlines in post #60 of that thread.

Additionally, as part of that thread we learned that there’s a setting in Dgraph for keeping the L0 cache in memory (the default) vs. on disk. I don’t believe that setting has been exposed as a flag, but if you compile Dgraph yourself it’s just a bool that can be flipped. For us, under heavy ingestion workloads the system would OOM very quickly unless L0 was set to disk.

As a quick aside @Paras, this asynchronous ingestion pipeline Peter describes is similar to ours (and what I was alluding to in our discussion of the multi-tenancy RFC :slight_smile: ). In these cases there’s utility if one ingestor can write to many tenants.

Not to derail too much, but as a general critique, the upsert flow is a little painful (we had to implement our own cache for the uids like Peter did). As I understand it, though, those incrementing uids provisioned by the oracle are integral to the design of the system, so I’m not sure how to make it more ergonomic.
