How to prevent RAM usage of Alpha node from growing?

seanlaff · July 1, 2020, 1:52pm

Hey @peter-hartmann-emrsn, we had a long thread going with similar memory-related issues under high-throughput ingestion: "Dgraph can't idle without being oomkilled after large data ingestion". Have you been able to take a heap snapshot of the memory? In that thread @JimWen found the root cause was actually etcd not reading large messages in chunks, which he outlines in post #60 of that thread.

Additionally, as part of that thread we learned that there’s a setting in Dgraph for keeping the L0 cache in memory (the default) vs. on disk. I don’t believe that setting has been exposed as a flag, but if you compile Dgraph yourself it’s just a bool that can be flipped. For us, under heavy ingestion workloads the system would OOM very quickly unless L0 was set to disk.

peter-hartmann-emrsn:

sending nQuads that get chunked into sets of 1000 nQuads per mutation, with up to 10 mutations run in parallel. The mutations are sent using this code:
    var mutation = new MutationBuilder { SetNquads = nQuads };
    var req = new RequestBuilder { Query = query, CommitNow = true }.WithMutations(mutation);
    var r1 = await txn.Mutate(req);
Upsert queries are built for unknown uids. Uids returned by the mutation are cached and used to build new mutations.
The event records are live events that need to be written to Dgraph as they occur/arrive so they can be consumed by Dgraph queries with little delay. Bulk-loader or live-loader do not seem right for this. Just for testing, perhaps I could write all records to an rdf file and see if a live-loader import ends up with a similar memory increase.
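
To make the quoted pipeline concrete, here is a minimal sketch of "batches of 1000, at most 10 in flight". It is not Peter's actual code: `BatchedIngest`, `Chunk`, `RunAsync`, and the `sendBatch` delegate are hypothetical names, and `sendBatch` is assumed to wrap whatever client call performs the mutation (for example the `txn.Mutate(req)` call shown above).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BatchedIngest
{
    // Splits items into consecutive batches of at most batchSize elements.
    public static List<List<T>> Chunk<T>(IReadOnlyList<T> items, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var batches = new List<List<T>>();
        for (int start = 0; start < items.Count; start += batchSize)
        {
            int end = Math.Min(start + batchSize, items.Count);
            var batch = new List<T>(end - start);
            for (int i = start; i < end; i++) batch.Add(items[i]);
            batches.Add(batch);
        }
        return batches;
    }

    // Calls sendBatch once per batch, with never more than maxParallel calls in flight.
    public static async Task RunAsync<T>(
        IReadOnlyList<T> items, int batchSize, int maxParallel,
        Func<List<T>, Task> sendBatch)
    {
        if (maxParallel <= 0) throw new ArgumentOutOfRangeException(nameof(maxParallel));
        using var gate = new SemaphoreSlim(maxParallel);
        var tasks = new List<Task>();
        foreach (var batch in Chunk(items, batchSize))
        {
            await gate.WaitAsync();          // blocks while maxParallel sends are running
            tasks.Add(SendAndRelease(batch));
        }
        await Task.WhenAll(tasks);

        async Task SendAndRelease(List<T> batch)
        {
            try { await sendBatch(batch); }
            finally { gate.Release(); }      // free the slot even if the send fails
        }
    }
}
```

With this shape, the quoted setup would correspond to `RunAsync(nQuadLines, 1000, 10, sendBatch)`. Lowering either number is one cheap way to test whether the memory growth tracks the amount of data in flight.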

As a quick aside @Paras, this asynchronous ingestion pipeline Peter describes is similar to ours (and is what I was alluding to in our discussion of the multi-tenancy RFC). In these cases it is useful if one ingestor can write to many tenants.

Not to derail too much, but as a general critique, the upsert flow is a little painful (we had to implement our own cache for the uids like Peter did). As I understand it, though, those incrementing uids provisioned by the oracle are integral to the design of the system, so I’m not sure of ways to make it more ergonomic.
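
For anyone wondering what such a uid cache looks like, here is a rough sketch. It is an illustration, not our production code or Peter's: `UidCache`, `Subject`, and `Remember` are hypothetical names. It also simplifies by falling back to a blank node for unknown ids rather than building an upsert query, and it assumes the external ids are safe to use as blank node labels.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class UidCache
{
    // External id -> uid assigned by Dgraph (e.g. "0x2a").
    private readonly ConcurrentDictionary<string, string> _uids =
        new ConcurrentDictionary<string, string>();

    // Returns the N-Quad subject for an entity: <uid> if the uid is already
    // cached, otherwise a blank node (_:id) so Dgraph assigns a new uid.
    public string Subject(string externalId) =>
        _uids.TryGetValue(externalId, out var uid) ? $"<{uid}>" : $"_:{externalId}";

    // Records uids reported by a mutation response, keyed by blank node
    // name without the "_:" prefix.
    public void Remember(IEnumerable<KeyValuePair<string, string>> assigned)
    {
        foreach (var pair in assigned) _uids[pair.Key] = pair.Value;
    }
}
```

The point is only that later mutations can reference `<0x2a>` directly once the uid is known. Note that the cache itself grows with the number of distinct entities, so it adds to the memory footprint on the ingestor side.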
