When creating indices that produce a large on-disk index, memory usage grows linearly with the index size and becomes nearly unmanageable even on 64GiB systems. I have not looked into the code yet to understand why it follows this pattern, but I wanted to put this out there in case the Dgraph devs have any insights.
The amount of data on disk is not actually that large - here are some log lines from the start of the index rebuild:
Rebuilding index for attr XXXX.name and tokenizers [trigram exact]
Rebuilding index for predicate 0-XXXX.name (1/2): Streaming about 12 GiB of uncompressed data (3.7 GiB on disk)
So you can see it is 12GiB uncompressed. However, the resulting index is roughly 170GiB (according to the logs), and processing this single operation is currently taking us over 32GiB of RAM. Could this be done in a way that streams the result to disk instead of holding it in memory? I had assumed it already worked that way.
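For context on why the output dwarfs the input: my understanding (I have not read Dgraph's tokenizer code, so this is an assumption) is that a trigram index stores one key per 3-character window of each value, so every value fans out into many index entries, each with its own key and posting overhead. A tiny illustration of that fanout - the `trigrams` function here is my own sketch, not Dgraph's actual tokenizer:

```go
package main

import "fmt"

// trigrams returns the sliding 3-rune windows of s. This is my own
// illustration of trigram tokenization, not Dgraph's real tokenizer
// (which presumably also normalizes/lowercases).
func trigrams(s string) []string {
	r := []rune(s)
	var out []string
	for i := 0; i+3 <= len(r); i++ {
		out = append(out, string(r[i:i+3]))
	}
	return out
}

func main() {
	v := "some predicate value"
	// A value of n runes fans out into roughly n-2 index keys, each
	// carrying its own key and posting overhead on disk - which is
	// how 12 GiB of values can become a much larger index.
	fmt.Printf("%d-byte value -> %d index keys\n", len(v), len(trigrams(v)))
}
```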
Here is the memory monitoring for that group vs the other groups:
Note the big spike at the end - that is after the index is completely built. It is finalizing the index somehow and then, boom, it gets OOM-killed (that is the drop). These are 32GiB machines (the other nodes normally use 3-4GiB), and I need to switch to 64GiB machines just to build the trigram index for this one predicate.
Can we get a pattern here with constant memory usage? This linear growth will not fly once it takes 64GiB of RAM just to rebuild one index.
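Something like this bounded-batch shape is what I have in mind - purely a sketch with a hypothetical `flushBatch` sink, not a claim about how Badger's stream API actually works:

```go
package main

import "fmt"

// kv is a stand-in for an index key/value pair (hypothetical type).
type kv struct{ key, val []byte }

// flushBatch is a hypothetical sink that writes one batch to disk
// (e.g. an SSTable builder); a real version would return an error.
func flushBatch(batch []kv) {
	fmt.Printf("flushed %d entries\n", len(batch))
}

// buildIndex consumes index entries and flushes whenever the
// in-memory batch reaches maxBatch, so peak memory is bounded by the
// batch size rather than by the total size of the finished index.
func buildIndex(entries <-chan kv, maxBatch int) {
	batch := make([]kv, 0, maxBatch)
	for e := range entries {
		batch = append(batch, e)
		if len(batch) == maxBatch {
			flushBatch(batch)
			batch = batch[:0] // reuse the buffer; memory stays constant
		}
	}
	if len(batch) > 0 {
		flushBatch(batch)
	}
}

func main() {
	ch := make(chan kv)
	go func() {
		for i := 0; i < 10; i++ {
			ch <- kv{key: []byte(fmt.Sprintf("k%d", i))}
		}
		close(ch)
	}()
	buildIndex(ch, 4)
}
```

With that shape, peak memory is proportional to maxBatch regardless of index size. A real implementation would also need the sort/merge step an LSM build requires, but that can be done with bounded memory too (external sort of the flushed batches).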
Dgraph version : v21.03.2
Dgraph codename : rocket-2
Dgraph SHA-256 : 00a53ef6d874e376d5a53740341be9b822ef1721a4980e6e2fcb60986b3abfbf
Commit SHA-1 : b17395d33
Commit timestamp : 2021-08-26 01:11:38 -0700
Branch : HEAD
Go version : go1.16.2
jemalloc enabled : true