Bulk loader uses too much memory

Moved from GitHub dgraph/3017

Posted by codexnull:

The bulk loader’s memory usage can grow beyond the size of physical RAM and cause swap space to be used. As the image below shows, memory usage grows monotonically during the map phase, stays flat during the reduce phase, and is released only when the process exits.

The system does not appear to thrash, though, which suggests the memory is just being held by the process without being actively accessed. Still, a big enough load may push usage beyond RAM + swap and cause system instability or crashes.

Note that the system was idle apart from dgraph. I did not test what would happen if concurrent processes had significant memory demands as well.
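
For anyone who wants to watch how close a running load gets to the RAM + swap limit mentioned above, here is a rough sketch, not anything that ships with dgraph. It assumes Linux, where `/proc/meminfo` reports `MemTotal` and `SwapTotal` and `/proc/<pid>/status` reports `VmRSS` and `VmSwap`, all in kB. The function names are made up for illustration.

```python
def parse_kb_fields(text):
    """Parse 'Key:   12345 kB' lines into a {key: kilobytes} dict.

    This is the layout used by /proc/meminfo and /proc/<pid>/status on Linux.
    Lines without a numeric value followed by 'kB' are ignored.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def memory_headroom_kb(meminfo_text, status_text):
    """Return (used_kb, limit_kb, headroom_kb) for a single process.

    used  = VmRSS + VmSwap of the process (resident plus swapped-out pages)
    limit = MemTotal + SwapTotal of the machine
    """
    mem = parse_kb_fields(meminfo_text)
    proc = parse_kb_fields(status_text)
    used = proc.get("VmRSS", 0) + proc.get("VmSwap", 0)
    limit = mem["MemTotal"] + mem["SwapTotal"]
    return used, limit, limit - used


def read_headroom_kb(pid):
    """Hypothetical helper: read the two /proc files for a live process (Linux only)."""
    with open("/proc/meminfo") as f:
        meminfo_text = f.read()
    with open(f"/proc/{pid}/status") as f:
        status_text = f.read()
    return memory_headroom_kb(meminfo_text, status_text)
```

Calling `read_headroom_kb(pid)` periodically with the bulk loader's process ID during the map phase would show the headroom shrinking. Since the system did not thrash, counting swapped-out pages as well as resident ones seemed the more honest measure of what the process is holding.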

shekarm commented:


Thank you for the post. We have recently made some improvements to our software that will likely change this behavior, in particular by using much less memory. We will keep you posted on developments. Meanwhile, if you haven’t already done so, could you share the characteristics of the data set you are using?


ashish-goswami commented:

I think this issue can be closed after we fix "Inconsistent bulk loader failures" (dgraph-io/dgraph issue #5361).