Is it possible to set a maximum amount of RAM badger will use?

I am working on an application that stores incoming data in BadgerDB. As more unique data is added, the amount of RAM used increases steadily, apparently without limit. Is it possible to cap the amount of RAM that Badger will use (even if this results in worse performance due to increased use of files)?

For instance, I have been running my application for the last 24 hours and have 200GB of vlog files and 75GB of sst files. The RAM usage is around 22GB. I want to be able to run this application on a machine with only 16GB of RAM in total (but almost unlimited disk space). Is it possible to configure Badger to work in this situation?

Hey @Peter,
My guess is that ZSTD decompression and bloom filters (which are kept in RAM) are taking up most of the memory.

You can try setting the following options:

  1. Disable compression - Set options.Compression = options.None. This means we won't allocate memory for decompression (which can be a lot in the case of ZSTD).
  2. Use FileIO mode - Set options.TableLoadingMode and options.ValueLogLoadingMode to FileIO. The default loading mode is memory-map, which means file data is loaded into RAM as it is accessed. Using FileIO mode could severely affect your read speed, but it should reduce RAM usage.
  3. Disable the cache - Set options.MaxCacheSize = 0. This option is available only in the master branch of Badger. Disabling the cache should also reduce memory usage by about 1 GB. If you're not using compression or encryption, the cache doesn't provide any significant benefit.
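The three settings above could be combined roughly like this. This is a minimal sketch assuming Badger v2 (`github.com/dgraph-io/badger/v2`) as it was at the time of this thread; it sets the exported `Options` fields named above directly. `MaxCacheSize` was master-only then and may be named differently in later releases, so check the `Options` struct of the version you actually use. The directory path is a placeholder.

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/options"
)

// lowMemoryOptions applies the three suggestions above on top of the defaults.
// Assumes Badger v2; MaxCacheSize was only on master at the time and its name
// may differ in later versions.
func lowMemoryOptions(dir string) badger.Options {
	opts := badger.DefaultOptions(dir)

	// 1. No compression, so no buffers are allocated for (ZSTD) decompression.
	opts.Compression = options.None

	// 2. Read SST tables and value log files with plain file I/O instead of mmap.
	opts.TableLoadingMode = options.FileIO
	opts.ValueLogLoadingMode = options.FileIO

	// 3. Disable the cache entirely.
	opts.MaxCacheSize = 0

	return opts
}

func main() {
	// "/path/to/badger" is a placeholder data directory.
	db, err := badger.Open(lowMemoryOptions("/path/to/badger"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```

Each setting trades speed for memory independently, so they can also be tried one at a time to see which one matters for a given workload.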

However, I'd like to figure out what is eating up all your RAM. Could you collect a heap profile and send it? It should show what is taking up most of the RAM.
Here's how you can profile your program (see https://golang.org/pkg/net/http/pprof/):

  • Add the import _ "net/http/pprof" (it registers the /debug/pprof/ handlers).
  • Start an HTTP server in a goroutine:
go func() {
	log.Println(http.ListenAndServe("localhost:6060", nil))
}()
  • Run curl against localhost:6060/debug/pprof/heap (for example, curl -o heap.pprof http://localhost:6060/debug/pprof/heap). Please collect the heap profile after all your tables are loaded. It would also help if you could collect several profiles at different times; that should help us figure out what is using the RAM.

Hi @ibrahim,

Thank you for the detailed reply. I spent the weekend running more tests with the configuration options you suggested. While I was able to reduce the initial memory usage, memory usage still increased gradually over time.

I took some pprof dumps during the course of the test and the main memory hog was new BloomFilters being created. From a quick look at the code a new BloomFilter is needed every time a new .sst file is created. These BloomFilters have to stay resident in memory, so the more unique data I insert into the system, the more memory it will consume. My problem is that I want to be able to store unique data for many months at a time without the memory usage growing over that time. Am I right to say this is not possible if I use BadgerDB?

I have collected the heap dumps but am unable to upload them as a new user.
(Here’s a link to them on my GDrive account: https://drive.google.com/open?id=1n1Etd0T-8FdZlHJNnToMMaoa7mjiW2Pr)

Pete

We recently added support for storing bloom filters in the cache: https://github.com/dgraph-io/badger/commit/4676ca96f15b9e9a3b1bc163b1a2a908bde553f0
With this patch, bloom filters are no longer all kept resident in memory. You set the maximum size of the cache, and that same cache stores both blocks and bloom filters, so their combined memory use is limited by that setting.
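To illustrate why a shared, size-limited cache caps memory, here is a simplified byte-budgeted LRU in which table blocks and bloom filters compete for one budget. This is only a sketch of the general idea, not Badger's implementation: Badger's cache is built on Ristretto, which uses a TinyLFU-based admission policy rather than plain LRU. The key names and sizes are hypothetical.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached item (a block or a bloom filter) and its size in bytes.
type entry struct {
	key  string
	size int64
}

// byteLRU evicts least-recently-used entries once the total size of cached
// values exceeds maxBytes, regardless of what kind of data they hold.
type byteLRU struct {
	maxBytes int64
	used     int64
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newByteLRU(maxBytes int64) *byteLRU {
	return &byteLRU{maxBytes: maxBytes, order: list.New(), items: make(map[string]*list.Element)}
}

// Get reports whether key is cached and marks it as recently used.
func (c *byteLRU) Get(key string) bool {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return true
	}
	return false
}

// Put inserts or refreshes key with the given size, then evicts from the
// least-recently-used end until the budget is respected again.
func (c *byteLRU) Put(key string, size int64) {
	if el, ok := c.items[key]; ok {
		e := el.Value.(*entry)
		c.used += size - e.size
		e.size = size
		c.order.MoveToFront(el)
	} else {
		c.items[key] = c.order.PushFront(&entry{key: key, size: size})
		c.used += size
	}
	for c.used > c.maxBytes && c.order.Len() > 0 {
		oldest := c.order.Back()
		e := oldest.Value.(*entry)
		c.order.Remove(oldest)
		delete(c.items, e.key)
		c.used -= e.size
	}
}

func main() {
	cache := newByteLRU(1 << 20) // hypothetical 1 MiB budget
	// Hypothetical keys: one bloom filter and one block per .sst table.
	for t := 0; t < 100; t++ {
		cache.Put(fmt.Sprintf("bloom/%06d.sst", t), 64<<10)
		cache.Put(fmt.Sprintf("block/%06d.sst/0", t), 4<<10)
	}
	fmt.Printf("entries=%d bytes=%d (budget %d)\n", cache.order.Len(), cache.used, cache.maxBytes)
}
```

However many tables are added, `used` never stays above the budget; filters for cold tables are evicted and would have to be reloaded from disk on the next lookup, which is the performance cost of bounding memory this way.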

It would be very useful if you could run your tests against the new fix and let me know how it goes 🙂