Used the default options to populate a table with about 1000 key/val pairs where each value is roughly 30MB.
The badger database directory is 101GB according to du. There are 84 .vlog files.
When I start my server up, it quickly consumes 10 GB of RAM and dies due to OOM. dmesg output:
[654397.093709] Out of memory: Killed process 15281 (taskserver) total-vm:20565228kB, anon-rss:12610116kB, file-rss:0kB, shmem-rss:0kB
What did you expect to see?
I would expect the database to provide a simple option to limit memory usage to an approximate cap.
What did you see instead?
The recommended approach of tweaking a many-dimensional parameter space is confusing and hasn’t worked for me.
The memory-related parameters are not explained in much detail. For example, the docstring for options.MemoryMap doesn’t indicate roughly how expensive MemoryMap is compared to FileIO.
I haven’t managed to successfully reduce memory usage using the following parameters:
Agree. We use it for a simple key-value lookup with a couple of billion records (database directory is 700 GB).
It uses about 200 GB of RAM, which is unacceptable. The culprits are memory-mapped files, according to Process Explorer.
Good thing we have a lot of RAM, but there should be an easy, well-defined maximum memory limit to set.
In this case, backup.go’s Load function seems to be a major offender. It does not account for the size of the values at all. Added logging shows huge key/value accumulation and no flushing:
I0326 09:16:43.884962 5195 taskstorage.go:147] not flushing with 1269 entries, 73.2K key size, 3.2G combined size, 9.6M limit
I’m guessing there are many places where value size is not accounted for when making memory management decisions.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Although I have a fix for the backup restoration issue, this issue as a whole has not been addressed.
I’m not aware of what causes badger to take up the amount of memory that it does. Understanding that seems like the first step towards introducing a flag for setting a fixed memory limit. Could someone from the badger team weigh in?
The amount of memory being used depends on your DB options. For instance, each table has a bloom filter, and these bloom filters are kept in memory. Each bloom filter takes up 5 MB of memory. So if you have 100 GB of data, that means you have about (100 * 1000 / 64) = 1562 tables at the default 64 MB table size, and 1562 * 5 MB is about 7.8 GB. So your bloom filters alone would take up 7.8 GB of memory. We have a separate cache in badger v2 to reduce the memory used by bloom filters.
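As a minimal sketch of that back-of-the-envelope estimate (plain arithmetic, not a Badger API; the ~64 MB table size and ~5 MB bloom filter per table are the figures quoted above):

package main

import "fmt"

// estimateBloomFilterRAMMB returns a rough bloom-filter memory estimate in MB,
// assuming ~64 MB per table and ~5 MB of bloom filter kept in memory per table.
// It is a rule of thumb, not an exact accounting.
func estimateBloomFilterRAMMB(dataSizeGB float64) float64 {
    const tableSizeMB = 64.0
    const bloomPerTableMB = 5.0
    numTables := dataSizeGB * 1000.0 / tableSizeMB
    return numTables * bloomPerTableMB
}

func main() {
    fmt.Printf("~%.0f MB\n", estimateBloomFilterRAMMB(100)) // ~7812 MB, i.e. roughly 7.8 GB
}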
Another thing that might affect memory usage is the table loading mode. If you set the table loading mode to FileIO, memory usage should drop, but your reads will be much slower.
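For reference, a sketch of how those knobs are set, assuming the Badger v1.6-style options API (method names and defaults vary between versions, so treat this as illustrative rather than definitive):

package main

import (
    "log"

    "github.com/dgraph-io/badger"
    "github.com/dgraph-io/badger/options"
)

func main() {
    // Trade read speed for a smaller resident set by keeping tables and the
    // value log on disk instead of mmapping or loading them into RAM.
    opts := badger.DefaultOptions("/tmp/badger").
        WithTableLoadingMode(options.FileIO).
        WithValueLogLoadingMode(options.FileIO).
        WithNumMemtables(1) // fewer in-memory memtables also shrinks the footprint

    db, err := badger.Open(opts)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
}

In badger v2, the bloom filters additionally go through the dedicated cache mentioned above, which can be sized separately.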
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Perhaps something else to keep in mind when tracking down memory hog issues: The Go memory profile doesn’t seem to capture the full extent of memory usage.
Here is a screenshot that shows the system’s accounting (12.7 GB) vs Go’s accounting (84.34 MB).
On the other hand, sometimes memory usage is quite high and there is a lot of allocation.
I can’t find a way to force the OS to reclaim the memory freed by Go, which seems to use MADV_FREE on recent Linux kernels (per the Go release notes). It would be helpful to force the OS to reclaim such memory to get a more accurate picture of what’s going on.
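Two standard-runtime knobs come close, as far as I know: runtime/debug.FreeOSMemory forces a GC and asks the runtime to return as much memory as possible to the OS, and running the binary with GODEBUG=madvdontneed=1 (Go 1.12+) makes the runtime use MADV_DONTNEED instead of MADV_FREE, so RSS drops immediately. A small sketch:

package main

import (
    "runtime/debug"
)

func main() {
    // ... exercise the database here ...

    // Ask the runtime to run a GC and return as much freed memory as possible
    // to the OS. Combined with GODEBUG=madvdontneed=1, the kernel reclaims
    // pages eagerly, so tools like top reflect the drop right away.
    debug.FreeOSMemory()
}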
In my case, it would help if PrefetchValues had an option to restrict prefetches based on total value byte size rather than the number of values. Perhaps IteratorOptions could become
// IteratorOptions is used to set options when iterating over Badger key-value
// stores.
//
// This package provides DefaultIteratorOptions which contains options that
// should work for most applications. Consider using that as a starting point
// before customizing it for your own needs.
type IteratorOptions struct {
    // Indicates whether we should prefetch values during iteration and store them.
    PrefetchValues bool
    // How many KV pairs to prefetch while iterating. Valid only if PrefetchValues is true.
    PrefetchSize int
    // If non-zero, specifies the maximum number of bytes to prefetch while
    // prefetching iterator values. This will overrule the PrefetchSize option
    // if the values fetched exceed the configured value.
    PrefetchBytesSize int
    Reverse           bool // Direction of iteration. False is forward, true is backward.
    AllVersions       bool // Fetch all valid versions of the same key.
    // The following option is used to narrow down the SSTables that iterator picks up. If
    // Prefix is specified, only tables which could have this prefix are picked based on their range
    // of keys.
    Prefix         []byte // Only iterate over this given prefix.
    prefixIsKey    bool   // If set, use the prefix for bloom filter lookup.
    InternalAccess bool   // Used to allow internal access to badger keys.
}
Even better would be a database-wide object for restricting memory use to a strict cap.
@gonzojive How big are your values? The memory profile you shared shows that y.Slice was holding 15 GB of data. That’s unusual unless you have a big value.
I can’t find a way to force the OS to reclaim the memory freed by Go, which seems to use MADV_FREE on recent Linux kernels (per the Go release notes). It would be helpful to force the OS to reclaim such memory to get a more accurate picture of what’s going on.
// HeapIdle minus HeapReleased estimates the amount of memory
// that could be returned to the OS, but is being retained by
// the runtime so it can grow the heap without requesting more
// memory from the OS. If this difference is significantly
// larger than the heap size, it indicates there was a recent
// transient spike in live heap size.
HeapIdle uint64
So HeapIdle - HeapReleased in your case is
>>> (9454788608-3498221568) >> 20
5680
which is about 5.6 GB. That’s the amount of memory the Go runtime is holding.
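For anyone who wants to reproduce that calculation in-process instead of from a heap dump, a small sketch using only the standard library:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    // Memory the runtime is holding but not using for live objects: it could
    // be returned to the OS, but is retained so the heap can grow without
    // asking the OS again.
    retained := m.HeapIdle - m.HeapReleased
    fmt.Printf("HeapIdle=%d MB HeapReleased=%d MB retained=%d MB\n",
        m.HeapIdle>>20, m.HeapReleased>>20, retained>>20)
}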
In this case, the values are 25 MB or more. The memory usage was from prefetching 100 values for each request, and many requests are run in parallel. Limiting prefetching fixed the specific issue I was having, but the general feature request remains open.
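For others hitting the same thing, the workaround with the existing API looks roughly like this (a sketch assuming Badger v1.6-style iterators; with very large values it may be simplest to turn prefetching off entirely):

package kvscan

import (
    "github.com/dgraph-io/badger"
)

// iterateWithoutPrefetch walks all keys while keeping the iterator's memory
// footprint small: prefetching is disabled, so each value is read on demand.
func iterateWithoutPrefetch(db *badger.DB, handle func(key, val []byte) error) error {
    return db.View(func(txn *badger.Txn) error {
        opts := badger.DefaultIteratorOptions
        opts.PrefetchValues = false // or keep it true with a small PrefetchSize, e.g. 2
        it := txn.NewIterator(opts)
        defer it.Close()

        for it.Rewind(); it.Valid(); it.Next() {
            item := it.Item()
            key := item.KeyCopy(nil)
            if err := item.Value(func(val []byte) error {
                // val is only valid inside this callback; copy it if it must outlive the call.
                return handle(key, val)
            }); err != nil {
                return err
            }
        }
        return nil
    })
}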
The inability to set a hard memory limit remains my #1 issue with Badger. I’m using badger as an embedded DB for a little home-automation IoT app, and my service tends to crash-loop after the database grows too large for the Raspberry Pi:
If a simple option is not planned, some sort of memory visualizer might help users set appropriate memory limits. I can’t easily tell where all the memory is going, so I don’t know which options to tune.
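In the meantime, the closest thing to a built-in visualizer is Go’s heap profiler; a sketch of wiring it up (keeping in mind, as noted above, that it only sees Go allocations, not mmapped files):

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers
)

func main() {
    // Expose the profiler on localhost, then inspect the live heap with:
    //   go tool pprof http://localhost:6060/debug/pprof/heap
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // ... run the Badger-backed service here ...
    select {}
}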