Badger Allocates A Lot Of Memory When Iterating Over Large Key Value Stores

Moved from GitHub badger/1326

Posted by bonedaddy:

What version of Go are you using (go version)?

go version go1.14.2 linux/amd64

What operating system are you using?

NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

What version of Badger are you using?

v2.0.3

Does this issue reproduce with the latest master?

Haven’t tried

Steps to Reproduce the issue

  1. Store a ton of data in your key-value store (in this case 1.7TB)
  2. Restart badger
  3. After service startup iterate over all keys in the key-store

What Badger options were set?

Default options with the following modifications:

	DefaultOptions = Options{
		GcDiscardRatio: 0.2,
		GcInterval:     15 * time.Minute,
		GcSleep:        10 * time.Second,
		Options:        badger.DefaultOptions(""),
	}
	DefaultOptions.Options.CompactL0OnClose = false
	DefaultOptions.Options.Truncate = true

I’ve also set the following:

  • ValueLogLoadingMode = FileIO
  • TableLoadingMode = FileIO
  • SyncWrites = false
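
For reference, a minimal sketch of how these settings map onto the badger v2 API; the openDB helper and path are illustrative rather than the exact code from the service:

	// Sketch only: openDB and its path argument are hypothetical, but the
	// With* calls are the badger v2 option setters for the settings above.
	package store

	import (
		badger "github.com/dgraph-io/badger/v2"
		"github.com/dgraph-io/badger/v2/options"
	)

	func openDB(path string) (*badger.DB, error) {
		opts := badger.DefaultOptions(path).
			WithValueLogLoadingMode(options.FileIO). // plain file I/O for value log files
			WithTableLoadingMode(options.FileIO).    // plain file I/O for SSTables
			WithSyncWrites(false).
			WithCompactL0OnClose(false).
			WithTruncate(true)
		return badger.Open(opts)
	}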

What did you do?

At the start of my service, the key-value store is iterated over to announce the data it contains to peers. Unfortunately, when the store holds a large amount of data (1.7TB in this case), iterating over it allocates a large amount of memory.
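
The scan itself only needs keys; a rough sketch of the pattern follows (assuming the badger v2 import from the sketch above, with announce as a hypothetical stand-in for the peer-broadcast callback):

	// Rough sketch of a keys-only scan; announce is a hypothetical stand-in
	// for the peer-broadcast callback.
	func announceAllKeys(db *badger.DB, announce func(key []byte)) error {
		return db.View(func(txn *badger.Txn) error {
			iopts := badger.DefaultIteratorOptions
			iopts.PrefetchValues = false // keys only; don't pull values into memory
			it := txn.NewIterator(iopts)
			defer it.Close()
			for it.Rewind(); it.Valid(); it.Next() {
				announce(it.Item().KeyCopy(nil)) // copy: the key is only valid until Next()
			}
			return nil
		})
	}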

What did you expect to see?

Being able to iterate over the keys without allocating a large amount of memory

What did you see instead?

2GB+ of allocations when iterating over all the keys in a large datastore of 1.7TB

Additional Information

I recorded the following profile which shows what’s responsible for the memory allocations:

 2239.12MB 57.90% 57.90%  2239.12MB 57.90%  github.com/RTradeLtd/go-datastores/badger.(*txn).query
  687.09MB 17.77% 75.66%   687.09MB 17.77%  github.com/dgraph-io/badger/v2/table.(*Table).read
  513.05MB 13.27% 88.93%  1139.44MB 29.46%  github.com/RTradeLtd/go-datastores/badger.(*txn).query.func1
   83.20MB  2.15% 91.08%    83.20MB  2.15%  github.com/dgraph-io/badger/v2/skl.newArena
   69.16MB  1.79% 92.87%   109.17MB  2.82%  github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
      40MB  1.03% 93.90%       40MB  1.03%  github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal

It looks like this is because I have a function that iterates over all the keys in the key-value store to broadcast them to another peer. I'm not sure why this would result in a massive amount of memory being allocated, though.

This seems somewhat related to other reported issues such as Provide simple option for limiting total memory usage · Issue #1268 · dgraph-io/badger · GitHub. Using FileIO for the table and value log loading modes seems to decrease memory usage a bit, but the overall process of reading keys and/or values from badger still appears to require a lot of memory.

jarifibrahim commented :

@bonedaddy The high memory usage you’re seeing comes from badger/table.go at 9459a240bdf3cf1bf59a172558b37455c742d3bf · dgraph-io/badger · GitHub
I would've expected the Go GC to take care of the allocation/reclamation, but maybe since we're allocating so many []byte slices, the GC doesn't run that often. I think we can optimize this by having one block buffer per table; that way it would be reused across multiple t.read calls.
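
Roughly, the idea would be something like this (illustration only, not badger's actual table code; assumes the standard io package):

	// Illustration only: reuse one buffer per table instead of allocating a
	// fresh slice on every block read. Callers must not retain the returned
	// slice across calls, since it is overwritten by the next read.
	type blockReader struct {
		buf []byte // grown once, reused across read calls
	}

	func (r *blockReader) read(f io.ReaderAt, off int64, sz int) ([]byte, error) {
		if cap(r.buf) < sz {
			r.buf = make([]byte, sz) // grow only when a larger block is requested
		}
		r.buf = r.buf[:sz]
		_, err := f.ReadAt(r.buf, off)
		return r.buf, err
	}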

Do you see high memory usage when you open files in memory-map mode? The operating system should take care of moving memory-mapped pages from memory to disk when memory usage is high.

bonedaddy commented :

Ah, makes sense. A reusable buffer would definitely reduce memory a ton.

Do you see high memory usage when you open files in memory-map mode? The operating system should take care of moving memory-mapped pages from memory to disk when memory usage is high.

I tried setting table loading to memory map while keeping value log loading at FileIO, and that increased memory consumption by a lot. Previously my service was consuming 2.4GB of memory after iterating over all the keys with both value log and table loading set to FileIO; with memory map for table loading and FileIO for value log loading, memory jumped to 3.5GB.

jarifibrahim commented :

@bonedaddy You should also set the KeepL0InMemory option to false. That would reduce the memory consumption by around 600MB.
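
For reference, that is a one-line change on the badger v2 options (sketch, continuing the opts value from the earlier snippet):

	// Sketch: keep level-0 tables out of memory (badger v2 option setter).
	opts = opts.WithKeepL0InMemory(false)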

bonedaddy commented :

@bonedaddy You should also set the KeepL0InMemory option to false. That would reduce the memory consumption by around 600MB.

Yep, even with that: using mmap for table loading consumed the 3.5GB of memory, and both table and value log loading at FileIO with KeepL0InMemory set to false consumed the 2.4GB.

jarifibrahim commented :

@bonedaddy can you get a memory profile when the memory usage is high?

bonedaddy commented :

@bonedaddy can you get a memory profile when the memory usage is high?

I can capture another one. I included one when I opened the issue, but I'll work on getting another profile:

 2239.12MB 57.90% 57.90%  2239.12MB 57.90%  github.com/RTradeLtd/go-datastores/badger.(*txn).query
  687.09MB 17.77% 75.66%   687.09MB 17.77%  github.com/dgraph-io/badger/v2/table.(*Table).read
  513.05MB 13.27% 88.93%  1139.44MB 29.46%  github.com/RTradeLtd/go-datastores/badger.(*txn).query.func1
   83.20MB  2.15% 91.08%    83.20MB  2.15%  github.com/dgraph-io/badger/v2/skl.newArena
   69.16MB  1.79% 92.87%   109.17MB  2.82%  github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
      40MB  1.03% 93.90%       40MB  1.03%  github.com/dgraph-io/badger/v2/pb.(*BlockOffset).Unmarshal

bonedaddy commented :

Here's a capture from badger operating with table and value log loading modes at FileIO under high usage. However, this isn't related to the high memory usage reported by the query function; it's from me putting a lot of data into badger so I could capture a profile of the high memory usage when iterating over the keys:

bonedaddy commented :

Some more profiles

jarifibrahim commented :

Thanks @bonedaddy. I’ll look it up and get back.

bonedaddy commented :

Thanks, let me know if I need to capture any more profiles.
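
For reference, heap profiles like the ones above can be captured by exposing net/http/pprof in the service while memory is high; a minimal sketch, with an arbitrary listen address:

	// Sketch: expose pprof so a heap profile can be grabbed while memory is
	// high, e.g. with `go tool pprof http://localhost:6060/debug/pprof/heap`.
	package main

	import (
		"log"
		"net/http"
		_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
	)

	func main() {
		go func() {
			log.Println(http.ListenAndServe("localhost:6060", nil))
		}()
		// ... start the service that opens and iterates over badger here ...
		select {} // block forever in this sketch
	}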

jarifibrahim commented :

@bonedaddy How big are your values? Also, can you send me the memprofile file? The graph doesn’t show the exact line which is consuming the memory.