Indefinite memory growth as BadgerDB grows, unaccounted for by golang pprof

I’m trying to debug a production application that uses BadgerDB as its database. My program recently crashed with an OOM signal after running for about a month. Over that month, the program’s average memory usage kept increasing slowly, until one of its usage spikes (these are normal for my use case and happen every day or so) pushed it past what the machine could handle. I’ve spent the last couple of days narrowing it down, but I can’t explain what is happening inside Badger. Any help would be greatly appreciated.

I’ve noticed that memory usage grows as the database grows:

  • Is this normal?
  • Why is it happening?
  • How do I set a limit to it?
  • Why isn’t it shown by pprof?

What version of Go are you using (go version)?

$ go version
go version go1.17.2 darwin/amd64

What operating system are you using?

macOS Big Sur or CentOS 7 (2009)

What version of Badger are you using?

v3.2103.2

Does this issue reproduce with the latest master?

Yes.

Steps to Reproduce the issue

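package main

import (
	"strconv"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	// Open (or create) a Badger store in ./store with default options.
	db, err := badger.Open(badger.DefaultOptions("store"))
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Insert 100 million small key/value pairs, one transaction per record.
	for i := 0; i < 1e8; i++ {
		a := strconv.Itoa(i)
		if err := db.Update(func(txn *badger.Txn) error {
			return txn.Set([]byte(a), []byte(a))
		}); err != nil {
			panic(err)
		}
	}
}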

What Badger options were set?

badger.DefaultOptions()
I also tried disabling the cache and the bloom filters, and changing every option that looked related to memory usage, but nothing changes this behavior.

What did you do?

Inserted 100 million records into Badger.
htop shows used memory growing indefinitely while the program is adding records, but none of this shows up in pprof.

What did you expect to see?

Somewhat constant memory usage that’s affected by the database options.

What did you see instead?

Indefinitely growing memory usage that doesn’t ever go down no matter what settings I use for the database.

Also a massive difference between pprof memory usage and actual memory usage. Any idea what’s causing this?

Showing nodes accounting for 309.71MB, 98.79% of 313.50MB total
Dropped 20 nodes (cum <= 1.57MB)
Showing top 10 nodes out of 34
      flat  flat%   sum%        cum   cum%
     224MB 71.45% 71.45%      224MB 71.45%  github.com/dgraph-io/ristretto/z.Calloc (inline)
   83.20MB 26.54% 97.99%    83.20MB 26.54%  github.com/dgraph-io/badger/v3/skl.newArena (inline)
    2.50MB   0.8% 98.79%        3MB  0.96%  runtime.allocm
         0     0% 98.79%    83.20MB 26.54%  github.com/dgraph-io/badger/v3.(*DB).doWrites.func1
         0     0% 98.79%    83.20MB 26.54%  github.com/dgraph-io/badger/v3.(*DB).ensureRoomForWrite
         0     0% 98.79%    83.20MB 26.54%  github.com/dgraph-io/badger/v3.(*DB).newMemTable
         0     0% 98.79%    83.20MB 26.54%  github.com/dgraph-io/badger/v3.(*DB).openMemTable
         0     0% 98.79%    83.20MB 26.54%  github.com/dgraph-io/badger/v3.(*DB).writeRequests
         0     0% 98.79%       96MB 30.62%  github.com/dgraph-io/badger/v3.(*levelsController).compactBuildTables.func3
         0     0% 98.79%       96MB 30.62%  github.com/dgraph-io/badger/v3.(*levelsController).subcompact


Hey @hasel! Badger uses manually allocated memory (see Manual Memory Management in Go using jemalloc - Dgraph Blog), which is why you’re seeing memory spike without it showing up in Go pprof.

If you’re comfortable with editing Go code, you should try printing the contents of this map https://github.com/dgraph-io/ristretto/blob/efb105d0ca5ed9ceec285b838c0bf7fabf8d3bf2/z/calloc_jemalloc.go#L49 to figure out what’s eating up all the memory.
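To get a rough sense of how much memory sits outside what pprof can see, the Go runtime's own counters can be compared with the resident set size that htop reports. The following is only a minimal sketch using the standard library: `parseVmRSS` and `residentKiB` are hypothetical helper names, and the RSS lookup assumes a Linux `/proc` filesystem (it reports nothing on macOS).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// parseVmRSS extracts the VmRSS value (in KiB) from the text of
// /proc/<pid>/status. It returns false if the field is missing or malformed.
func parseVmRSS(status string) (uint64, bool) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			kib, err := strconv.ParseUint(fields[1], 10, 64)
			return kib, err == nil
		}
	}
	return 0, false
}

// residentKiB reports this process's resident set size on Linux.
func residentKiB() (uint64, bool) {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		return 0, false // not Linux, or /proc is unavailable
	}
	return parseVmRSS(string(data))
}

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// HeapAlloc covers live Go heap objects, which is what a heap profile samples.
	fmt.Printf("Go heap in use:          %d KiB\n", m.HeapAlloc/1024)
	// Sys is everything the Go runtime itself obtained from the OS.
	fmt.Printf("Go runtime total (Sys):  %d KiB\n", m.Sys/1024)

	if rss, ok := residentKiB(); ok {
		// RSS also includes memory the Go runtime does not track,
		// such as cgo (jemalloc) allocations and mmap'd files.
		fmt.Printf("process RSS:             %d KiB\n", rss)
	} else {
		fmt.Println("process RSS:             unavailable on this platform")
	}
}
```

Calling this periodically from the application and logging the three numbers side by side should show whether the growth is in the Go heap or outside it. An RSS far above `Sys` would suggest the memory comes from outside the Go runtime.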

Thank you for your response, Ibrahim. I wasn’t using jemalloc before. After trying it, I see slightly lower memory usage, but the problem is still there.
Here’s a sample from the largest dallocs printout I’ve come across:

TableBuilder 16 MiB
TableBuilder 16 MiB
TableBuilder 16 MiB
Builder 32 MiB
Builder 16 MiB
Builder 16 MiB
TableBuilder 8.0 MiB
TableBuilder 8.0 MiB
TableBuilder 16 MiB
Builder 16 MiB
TableBuilder 16 MiB
TableBuilder 4.0 MiB
Builder 8.0 MiB

pprof output at the same time:

Showing nodes accounting for 86.48MB, 100% of 86.48MB total

This seems perfectly fine, except that htop tells me the process is using just under 1 GB of RAM, even when the contents of dallocs total less than 32 MiB.

What’s even stranger is that a significant portion of this allocated memory stays reserved even after the database is closed.
Any ideas?

EDIT:
Here’s a heap profile after the database was closed.

(pprof) top
Showing nodes accounting for 86.98MB, 100% of 86.98MB total
Showing top 10 nodes out of 34
      flat  flat%   sum%        cum   cum%
   83.20MB 95.66% 95.66%    83.20MB 95.66%  github.com/dgraph-io/badger/v3/skl.newArena (inline)
    1.27MB  1.46% 97.12%     1.27MB  1.46%  github.com/dgraph-io/ristretto.newCmRow
       1MB  1.15% 98.28%        1MB  1.15%  runtime.allocm
    0.50MB  0.58% 98.85%     0.50MB  0.58%  runtime.malg
    0.50MB  0.57% 99.43%     0.50MB  0.57%  runtime.gcBgMarkWorker
    0.50MB  0.57%   100%     0.50MB  0.57%  main.main.func2
         0     0%   100%    83.20MB 95.66%  github.com/dgraph-io/badger/v3.(*DB).doWrites.func1
         0     0%   100%    83.20MB 95.66%  github.com/dgraph-io/badger/v3.(*DB).ensureRoomForWrite
         0     0%   100%    83.20MB 95.66%  github.com/dgraph-io/badger/v3.(*DB).newMemTable
         0     0%   100%    83.20MB 95.66%  github.com/dgraph-io/badger/v3.(*DB).openMemTable