Memtable memory usage causes OOM

What version of Go are you using (go version)?

$ go version
go version go1.13.12 linux/amd64

What operating system are you using?

Linux username 4.19.0-9-cloud-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux
Ubuntu (32 GB RAM, 1.5 TB SSD)

What version of Badger are you using?

Commit ID: d8e8324 (master branch)

Does this issue reproduce with the latest master?


Steps to Reproduce the issue

/usr/bin/time -v badger benchmark write --compression -m 5000 --dir ../data --gc-every 100h -l |& tee with_compression.txt

What did you see instead?

Got this error message:

badger 2020/07/27 15:02:50 INFO: Got compaction priority: {level:1 score:66.36421252414584 dropPrefixes:[]}
badger 2020/07/27 15:02:51 INFO: Got compaction priority: {level:1 score:66.36421252414584 dropPrefixes:[]}
badger 2020/07/27 15:02:51 ERROR: writeRequests: Mmap value log file. Path=../data/042492.vlog. Error=cannot allocate memory
Time elapsed: 07h56m12s, bytes written: 398 GB, speed: 14 MB/sec, entries written: 2486902825, speed: 87039/sec, gcSuccess: 0
Valid Keys Count: 1773556737
Invalid Keys Count: 0
Move Keys Count: 0
Internal Keys Count: 15137
panic: runtime error: invalid memory address or nil pointer dereference
	panic: Unclosed iterator at time of Txn.Discard.
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x816eaa]

goroutine 197 [running]:
(*Txn).Discard(0xc1a3967d40)
	/home/ibrahim/badger/txn.go:517 +0xc1
panic(0x9bf4c0, 0xe56d60)
	/usr/local/go/src/runtime/panic.go:679 +0x1b2
(*Skiplist).IncrRef(...)
	/home/ibrahim/badger/skl/skl.go:86
(*DB).getMemTables(0xc000198400, 0x0, 0x0, 0x0, 0x0)
	/home/ibrahim/badger/db.go:611 +0xea
(*Txn).NewIterator(0xc1a3967d40, 0x1, 0x64, 0x100, 0x0, 0x0, 0x0, 0x100, 0x0)
	/home/ibrahim/badger/iterator.go:468 +0x67
…(…, 0xc000198400)
	/home/ibrahim/badger/badger/cmd/write_bench.go:316 +0x920
created by …
	/home/ibrahim/badger/badger/cmd/write_bench.go:283 +0x913
Command exited with non-zero status 2
	Command being timed: "badger benchmark write --compression -m 5000 --dir ../data --gc-every 100h -l"
	User time (seconds): 107154.42
	System time (seconds): 2715.21
	Percent of CPU this job got: 383%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 7:57:10
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 25532300
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 151083
	Minor (reclaiming a frame) page faults: 4705826
	Voluntary context switches: 308179592
	Involuntary context switches: 5385478
	Swaps: 0
	File system inputs: 207292192
	File system outputs: 1277620328
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 2

Full logs can be accessed: with_compression.txt (1.4 MB)
Heap profile: pprof.badger.alloc_objects.alloc_space.inuse_objects.inuse_space.002-c.pb.gz (121.9 KB)

Output of heap profile:

(pprof) list ensure
Total: 5.92GB
ROUTINE ======================== (*DB).ensureRoomForWrite in /home/ibrahim/badger/db.go
         0     3.90GB (flat, cum) 65.84% of Total
         .          .    953:
         .          .    954:		db.opt.Debugf("Flushing memtable, mt.size=%d size of flushChan: %d\n",
         .          .    955:, len(db.flushChan))
         .          .    956:		// We manage to push this task. Let's modify imm.
         .          .    957:		db.imm = append(db.imm,
         .     3.90GB    958: = skl.NewSkiplist(arenaSize(db.opt))
         .          .    959:		// New memtable is empty. We certainly have room.
         .          .    960:		return nil
         .          .    961:	default:
         .          .    962:		// We need to do this to unlock and allow the flusher to modify imm.
         .          .    963:		return errNoRoom

The skiplists (memtables) are taking too much space. Ideally, this should not happen: the default memtable is around 64 MB, memtables are later flushed to L0, and we allow only 5 memtables in memory at a time.
Similar errors were also encountered without compression.

CC: @ibrahim

Interesting. This might be a good candidate to use C.calloc().

I believe this is because the benchmark write tool creates long-running iterators, and these iterators hold references to the memtables for a long time. @ahsan also pointed out that part of the problem is a defer that is never executed, because we never return from the function.

@ahsan is running the benchmark write tool with iterators disabled to confirm our theory.

Other than the long-running iterators pointed out by @ibrahim, vlogMaxEntries was set to its default value of 10000 in the version of Badger the test was run on. As a result, each vlog file contained only about 15 MB of data, yet each vlog file is mmapped to 2 GB. After setting vlogMaxEntries to 1000000 we didn't encounter OOM.

@ahsan, send a PR to make the key count optional in benchmark write and disable it by default.
