Memtable memory usage causes OOM

Naman · July 28, 2020, 2:35pm

What version of Go are you using (`go version`)?

$ go version
go version go1.13.12 linux/amd64

What operating system are you using?

Linux username 4.19.0-9-cloud-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux
Ubuntu (32 GB RAM, 1.5TB SDD)

What version of Badger are you using?

Commit ID: d8e8324 (master branch)

Does this issue reproduce with the latest master?

Yes

Steps to Reproduce the issue

/usr/bin/time -v badger benchmark write --compression -m 5000 --dir ../data --gc-every 100h -l |& tee with_compression.txt

What did you see instead?

Got this error message:

badger 2020/07/27 15:02:50 INFO: Got compaction priority: {level:1 score:66.36421252414584 dropPrefixes:[]}
badger 2020/07/27 15:02:51 INFO: Got compaction priority: {level:1 score:66.36421252414584 dropPrefixes:[]}
badger 2020/07/27 15:02:51 ERROR: writeRequests: Mmap value log file. Path=../data/042492.vlog. Error=cannot allocate memory
Time elapsed: 07h56m12s, bytes written: 398 GB, speed: 14 MB/sec, entries written: 2486902825, speed: 87039/sec, gcSuccess: 0
Valid Keys Count: 1773556737
Invalid Keys Count: 0
Move Keys Count: 0
Internal Keys Count: 15137
panic: runtime error: invalid memory address or nil pointer dereference
	panic: Unclosed iterator at time of Txn.Discard.
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x816eaa]

goroutine 197 [running]:
github.com/dgraph-io/badger/v2.(*Txn).Discard(0xc1a3967d40)
	/home/ibrahim/badger/txn.go:517 +0xc1
panic(0x9bf4c0, 0xe56d60)
	/usr/local/go/src/runtime/panic.go:679 +0x1b2
github.com/dgraph-io/badger/v2/skl.(*Skiplist).IncrRef(...)
	/home/ibrahim/badger/skl/skl.go:86
github.com/dgraph-io/badger/v2.(*DB).getMemTables(0xc000198400, 0x0, 0x0, 0x0, 0x0)
	/home/ibrahim/badger/db.go:611 +0xea
github.com/dgraph-io/badger/v2.(*Txn).NewIterator(0xc1a3967d40, 0x1, 0x64, 0x100, 0x0, 0x0, 0x0, 0x100, 0x0)
	/home/ibrahim/badger/iterator.go:468 +0x67
github.com/dgraph-io/badger/v2/badger/cmd.reportStats(0xc000121140, 0xc000198400)
	/home/ibrahim/badger/badger/cmd/write_bench.go:316 +0x920
created by github.com/dgraph-io/badger/v2/badger/cmd.writeBench
	/home/ibrahim/badger/badger/cmd/write_bench.go:283 +0x913
Command exited with non-zero status 2
	Command being timed: "badger benchmark write --compression -m 5000 --dir ../data --gc-every 100h -l"
	User time (seconds): 107154.42
	System time (seconds): 2715.21
	Percent of CPU this job got: 383%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 7:57:10
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 25532300
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 151083
	Minor (reclaiming a frame) page faults: 4705826
	Voluntary context switches: 308179592
	Involuntary context switches: 5385478
	Swaps: 0
	File system inputs: 207292192
	File system outputs: 1277620328
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 2

Full logs can be accessed: with_compression.txt (1.4 MB)
Heap profile: pprof.badger.alloc_objects.alloc_space.inuse_objects.inuse_space.002-c.pb.gz (121.9 KB)

Output of heap profile:

(pprof) list ensure
Total: 5.92GB
ROUTINE ======================== github.com/dgraph-io/badger/v2.(*DB).ensureRoomForWrite in badger/db.go
         0     3.90GB (flat, cum) 65.84% of Total
         .          .    953:
         .          .    954:		db.opt.Debugf("Flushing memtable, mt.size=%d size of flushChan: %d\n",
         .          .    955:			db.mt.MemSize(), len(db.flushChan))
         .          .    956:		// We manage to push this task. Let's modify imm.
         .          .    957:		db.imm = append(db.imm, db.mt)
         .     3.90GB    958:		db.mt = skl.NewSkiplist(arenaSize(db.opt))
         .          .    959:		// New memtable is empty. We certainly have room.
         .          .    960:		return nil
         .          .    961:	default:
         .          .    962:		// We need to do this to unlock and allow the flusher to modify imm.
         .          .    963:		return errNoRoom

db.mt (memtables) are taking too much space. Ideally, this should not happen, as size of the default memtable is around 64MB which are later flushed to L0. We also allow only 5 memtables in memory.
Similar errors were encountered without compression also.

CC: @ibrahim

mrjn · July 28, 2020, 3:38pm

Interesting. This might be a good candidate to use C.calloc() and C.free().

ibrahim · August 10, 2020, 4:47pm

I believe this is because the benchmark write tool creates long running iterators and these iterators hold the ref to the memtables for a long time. @ahsan also pointed out the issue is because we have a defer and that defer is not executed (we don’t return from the function).

github.com

dgraph-io/badger/blob/3e067a54e5964bf7640c7d74fa92cd015f2a3fc8/badger/cmd/write_bench.go#L320


			return
		case <-t.C:
			txn := db.NewTransaction(false)
			defer txn.Discard()

			iopt := badger.DefaultIteratorOptions
			iopt.AllVersions = true
			iopt.InternalAccess = true

			it := txn.NewIterator(iopt)
			defer it.Close()
			for it.Rewind(); it.Valid(); it.Next() {
				i := it.Item()
				if bytes.HasPrefix(i.Key(), []byte("!badger!")) {
					internalKeyCount++
				}
				if bytes.HasPrefix(i.Key(), []byte("!badger!Move")) {
					moveKeyCount++
				}
				if i.IsDeletedOrExpired() {
					invalidKeyCount++

@ahsan is running the benchmark write tool with disabled iterators to confirm our theory.

ahsan · August 13, 2020, 8:42pm

Other than the long running iterators, as pointed by @ibrahim, the vlogMaxEntries was set with default value of 10000 in the version of badger on which the test was run. This resulted each vlog file to contain only data of approximately 15 MB but each vlog file is mmaped to 2GB. After setting vlogMaxEntries to 1000000 we didn’t encounter OOM.

github.com

dgraph-io/badger/blob/763e7d7303f616fb4cfd36e6e21110ec5649df39/value.go#L1021


	if err = lf.bootstrap(); err != nil {
		removeFile()
		return nil, err
	}

	if err = syncDir(vlog.dirPath); err != nil {
		removeFile()
		return nil, errFile(err, vlog.dirPath, "Sync value log dir")
	}

	if err = lf.mmap(2 * vlog.opt.ValueLogFileSize); err != nil {
		removeFile()
		return nil, errFile(err, lf.path, "Mmap value log file")
	}

	vlog.filesLock.Lock()
	vlog.filesMap[fid] = lf
	vlog.maxFid = fid
	// writableLogOffset is only written by write func, by read by Read func.
	// To avoid a race condition, all reads and updates to this variable must be
	// done via atomics.

ibrahim · August 14, 2020, 5:54am

@ahsan send a PR to make the key count optional in benchmark write and disable it by default.

Topic		Replies	Views
Badger performance and memory usage recommendations Badger	9	5609	June 27, 2020
Provide simple option for limiting total memory usage Issues badger , kind:enhancement , status:accepted , area:documentation	19	2243	May 1, 2022
Crash with memtable sizes over 64 << 24 Badger	1	645	November 7, 2021
Badger compaction results in OOM and loss of data Badger	7	1648	November 21, 2020
Badger write performance on 1 TB data Badger	4	2886	March 15, 2022