Panic: Base level can't be zero

When starting Dgraph (v21.03.2), it panics. This is a pretty big database (1.1 TiB) running on bare metal 248 cores and >1TiB RAM. Any idea if this can be recovered?

panic: Base level can't be zero.

goroutine 28332 [running]:
github.com/dgraph-io/badger/v3.(*levelsController).fillTablesL0ToLbase(0xc00058c0e0, 0xc005c5f780, 0x0)
        /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:1173 +0x925
github.com/dgraph-io/badger/v3.(*levelsController).fillTablesL0(0xc00058c0e0, 0xc005c5f780, 0x211f7cb)
        /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:1234 +0x35
github.com/dgraph-io/badger/v3.(*levelsController).doCompact(0xc00058c0e0, 0x1, 0x0, 0x3ff6666666666666, 0x3ff3390ff36d6091, 0x0, 0x0, 0x0, 0x0, 0xc0044a8080, ...)
        /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:1510 +0x28b
github.com/dgraph-io/badger/v3.(*levelsController).runCompactor.func2(0x0, 0x3ff6666666666666, 0x3ff3390ff36d6091, 0x0, 0x0, 0x0, 0x0, 0xc0044a8000, 0x7, 0x7, ...)
        /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:465 +0x7a
github.com/dgraph-io/badger/v3.(*levelsController).runCompactor.func3(0xc0003e9730)
        /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:488 +0x158
github.com/dgraph-io/badger/v3.(*levelsController).runCompactor(0xc00058c0e0, 0x1, 0xc000654030)
        /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:517 +0x362
created by github.com/dgraph-io/badger/v3.(*levelsController).startCompact
        /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:354 +0x89

What query did you used? if you are using some math function you should never divide by zero. But it should be a warning, not a panic. Can you share some prof details? https://dgraph.io/docs/howto/retrieving-debug-information/

No query at all, this is the output when I start Dgraph. It panics immediately. Because of that, I can also not get the prof details.

Have you tried to upgrade from another version to a latest one? manually*

I upgraded from v21.03.1 to v21.03.2 but that was a while back. Since then Dgraph worked, until now.

I restored the database from a backup a few days ago, but today the same error occurred again:

I1010 04:51:53.411178       1 log.go:34] [0] [E] LOG Compact 4->5 (1, 11 -> 11 tables with 4 splits). [6472557 . 6487366 6487367 6453750 6453751 6453739 6453756 6453758 6453735 6453749 6453752 6487309 .] -> [6488086 6488103 6488104 6488085 6488101 6488102 6488084 6488100 6488113 6488083 6488087 .], took 2.328s
I1010 04:51:53.486824       1 log.go:34] [1] [E] LOG Compact 4->5 (1, 12 -> 11 tables with 4 splits). [6472558 . 6487321 6487322 6487243 6452382 6452347 6452388 6452414 6452345 6452386 6452413 6453475 6488059 .] -> [6488093 6488096 6488095 6488107 6488109 6488097 6488106 6488116 6488092 6488108 6488118 .], took 2.243s
I1010 04:51:58.316560       1 log.go:34] [3] [E] LOG Compact 5->6 (1, 2 -> 3 tables with 1 splits). [6422614 . 11314 6484737 .] -> [6488169 6488175 6488180 .], took 2.701s
I1010 04:51:59.003547       1 log.go:34] [2] [E] LOG Compact 5->6 (1, 2 -> 3 tables with 1 splits). [6422484 . 2962002 6483597 .] -> [6488174 6488181 6488183 .], took 2.704s
I1010 04:51:59.861215       1 log.go:34] [1] [E] LOG Compact 5->6 (1, 4 -> 5 tables with 2 splits). [6421869 . 6000200 6000205 6000206 11292 .] -> [6488177 6488182 6488188 6488178 6488179 .], took 3.303s
I1010 04:52:00.374491       1 log.go:34] [3] [E] LOG Compact 5->6 (1, 3 -> 3 tables with 1 splits). [6421860 . 6486339 6481060 6488164 .] -> [6488187 6488194 6488195 .], took 2.056s
I1010 04:52:00.432628       1 log.go:34] [0] [E] LOG Compact 5->6 (1, 3 -> 3 tables with 1 splits). [6422641 . 6486283 6486285 11366 .] -> [6488184 6488191 6488196 .], took 2.558s
I1010 04:52:02.384838       1 log.go:34] [1] [E] LOG Compact 5->6 (1, 3 -> 3 tables with 1 splits). [6422569 . 6486301 6486305 11282 .] -> [6488200 6488201 6488203 .], took 2.521s
I1010 04:52:02.592439       1 log.go:34] [3] [E] LOG Compact 5->6 (1, 3 -> 2 tables with 1 splits). [6422595 . 6481067 6481068 6486571 .] -> [6488202 6488204 .], took 2.214s
panic: Base level can't be zero.

goroutine 100394 [running]:
github.com/dgraph-io/badger/v3.(*levelsController).fillTablesL0ToLbase(0xc000324000, 0xc0018ad780, 0x0)
	/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:1173 +0x925
github.com/dgraph-io/badger/v3.(*levelsController).fillTablesL0(0xc000324000, 0xc0018ad780, 0x211f7cb)
	/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:1234 +0x35
github.com/dgraph-io/badger/v3.(*levelsController).doCompact(0xc000324000, 0x2, 0x0, 0x3ff0000000000000, 0x3ff3a7a5b93e5be3, 0x0, 0x0, 0x0, 0x0, 0xc054a22c40, ...)
	/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:1510 +0x28b
github.com/dgraph-io/badger/v3.(*levelsController).runCompactor.func2(0x0, 0x3ff0000000000000, 0x3ff3a7a5b93e5be3, 0x0, 0x0, 0x0, 0x0, 0xc054a22bc0, 0x7, 0x7, ...)
	/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:465 +0x7a
github.com/dgraph-io/badger/v3.(*levelsController).runCompactor.func3(0xc00c20ef30)
	/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:488 +0x158
github.com/dgraph-io/badger/v3.(*levelsController).runCompactor(0xc000324000, 0x2, 0xc007712090)
	/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:517 +0x362
created by github.com/dgraph-io/badger/v3.(*levelsController).startCompact
	/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:354 +0x89
I1010 04:53:19.956164       1 init.go:110] 

Dgraph version   : v21.03.2
Dgraph codename  : rocket-2
Dgraph SHA-256   : 00a53ef6d874e376d5a53740341be9b822ef1721a4980e6e2fcb60986b3abfbf
Commit SHA-1     : b17395d33
Commit timestamp : 2021-08-26 01:11:38 -0700
Branch           : HEAD
Go version       : go1.16.2
jemalloc enabled : true

For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph     , visit http://discuss.dgraph.io.
For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.


I1010 04:53:19.956833       1 run.go:752] x.Config: {PortOffset:0 Limit:max-retries=-1; query-timeout=0ms; max-pending-queries=10000; mutations=allow; disallow-drop=false; mutations-nquad=1000000; query-edge=10000000; normalize-node=10000; txn-abort-after=5m LimitMutationsNquad:1000000 LimitQueryEdge:10000000 BlockClusterWideDrop:false LimitNormalizeNode:10000 QueryTimeout:0s MaxRetries:-1 GraphQL:introspection=true; debug=false; extensions=true; poll-interval=1s; lambda-url= GraphQLDebug:false}
I1010 04:53:19.956875       1 run.go:753] x.WorkerConfig: {TmpDir:t ExportPath:export Trace:ratio=0.01; jaeger=; datadog= MyAddr:btc_alpha:7080 ZeroAddr:[btc_zero:5080] TLSClientConfig:<nil> TLSServerConfig:<nil> Raft:pending-proposals=256; idx=; group=; learner=false; snapshot-after-entries=10000; snapshot-after-duration=30m Badger:{Dir: ValueDir: SyncWrites:false NumVersionsToKeep:1 ReadOnly:false Logger:0xc0004689c0 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:true NamespaceOffset:-1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0} WhiteListedIPRanges:[{Lower:172.31.0.1 Upper:172.31.0.1}] StrictMutations:false AclEnabled:false HmacSecret:**** AbortOlderThan:5m0s ProposedGroupId:0 StartTime:2021-10-10 04:53:19.704447498 +0000 UTC m=+0.221582974 Ludicrous:enabled=false; concurrency=2000 LudicrousEnabled:false Security:token=; whitelist=172.31.0.1 EncryptionKey:**** LogRequest:0 HardSync:false Audit:false}
I1010 04:53:19.956951       1 run.go:754] worker.Config: {PostingDir:p WALDir:w MutationsMode:0 AuthToken: HmacSecret:**** AccessJwtTtl:0s RefreshJwtTtl:0s CachePercentage:0,65,35 CacheMb:1024 Audit:<nil> ChangeDataConf:file=; kafka=; sasl_user=; sasl_password=; ca_cert=; client_cert=; client_key=; sasl-mechanism=PLAIN;}
I1010 04:53:19.957090       1 log.go:295] Found file: 399 First Index: 669310
I1010 04:53:19.957122       1 log.go:295] Found file: 400 First Index: 670892
I1010 04:53:19.957148       1 log.go:295] Found file: 401 First Index: 672515
I1010 04:53:19.957172       1 log.go:295] Found file: 402 First Index: 674121
I1010 04:53:19.957194       1 log.go:295] Found file: 403 First Index: 675926
I1010 04:53:19.957240       1 storage.go:125] Init Raft Storage with snap: 669324, first: 669325, last: 676899
I1010 04:53:19.957255       1 server_state.go:140] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false NumVersionsToKeep:2147483647 ReadOnly:false Logger:0x342b618 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false NamespaceOffset:1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0}
I1010 04:53:23.332091       1 log.go:34] 3881 tables out of 27905 opened in 3s
I1010 04:53:26.390056       1 log.go:34] 7837 tables out of 27905 opened in 6.058s
I1010 04:53:29.346552       1 log.go:34] 11848 tables out of 27905 opened in 9.014s
I1010 04:53:32.332869       1 log.go:34] 14744 tables out of 27905 opened in 12.001s
I1010 04:53:35.332690       1 log.go:34] 19039 tables out of 27905 opened in 15.001s
I1010 04:53:38.434052       1 log.go:34] 22346 tables out of 27905 opened in 18.102s
I1010 04:53:41.335079       1 log.go:34] 25511 tables out of 27905 opened in 21.003s
I1010 04:53:43.376178       1 log.go:34] All 27905 tables opened in 23.044s
I1010 04:53:43.409437       1 log.go:34] Discard stats nextEmptySlot: 0
I1010 04:53:43.438716       1 log.go:34] Set nextTxnTs to 636665
I1010 04:53:43.526026       1 log.go:34] Deleting empty file: p/000003.vlog
I1010 04:53:43.535642       1 groups.go:100] Current Raft Id: 0x1
I1010 04:53:43.535660       1 groups.go:116] Sending member request to Zero: id:1 addr:"btc_alpha:7080" 
I1010 04:53:43.535980       1 worker.go:114] Worker listening at address: [::]:7080
I1010 04:53:43.537519       1 run.go:565] Bringing up GraphQL HTTP API at 0.0.0.0:8080/graphql
I1010 04:53:43.537535       1 run.go:566] Bringing up GraphQL HTTP admin API at 0.0.0.0:8080/admin
I1010 04:53:43.537547       1 run.go:593] gRPC server started.  Listening on port 9080
I1010 04:53:43.537553       1 run.go:594] HTTP server started.  Listening on port 8080
E1010 04:53:43.537567       1 groups.go:1181] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
I1010 04:53:43.636149       1 pool.go:164] CONN: Connecting to btc_zero:5080
I1010 04:53:43.640428       1 groups.go:135] Connected to group zero. Assigned group: 0
I1010 04:53:43.640444       1 groups.go:137] Raft Id after connection to Zero: 0x1
I1010 04:53:43.640480       1 draft.go:270] Node ID: 0x1 with GroupID: 1
I1010 04:53:43.640486       1 draft.go:279] RaftContext: id:1 group:1 addr:"btc_alpha:7080" 
I1010 04:53:43.640560       1 node.go:152] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc00047cc00 Applied:669324 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x342b618 DisableProposalForwarding:false}
I1010 04:53:43.760546       1 node.go:310] Found Snapshot.Metadata: {ConfState:{Nodes:[1] Learners:[] XXX_unrecognized:[]} Index:669324 Term:4 XXX_unrecognized:[]}
I1010 04:53:43.760575       1 node.go:321] Found hardstate: {Term:4 Vote:1 Commit:676899 XXX_unrecognized:[]}
panic: Base level can't be zero.

goroutine 28264 [running]:
github.com/dgraph-io/badger/v3.(*levelsController).fillTablesL0ToLbase(0xc000310000, 0xc001fdd780, 0x0)
	/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:1173 +0x925
github.com/dgraph-io/badger/v3.(*levelsController).fillTablesL0(0xc000310000, 0xc001fdd780, 0x211f7cb)
	/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:1234 +0x35
github.com/dgraph-io/badger/v3.(*levelsController).doCompact(0xc000310000, 0x2, 0x0, 0x3ff6666666666666, 0x3ff3a7a5b93e5be3, 0x0, 0x0, 0x0, 0x0, 0xc009ad2080, ...)
	/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:1510 +0x28b
github.com/dgraph-io/badger/v3.(*levelsController).runCompactor.func2(0x0, 0x3ff6666666666666, 0x3ff3a7a5b93e5be3, 0x0, 0x0, 0x0, 0x0, 0xc009ad2000, 0x7, 0x7, ...)
	/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:465 +0x7a
github.com/dgraph-io/badger/v3.(*levelsController).runCompactor.func3(0xc00808ef30)
	/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:488 +0x158
github.com/dgraph-io/badger/v3.(*levelsController).runCompactor(0xc000310000, 0x2, 0xc0004cb770)
	/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:517 +0x362
created by github.com/dgraph-io/badger/v3.(*levelsController).startCompact
	/go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/levels.go:354 +0x89
I1010 04:53:48.765074       1 init.go:110] 

It seems like this happens during compaction (https://github.com/dgraph-io/badger/blob/master/levels.go#L1183)

@MichelDiz I think I have a solution to my problem:

Dgraph sets by default 7 compaction levels (1 memory level and 6 on-disk levels) and the base level size to 10 MiB with a multiplier of 10. So the maximum size of the 6 on-disk levels should be as follows:

1 → 10 MiB
2 → 100 MiB
3 → 1 GiB
4 → 10 GiB
5 → 100 GiB
6 → 1 TiB

Dgraph panics in my case after it grows to ~1.1 TiB, which are roughly the 6 levels combined. After I set --badger "maxlevels=8", Dgraph compacted a lot of data from level 6 to the new 7th level and now it works again. I could also probably have increased the base level instead.

Can somebody from the Dgraph team confirm if my reasoning is correct? If yes, there should be a note in the docs that if data grows over 1 TiB the default values have to be modified.

2 Likes