Hi,
I’m running a cluster on my locale machine. the setup is 3 alpha and 1 zero.
I have the issue that after a power outage, my cluster didn’t start up.
See the error log below.
I already followed the recommendation in the past the clean the alpha directory. This worked.
But at other times I used the same directories while upgrading dgraph version.
This time the cluster was running for 2 months, without any upgrades and without any issues.
Until now.
As there was no version upgrade, just a power outage, which means a hard shutdown. how to recover from this without deleting all my data again?
Dgraph version : v22.0.2
Dgraph codename : dgraph
Dgraph SHA-256 : a11258bf3352eff0521bc68983a5aedb9316414947719920d75f12143dd368bd
Commit SHA-1 : 55697a4
Commit timestamp : 2022-12-16 23:03:35 +0000
Branch : release/v22.0.2
Go version : go1.18.5
jemalloc enabled : true
For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph , visit http://discuss.dgraph.io.
For fully-managed Dgraph Cloud , visit https://dgraph.io/cloud.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2022 Dgraph Labs, Inc.
I0221 18:36:36.591444 15 run.go:754] x.Config: {PortOffset:0 Limit:mutations=allow; normalize-node=10000; disallow-drop=false; query-timeout=0ms; query-edge=1000000; mutations-nquad=1000000; txn-abort-after=5m; max-retries=-1; max-pending-queries=10000 LimitMutationsNquad:1000000 LimitQueryEdge:1000000 BlockClusterWideDrop:false LimitNormalizeNode:10000 QueryTimeout:0s MaxRetries:-1 GraphQL:introspection=true; debug=false; extensions=true; poll-interval=1s; lambda-url= GraphQLDebug:false}
[Sentry] 2023/02/21 18:36:32 Sending fatal event [57586dcf4137421587f662673d10ae2e] to o318308.ingest.sentry.io project: 1805390
[Sentry] 2023/02/21 18:36:32 Too many requests, backing off till: 2023-02-21 18:36:51.15722936 +0000 UTC m=+20.210434431
[Sentry] 2023/02/21 18:36:32 Buffer flushed successfully.
[Sentry] 2023/02/21 18:36:35 Integration installed: ContextifyFrames
[Sentry] 2023/02/21 18:36:35 Integration installed: Environment
[Sentry] 2023/02/21 18:36:35 Integration installed: Modules
[Sentry] 2023/02/21 18:36:35 Integration installed: IgnoreErrors
[Sentry] 2023/02/21 18:36:36 Integration installed: ContextifyFrames
[Sentry] 2023/02/21 18:36:36 Integration installed: Environment
[Sentry] 2023/02/21 18:36:36 Integration installed: Modules
[Sentry] 2023/02/21 18:36:36 Integration installed: IgnoreErrors
I0221 18:36:36.591489 15 run.go:755] x.WorkerConfig: {TmpDir:t ExportPath:export Trace:ratio=0.01; jaeger=; datadog= MyAddr:alpha2:7080 ZeroAddr:[zero:5080] TLSClientConfig:<nil> TLSServerConfig:<nil> Raft:snapshot-after-entries=10000; snapshot-after-duration=30m; pending-proposals=256; idx=; group=; learner=false Badger:{Dir: ValueDir: SyncWrites:false NumVersionsToKeep:1 ReadOnly:false Logger:0xc00037a740 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:true NamespaceOffset:-1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0} WhiteListedIPRanges:[{Lower:10.0.0.0 Upper:10.255.255.255} {Lower:172.0.0.0 Upper:172.255.255.255} {Lower:192.168.0.0 Upper:192.168.255.255}] StrictMutations:false AclEnabled:false HmacSecret:**** AbortOlderThan:5m0s ProposedGroupId:0 StartTime:2023-02-21 18:36:36.190396318 +0000 UTC m=+0.151868611 Ludicrous:enabled=false; concurrency=2000 LudicrousEnabled:false Security:whitelist=10.0.0.0/8,172.0.0.0/8,192.168.0.0/16; token= EncryptionKey:**** LogDQLRequest:0 HardSync:false Audit:false}
I0221 18:36:36.591554 15 run.go:756] worker.Config: {PostingDir:p WALDir:w MutationsMode:0 AuthToken: HmacSecret:**** AccessJwtTtl:0s RefreshJwtTtl:0s CachePercentage:0,65,35 CacheMb:1024 Audit:<nil> ChangeDataConf:file=; kafka=; sasl_user=; sasl_password=; ca_cert=; client_cert=; client_key=; sasl-mechanism=PLAIN;}
I0221 18:36:36.594677 15 log.go:295] Found file: 1 First Index: 1
I0221 18:36:36.595466 15 storage.go:125] Init Raft Storage with snap: 150, first: 151, last: 6885
I0221 18:36:36.595968 15 server_state.go:140] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false NumVersionsToKeep:2147483647 ReadOnly:false Logger:0x3113340 Compression:1 InMemory:false MetricsEnabled:true NumGoroutines:8 MemTableSize:67108864 BaseTableSize:2097152 BaseLevelSize:10485760 LevelSizeMultiplier:10 TableSizeMultiplier:2 MaxLevels:7 VLogPercentile:0 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 BlockCacheSize:697932185 IndexCacheSize:375809638 NumLevelZeroTables:5 NumLevelZeroTablesStall:15 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:4 CompactL0OnClose:false LmaxCompaction:false ZSTDCompressionLevel:0 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:false NamespaceOffset:1 managedTxns:false maxBatchCount:0 maxBatchSize:0 maxValueThreshold:0}
I0221 18:36:36.616155 15 log.go:34] All 5 tables opened in 2ms
I0221 18:36:36.617050 15 log.go:34] Discard stats nextEmptySlot: 0
I0221 18:36:36.617105 15 log.go:34] Set nextTxnTs to 230127
I0221 18:36:36.620390 15 log.go:34] Deleting empty file: p/000058.vlog
I0221 18:36:36.624162 15 groups.go:100] Current Raft Id: 0x3
I0221 18:36:36.624241 15 groups.go:116] Sending member request to Zero: id:3 addr:"alpha2:7080"
I0221 18:36:36.624363 15 worker.go:114] Worker listening at address: [::]:7080
I0221 18:36:36.625050 15 run.go:567] Bringing up GraphQL HTTP API at 0.0.0.0:8080/graphql
I0221 18:36:36.625060 15 run.go:568] Bringing up GraphQL HTTP admin API at 0.0.0.0:8080/admin
I0221 18:36:36.625066 15 run.go:595] gRPC server started. Listening on port 9080
I0221 18:36:36.625070 15 run.go:596] HTTP server started. Listening on port 8080
E0221 18:36:36.625087 15 groups.go:1224] Error during SubscribeForUpdates for prefix "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
I0221 18:36:36.725500 15 pool.go:164] CONN: Connecting to zero:5080
I0221 18:36:36.728163 15 groups.go:135] Connected to group zero. Assigned group: 0
I0221 18:36:36.728190 15 groups.go:137] Raft Id after connection to Zero: 0x3
I0221 18:36:36.728243 15 pool.go:164] CONN: Connecting to alpha1:7080
I0221 18:36:36.728263 15 draft.go:270] Node ID: 0x3 with GroupID: 1
I0221 18:36:36.728267 15 draft.go:279] RaftContext: id:3 group:1 addr:"alpha2:7080"
I0221 18:36:36.728315 15 node.go:153] Setting raft.Config to: &{ID:3 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc00044e700 Applied:150 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x3113340 DisableProposalForwarding:false}
I0221 18:36:36.732598 15 node.go:312] Found Snapshot.Metadata: {ConfState:{Nodes:[1 2 3] Learners:[] XXX_unrecognized:[]} Index:150 Term:5 XXX_unrecognized:[]}
I0221 18:36:36.732674 15 node.go:323] Found hardstate: {Term:30 Vote:3 Commit:6886 XXX_unrecognized:[]}
I0221 18:36:36.736962 15 node.go:328] Group 1 found 6885 entries
I0221 18:36:36.737011 15 draft.go:1828] Restarting node for group: 1
I0221 18:36:36.737031 15 node.go:190] Setting conf state to nodes:1 nodes:2 nodes:3
2023/02/21 18:36:36 3 state.commit 6886 is out of range [150, 6885]
panic: 3 state.commit 6886 is out of range [150, 6885]
goroutine 203 [running]:
log.Panicf({0x1f5c0b6?, 0xaf3305?}, {0xc0004b1580?, 0x1c949c0?, 0x7fc2dcb3bf00?})
/opt/hostedtoolcache/go/1.18.5/x64/src/log/log.go:392 +0x67
github.com/dgraph-io/dgraph/x.(*ToGlog).Panicf(0x1ae5?, {0x1f5c0b6?, 0xaf9ff4?}, {0xc0004b1580?, 0xaf5a87?, 0x40?})
/home/runner/work/dgraph/dgraph/x/log.go:40 +0x32
go.etcd.io/etcd/raft.(*raft).loadState(0xc000376780, {0x1e, 0x3, 0x1ae6, {0x0, 0x0, 0x0}})
/home/runner/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20190108173120-83c051b701d3/raft/raft.go:1475 +0x19a
go.etcd.io/etcd/raft.newRaft(0xc000336f20)
/home/runner/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20190108173120-83c051b701d3/raft/raft.go:377 +0x791
go.etcd.io/etcd/raft.RestartNode(0xc000336f20)
/home/runner/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20190108173120-83c051b701d3/raft/node.go:242 +0x33
github.com/dgraph-io/dgraph/worker.(*node).InitAndStartNode(0xc000422500)
/home/runner/work/dgraph/dgraph/worker/draft.go:1854 +0x6f2
github.com/dgraph-io/dgraph/worker.StartRaftNodes(0xc00044e700, 0x1?)
/home/runner/work/dgraph/dgraph/worker/groups.go:155 +0x635
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run.func4()
/home/runner/work/dgraph/dgraph/dgraph/cmd/alpha/run.go:814 +0x34
created by github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run
/home/runner/work/dgraph/dgraph/dgraph/cmd/alpha/run.go:813 +0x14b1
W0221 18:36:36.747097 1 sentry_integration.go:140] unable to read CID from file /tmp/dgraph-alpha-cid-sentry open /tmp/dgraph-alpha-cid-sentry: no such file or directory. Skip
[Sentry] 2023/02/21 18:36:36 Sending fatal event [f5a6939fde6a49c28445d4dd5a95bcb8] to o318308.ingest.sentry.io project: 1805390
[Sentry] 2023/02/21 18:36:36 Buffer flushed successfully.