After an emergency server shutdown, the data in the zw folder was corrupted

There was a single node running in Docker:
docker run -d -p 8080:8080 -p 9080:9080 -p 8000:8000 -v ~/dgraph:/dgraph dgraph/standalone:v20.11.2
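For reference, a sketch of the host-side layout that this volume mount creates, and how the whole directory can be snapshotted before any repair attempt (directory roles assumed from the standard Dgraph layout):

# ~/dgraph/zw - Zero's Raft write-ahead log (the corrupted part)
# ~/dgraph/w  - Alpha's Raft write-ahead log
# ~/dgraph/p  - Alpha's posting store (Badger)
# Stop the container and take a backup so any repair attempt stays reversible:
docker stop <container-id>
tar -czf ~/dgraph-backup-$(date +%F).tar.gz -C ~ dgraph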
After an emergency server shutdown, the data in the zw folder was corrupted, and the Zero node no longer starts up.

Is it possible to repair the zw directory so that I don't have to manually recover the data from a Badger dump?
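What I am hoping for is something along these lines: a tool that can open and validate or repair the Zero WAL in place. I am not sure whether dgraph debug can read a Zero WAL directory, so the flags below are only an assumption; --entrypoint is used because the standalone image starts Zero and Alpha by default:

# Check what the bundled debug tool offers (flags below are assumed, not verified):
docker run --rm --entrypoint dgraph -v ~/dgraph:/dgraph dgraph/standalone:v20.11.2 debug --help
# e.g. pointing it at the Zero WAL, if the tool supports a WAL directory:
# docker run --rm --entrypoint dgraph -v ~/dgraph:/dgraph dgraph/standalone:v20.11.2 debug -w /dgraph/zw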

Startup logs:

Dgraph version : v20.11.2
Dgraph codename : tchalla-2
Dgraph SHA-256 : 0153cb8d3941ad5ad107e395b347e8d930a0b4ead6f4524521f7a525a9699167
Commit SHA-1 : 94f3a0430
Commit timestamp : 2021-02-23 13:07:17 +0530
Branch : HEAD
Go version : go1.15.5
jemalloc enabled : true


Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.


I0426 11:43:23.318305      42 run.go:185] Setting Config to: {bindall:true portOffset:0 nodeId:1 numReplicas:1 peer: w:zw rebalanceInterval:480000000000 tlsClientConfig:<nil>}
I0426 11:43:23.318330      42 run.go:98] Setting up grpc listener at: 0.0.0.0:5080
I0426 11:43:23.318396      42 run.go:98] Setting up http listener at: 0.0.0.0:6080
I0426 11:43:23.318864      42 log.go:295] Found file: 1 First Index: 1
I0426 11:43:23.319482      42 storage.go:132] Init Raft Storage with snap: 1629, first: 1630, last: 1635
I0426 11:43:23.319617      42 node.go:152] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc000140500 Applied:1629 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x2e0fef8 DisableProposalForwarding:false}
I0426 11:43:23.321059      42 node.go:310] Found Snapshot.Metadata: {ConfState:{Nodes:[1] Learners:[] XXX_unrecognized:[]} Index:1629 Term:30 XXX_unrecognized:[]}
I0426 11:43:23.321538      42 node.go:321] Found hardstate: {Term:31 Vote:1 Commit:1635 XXX_unrecognized:[]}
I0426 11:43:23.323385      42 node.go:326] Group 0 found 1635 entries
I0426 11:43:23.323399      42 raft.go:544] Restarting node for dgraphzero
I0426 11:43:23.323406      42 node.go:189] Setting conf state to nodes:1 
I0426 11:43:23.323548      42 pool.go:162] CONNECTING to localhost:7080
I0426 11:43:23.323569      42 log.go:34] 1 became follower at term 31
I0426 11:43:23.323577      42 log.go:34] newRaft 1 [peers: [1], term: 31, commit: 1635, applied: 1629, lastindex: 1635, lastterm: 31]
W0426 11:43:23.324414      42 pool.go:267] Connection lost with localhost:7080. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:7080: connect: connection refused"
[Sentry] 2021/04/26 11:43:23 Sending fatal event [1bfeda7104a7419f9feaa86f21ff967e] to o318308.ingest.sentry.io project: 1805390
2021/04/26 11:43:23 proto: Group: illegal tag 0 (wire type 0)

github.com/dgraph-io/dgraph/x.Check
	/ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:42
github.com/dgraph-io/dgraph/dgraph/cmd/zero.run
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/zero/run.go:254
github.com/dgraph-io/dgraph/dgraph/cmd/zero.init.0.func1
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/zero/run.go:75
github.com/spf13/cobra.(*Command).execute
	/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
github.com/spf13/cobra.(*Command).ExecuteC
	/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
github.com/spf13/cobra.(*Command).Execute
	/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:72
main.main
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:102
runtime.main
	/usr/local/go/src/runtime/proc.go:204
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1374
I0426 11:43:23.338890      41 log.go:34] All 3 tables opened in 0s
I0426 11:43:23.341857      41 log.go:34] Discard stats nextEmptySlot: 0
I0426 11:43:23.341885      41 log.go:34] Set nextTxnTs to 270488
I0426 11:43:23.342002      41 log.go:34] Deleting empty file: p/000035.vlog
I0426 11:43:23.496884      41 groups.go:99] Current Raft Id: 0x1
E0426 11:43:23.496924      41 groups.go:1143] Error during SubscribeForUpdates for prefix "\x00\x00\vdgraph.cors\x00": Unable to find any servers for group: 1. closer err: <nil>
I0426 11:43:23.496972      41 worker.go:104] Worker listening at address: [::]:7080
I0426 11:43:23.498317      41 run.go:519] Bringing up GraphQL HTTP API at 0.0.0.0:8080/graphql
E0426 11:43:23.498312      41 groups.go:1143] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
I0426 11:43:23.498334      41 run.go:520] Bringing up GraphQL HTTP admin API at 0.0.0.0:8080/admin
I0426 11:43:23.498359      41 run.go:552] gRPC server started.  Listening on port 9080
I0426 11:43:23.498370      41 run.go:553] HTTP server started.  Listening on port 8080
I0426 11:43:23.597110      41 pool.go:162] CONNECTING to localhost:5080
W0426 11:43:23.597888      41 pool.go:267] Connection lost with localhost:5080. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:5080: connect: connection refused"
E0426 11:43:24.497416      41 groups.go:1143] Error during SubscribeForUpdates for prefix "\x00\x00\vdgraph.cors\x00": Unable to find any servers for group: 1. closer err: <nil>
E0426 11:43:24.498426      41 groups.go:1143] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E0426 11:43:25.497530      41 groups.go:1143] Error during SubscribeForUpdates for prefix "\x00\x00\vdgraph.cors\x00": Unable to find any servers for group: 1. closer err: <nil>
E0426 11:43:25.498539      41 groups.go:1143] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>

@hardik – can we get the QA team to test this scenario? This shouldn’t happen.


It is still not clear how to fix this. I tried every option I could find; the only thing that worked was dumping the data with Badger, but that does not restore the graph database node to a working state.
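To be clear about what "dumping the data with Badger" means here: the p directory itself still opens (see the Alpha log lines above, "All 3 tables opened"), so the data looks intact and the Badger CLI should be able to read it. A minimal sketch, assuming the Badger CLI is installed and the path from the volume mount above:

# Low-level check/dump of the Alpha data directory with the Badger CLI.
# This reads the raw key-value store; it does not bring the Zero node back.
badger info --dir ~/dgraph/p

That gives access to the raw data, but it is not the same as a working cluster, which is why I am asking about repairing zw instead.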