LOG Compact FAILED with error: MANIFEST removes non-existing table 15777621

There are 3 groups of 3 replicas, and one node has the following error. Queries are no longer responding, which is blocking everything. How should I deal with it?

W0716 14:37:32.990118 16346 log.go:36] [Compactor: 2] LOG Compact FAILED with error: MANIFEST removes non-existing table 15777621: {span:0xc00ceb4700 compactorId:2 t:{baseLevel:2 targetSz:[0 10485760 10485760 30006262 300062624 3000626242 30006262429] fileSz:[67108864 2097152 2097152 4194304 8388608 16777216 33554432]} p:{level:2 score:3.043694591522217 adjusted:3.254670825337362 dropPrefixes: t:{baseLevel:2 targetSz:[0 10485760 10485760 30006262 300062624 3000626242 30006262429] fileSz:[67108864 2097152 2097152 4194304 8388608 16777216 33554432]}} thisLevel:0xc00078e420 nextLevel:0xc00078e480 top:[0xc042cd72c0] bot:[0xc05441b440] thisRange:{left:[4 0 9 114 111 111 116 85 103 99 73 100 2 6 1 0 0 0 0 0 0 0 1 8 83 160 210 61 139 110 246 0 0 0 0 0 0 0 0] right:[4 0 9 114 111 111 116 85 103 99 73 100 2 6 1 0 0 0 0 0 0 0 1 8 83 160 210 61 176 39 145 255 255 255 255 255 255 255 255] inf:false} nextRange:{left:[4 0 9 114 111 111 116 85 103 99 73 100 2 6 1 0 0 0 0 0 0 0 1 8 83 160 210 61 139 110 246 0 0 0 0 0 0 0 0] right:[4 0 9 114 111 111 116 85 103 99 73 100 2 6 1 0 0 0 0 0 0 0 1 8 83 160 210 62 74 38 126 255 255 255 255 255 255 255 255] inf:false} splits: thisSize:2563487 dropPrefixes:}
W0716 14:37:32.990209 16346 log.go:36] While running doCompact: MANIFEST removes non-existing table 15777621

The cluster is deployed on physical machines (not Kubernetes), using version v21.03, and data inserts are relatively frequent.

nohup ./dgraphv20113 zero  --my 10.x.19.91:5080 --replicas 3 >> zero.log &
nohup ./dgraphv20113 alpha  --whitelist 10.x.0.0:10.5.0.0 -o 0  --ludicrous_mode  --my 10.x.19.91:7080 --zero 10.x.19.91:5080  --pending_proposals 32 -p p0 -w w0 >> nohup0.out &
nohup ./dgraphv20113 alpha  --whitelist 10.x.0.0:10.5.0.0 -o 1  --ludicrous_mode  --my 10.x.19.91:7081 --zero 10.x.19.91:5080  --pending_proposals 32 -p p1 -w w1 >> nohup1.out &
nohup ./dgraphv20113 alpha  --whitelist 10.x.0.0:10.5.0.0 -o 2  --ludicrous_mode  --my 10.x.19.91:7082 --zero 10.x.19.91:5080  --pending_proposals 32 -p p2 -w w2 >> nohup2.out &

This looks the same as two previously reported issues.

The Dgraph team tried to reproduce it on Dgraph Cloud and stated they could not.

That peer is corrupt now. To remove it and add a new peer: run /removeNode on the Zero, specifying the broken peer, wipe out its state, and bring it back with fresh state. Instructions are here (they are k8s-specific, but the steps are essentially as above).
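As a concrete sketch of that procedure (the node ID, group, and Zero HTTP port below are assumptions; read the real values off your own cluster's /state output):

# Inspect cluster membership to find the broken peer's Raft ID and group.
# 6080 is Zero's default HTTP port (the 5080 above is its gRPC port).
curl -s "http://10.x.19.91:6080/state"

# Ask Zero to remove the corrupt Alpha from its group.
# id=1 and group=1 are placeholders - substitute the values from /state.
curl -s "http://10.x.19.91:6080/removeNode?id=1&group=1"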

This corruption issue has hit us a lot and is part of our biweekly problems with running Dgraph ourselves. Removing nodes from a group, in our case, also caused them to stay in the Raft membership, which messed up our cluster and required forking Dgraph to fix, but the Dgraph team has never even responded to that.

Thanks!

@iluminae
I0719 14:06:37.313561 16173 run.go:553] HTTP server started. Listening on port 8080
I0719 14:06:37.411412 16173 pool.go:162] CONNECTING to 10.4.19.91:5080
[Sentry] 2021/07/19 14:06:37 Sending fatal event [43643ba1d1d446039d16ab8d21b973ff] to o318308.ingest.sentry.io project: 1805390
2021/07/19 14:06:37 rpc error: code = Unknown desc = REUSE_RAFTID: Duplicate Raft ID 1 to removed member: id:1 group_id:1 addr:"10.4.19.91:7080" last_update:1617349861

Do you have a specific question? If so, you will have to share what you have done.

Also, I am a little confused by your output above, given that you said you are "using version V21.03", but the commands are obviously the v20.11.x flags (and the binary is called dgraphv20113).
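If it helps, a quick way to check what the binary actually is, assuming the binary name from the startup commands above:

# Prints the release the binary was built from, regardless of its file name.
./dgraphv20113 version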

Making a lot of assumptions here: I would guess that when you removed a peer and ran /removeNode, you may not have removed the p and w directories for that peer before restarting it. The reason I say this is that the Raft ID is handed out by the Zero servers to new, clean Alpha servers that neither specify a Raft ID of their own on the command line nor have one written down in their p/w directories.
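A minimal sketch of bringing the removed peer back with clean state, assuming it is the Alpha started with -p p0 -w w0 above (adjust the directories, ports, and offset to the actual peer):

# Stop the broken Alpha first, then wipe its posting (p) and write-ahead-log (w)
# directories so it rejoins as a brand-new member and is handed a fresh Raft ID.
rm -rf p0 w0

# Restart it with the same flags as before; Zero will assign the new Raft ID.
nohup ./dgraphv20113 alpha --whitelist 10.x.0.0:10.5.0.0 -o 0 --ludicrous_mode \
  --my 10.x.19.91:7080 --zero 10.x.19.91:5080 \
  --pending_proposals 32 -p p0 -w w0 >> nohup0.out &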

About the dgraphv20113 binary name: according to a comment from our operations staff, it is actually v21.03 despite the name.
Now we can start the new Alpha. Thank you very much for your reply!