Did I lose my data for forever?

iluminae · June 14, 2021, 2:18pm

I filed this issue on Friday and got no answer; it was (unknowingly) a duplicate of this.

I completely rebuilt my production system on Friday night to get around this (I have 3 shards, so only one in that group is borked). This just happened again on the new system. No restarts, brand new system as of Saturday night, built from the bulk loader, if that is interesting.

One of the peers just gives up with this "MANIFEST removes non-existing table X" error, and a restart makes it crashloop with the panic in the linked issue.

Here is the Raft applied index of that group; you can see when the one peer encountered this error:

Note: unlike OP I have 12 alphas, but it seems this could happen at any time. Each alpha is the only pod on a GKE VM. I have been running Dgraph for over a year and never encountered this until upgrading to v21.03.

Also see here: the memory usage of the node with this error keeps increasing, while the others stay low.

@ibrahim, please, there seems to be a critical issue in Badger. Without further guidance, if another peer in the same group gets corrupted, that's probably full data loss.
