Critical bug in v21.12 permanently crashloops whole groups

I removed the node that was damaged by the last panic (alpha-10) and re-added it with completely fresh storage. Here is what happened:

  • alpha-10 crashloops
  • removeNode
  • new node comes up and gets assigned to the free spot in that group
  • gets a snapshot from the leader
  • after the snapshot, gets the txns from snapshotTS->now sent to it
  • panics at the same timestamp as before

See the cutely annotated screenshot of this in Grafana:


(Update: a second removeNode and re-add, done after the next snapshot, worked as expected. This shows that whatever the issue is, it is committed to the Raft WAL.)
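
To make the sequence concrete, here is a toy Go model of what I think is going on. It is not Dgraph code: the `entry` type, the `poison` flag and `replayFrom` are all made up for illustration, and the assumption is simply that one committed entry in the log makes any node panic when it is applied. A fresh node only skips that entry if the snapshot it receives was taken after it.

```go
package main

// entry is a hypothetical stand-in for one committed Raft WAL entry.
type entry struct {
	ts     uint64 // commit timestamp of the entry
	poison bool   // assumption: applying this entry makes the node panic
}

// replayFrom models a fresh node that has just received a snapshot taken at
// snapshotTS and then applies every later entry sent by the leader.
// It returns the timestamp of the entry that would crash the node and false,
// or 0 and true if the whole replay goes through.
func replayFrom(wal []entry, snapshotTS uint64) (uint64, bool) {
	for _, e := range wal {
		if e.ts <= snapshotTS {
			continue // already covered by the snapshot, never applied
		}
		if e.poison {
			return e.ts, false // same entry, same timestamp, on every fresh node
		}
	}
	return 0, true
}
```

With a log of entries at timestamps 10, 20 (poison) and 30, `replayFrom(wal, 15)` fails at 20 no matter how fresh the node's storage is, while `replayFrom(wal, 25)` succeeds. That matches both observations: the first re-add panicked at the same timestamp, and the re-add after the next snapshot worked.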
