I’m trying to implement persistence for RAFT logs. We run many RAFT groups per server, so we want a single WAL that handles all the groups. The plan is to keep using the Memory Store based WAL that you guys provide, one per group, and use the persistent WAL to bring each of those memory stores back in sync after a restart.
So, the main questions are:
- When we take a snapshot, do we need to transmit it to the followers? In other words, do snapshots need to be synced across the group so that the RAFT logs on the leader and the followers are exactly the same? If not, each member of the group can just snapshot on its own and there’s no need for communication.
- How often do we need to sync to disk? I reckon on every write, right? Otherwise the server could crash and restart, and we wouldn’t be able to bring the memory state back to its last position as recorded by the leader.
- Say we’re replaying the logs from the persistent store into memory. If we encounter a snapshot entry, we can just discard all the previous entries for the memory store, right? This would help keep the memory usage low.
- Our snapshots don’t really contain any data, they’re just a way for us to discard RAFT entries. We noticed that you guys use a snapshotter which seems to do something similar, just storing the Index and Term. Do you store snapshots separately from the logs? Do you only replay logs from the index and term specified in the snapshots? Would that make sense to do?
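To make the replay questions concrete, here is a minimal sketch of the restart logic we have in mind. All names (`Record`, `GroupLog`, `Replay`, the record kinds) are hypothetical and not taken from your library. One assumption baked in: a snapshot record only lets us discard entries up to and including its index, since entries past the snapshot index may already be in the log when the snapshot is taken.

```go
package main

import "fmt"

// RecordKind distinguishes the two kinds of records in the shared WAL.
type RecordKind int

const (
	KindEntry RecordKind = iota
	KindSnapshot
)

// Record is one item read back from the shared WAL, tagged with the
// RAFT group it belongs to. Snapshots carry only Index and Term.
type Record struct {
	Group uint32
	Kind  RecordKind
	Index uint64
	Term  uint64
	Data  []byte
}

// GroupLog is the in-memory state rebuilt for a single group.
type GroupLog struct {
	SnapIndex uint64
	SnapTerm  uint64
	Entries   []Record // only entries with Index > SnapIndex
}

// Replay walks the WAL records in order and rebuilds one GroupLog per group.
func Replay(records []Record) (map[uint32]*GroupLog, error) {
	logs := make(map[uint32]*GroupLog)
	for _, r := range records {
		g, ok := logs[r.Group]
		if !ok {
			g = &GroupLog{}
			logs[r.Group] = g
		}
		switch r.Kind {
		case KindSnapshot:
			// Copy the surviving suffix into a fresh slice so the
			// discarded prefix can be garbage collected.
			var kept []Record
			for _, e := range g.Entries {
				if e.Index > r.Index {
					kept = append(kept, e)
				}
			}
			g.Entries = kept
			g.SnapIndex, g.SnapTerm = r.Index, r.Term
		case KindEntry:
			if r.Index <= g.SnapIndex {
				continue // already covered by a snapshot
			}
			// An entry that rewrites an existing index replaces the
			// old suffix starting at that index.
			for n := len(g.Entries); n > 0 && g.Entries[n-1].Index >= r.Index; n = len(g.Entries) {
				g.Entries = g.Entries[:n-1]
			}
			g.Entries = append(g.Entries, r)
		default:
			return nil, fmt.Errorf("unknown record kind %d", r.Kind)
		}
	}
	return logs, nil
}
```

After `Replay`, each `GroupLog` would be used to populate the memory store for that group: first the snapshot position, then the remaining entries. Whether that is the right order, and whether dropping the prefix like this is safe, is what the last two questions are asking.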
In general, any suggestions you have on implementing persistent storage for RAFT logs would be very useful.