BadgerDB is generating new VLOG at every restart

Hello everyone, I am currently using GitHub - mosuka/cete (a distributed key-value store server written in Go, built on top of BadgerDB) as the key-value storage for our configuration; it relies on badger under the hood for storing its data.

We have noticed that the number of vlog files increases at every restart, even if no writes were performed before the restart. The restart might be the result of a forced shutdown (preemptible nodes restarting) or of a Kubernetes probe killing the process.

The resulting situation is that if the cete/badger process does not come back online fast enough, the Kubernetes scheduler eventually kills the process and restarts it. At the next restart the database grows even bigger, slowing down the startup even further; this cycle goes on until there is no disk space left.

NOTE: We have RunValueLogGC running periodically to clean up old vlog files, but the garbage collector is not able to keep up with this loop.
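For context, the GC loop we run is roughly like the following. This is a minimal sketch, assuming the github.com/dgraph-io/badger/v2 import path; the "./data" path, the 5-minute interval, and the 0.5 discard ratio are placeholder values, not the exact settings from our deployment.

```go
package main

import (
	"log"
	"time"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("./data"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Run value-log GC periodically. RunValueLogGC rewrites at most one
	// vlog file per successful call, so loop until it reports no rewrite.
	ticker := time.NewTicker(5 * time.Minute)
	defer ticker.Stop()
	for range ticker.C {
		for {
			if err := db.RunValueLogGC(0.5); err != nil {
				break // badger.ErrNoRewrite (or another error) ends this GC round
			}
		}
	}
}
```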

My questions would be:

  • Why are new vlog files generated at every startup?
  • Is it because the badger instance was killed / crashed?
  • How can we prevent the database from entering this loop without disabling probes?

As additional information, badger is running with DefaultOptions and SyncWrites disabled.
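For reference, the configuration amounts to something like the sketch below, assuming the v2 API; the "./data" path is illustrative, and only SyncWrites is changed from the defaults.

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	// Default options with synchronous writes disabled; everything else
	// (value log file size, level sizes, etc.) stays at the defaults.
	opts := badger.DefaultOptions("./data").WithSyncWrites(false)

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```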

It turned out that the Raft implementation in cete replays the snapshot, with all key-value pairs, at every restart, causing badger to generate new vlog files for essentially the entire dataset. I will move this discussion to cete as it is not strictly related to badger itself.

I would still like to ask if there is any suggestion on how this scenario could be avoided.

Hey @christian-roggia

We have noticed that the number of vlog files increases at every restart, even if no writes were performed before the restart. The restart might be the result of a forced shutdown (preemptible nodes restarting) or of a Kubernetes probe killing the process.

Badger replays data after a crash, but that is not supposed to create new vlog files. I ran a few tests by killing badger locally and I don't see any new vlog files.

Are you using the latest released version of badger? There were two fixes related to replay: if badger crashed during replay, it would replay the entire database on the next start. These were fixed by

and

@christian-roggia, which version of badger are you using?

Yes, we are aware of it, and that's why we're working on separating the write-ahead log from the vlog files. We're working on it.

They are not. Try running this program (Go Playground - The Go Programming Language) and run watch -n 0.1 "ls -lh ./data"; you will see that badger is not creating any new vlog files.
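For anyone who wants to reproduce this check locally, the test amounts to opening badger, performing no writes, closing it, and repeating, while watching the data directory from another terminal. Below is a minimal sketch of that idea (not the exact playground program), assuming the v2 import path; the "./data" path, loop count, and sleep are arbitrary.

```go
package main

import (
	"log"
	"time"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	// Open and close the database repeatedly to simulate restarts while
	// `watch -n 0.1 "ls -lh ./data"` runs in another terminal.
	for i := 0; i < 10; i++ {
		db, err := badger.Open(badger.DefaultOptions("./data"))
		if err != nil {
			log.Fatal(err)
		}
		// No writes are performed before closing.
		time.Sleep(2 * time.Second)
		if err := db.Close(); err != nil {
			log.Fatal(err)
		}
	}
}
```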

Yes, this could be the reason. I recommend using the latest patch release of badger.

Use the latest version of badger. Older versions would go into a loop if they crashed during replay.

@christian-roggia please do try the latest badger release and your replay problem should be fixed.

Just out of curiosity, can you show me the logs? I'm looking for the lines which say Replaying file id ... . If you can show me the logs across crashes, that would be perfect and would help confirm my suspicion.


Hello @ibrahim, thank you very much for the very thorough response! As mentioned in my previous reply, we isolated the issue as a design flaw in how Raft and BadgerDB are combined in mosuka/cete, and therefore this issue does not concern badger.

We will upgrade badger to the latest version, as it is good practice to stay up to date, but I believe some serious design changes need to be made in cete.

I’ll link the GitHub issue here as it might be useful to anyone who is facing the same issue: Design flaw in the Raft <-> BadgerDB implementation · Issue #49 · mosuka/cete · GitHub

Thank you again for the insight, I will keep that in mind while proposing a new architecture.
