BadgerDB is generating new VLOG at every restart

Hello everyone, I am currently using GitHub - mosuka/cete (a distributed key-value store server written in Go, built on top of BadgerDB) as the key-value storage for our configuration; it relies on badger under the hood for storing its data.

We have noticed that the number of vlog files increases at every restart, even if no writes were performed before the restart. The restart might be the result of a forced shutdown (preemptible nodes restarting) or of a Kubernetes probe killing the process.

The resulting situation is that if the cete/badger process does not come back online fast enough, the Kubernetes scheduler eventually kills the process and restarts it. At the next restart the database grows even bigger, slowing down the startup even further; this cycle goes on until there is no disk space left.

NOTE: We have RunValueLogGC running periodically to clean up old vlog files, but the garbage collector is not able to keep up with this loop.
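For context, the GC loop we run is roughly like the following. This is a minimal sketch, assuming the github.com/dgraph-io/badger/v2 import path; the "./data" path, the 5-minute interval, and the 0.5 discard ratio are placeholder values, not the exact settings from our deployment.

```go
package main

import (
	"log"
	"time"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("./data"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Run value-log GC periodically. RunValueLogGC rewrites at most one
	// vlog file per successful call, so loop until it reports no rewrite.
	ticker := time.NewTicker(5 * time.Minute)
	defer ticker.Stop()
	for range ticker.C {
		for {
			if err := db.RunValueLogGC(0.5); err != nil {
				break // badger.ErrNoRewrite (or another error) ends this GC round
			}
		}
	}
}
```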

My questions would be:

  • Why are new vlog files generated at every startup?
  • Is it because the badger instance was killed / crashed?
  • How can we prevent the database from entering this loop without disabling probes?

As additional information, badger is running with DefaultOptions and SyncWrites disabled.
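For reference, the configuration amounts to something like the sketch below, assuming the v2 API; the "./data" path is illustrative, and only SyncWrites is changed from the defaults.

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	// Default options with synchronous writes disabled; everything else
	// (value log file size, level sizes, etc.) stays at the defaults.
	opts := badger.DefaultOptions("./data").WithSyncWrites(false)

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```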

It turned out that the Raft implementation in cete replays the snapshot, with all key-value pairs, at every restart, causing badger to generate new vlog files for essentially the entire dataset. I will move this discussion to cete as it is not strictly related to badger itself.

I would still like to ask if there is any suggestion on how this scenario could be avoided.

Hey @christian-roggia

We have noticed that the number of vlog files increases at every restart, even if no writes were performed before the restart. The restart might be the result of a forced shutdown (preemptible nodes restarting) or of a Kubernetes probe killing the process.

Badger replays data after a crash, but that is not supposed to create new vlog files. I ran a few tests by killing badger locally and I don't see any new vlog files.

Are you using the latest released version of badger? There were two fixes related to replay: if badger crashed during replay, it would replay the entire database on the next start. These were fixed by

and

@christian-roggia, which version of badger are you using?

Yes, we are aware of it, and that's why we're working on separating the write-ahead log from the vlog files. We're working on it.

They are not. Try running this program (Go Playground - The Go Programming Language) and run watch -n 0.1 "ls -lh ./data"; you will see that badger is not creating any new vlog files.
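For anyone who wants to reproduce this check locally, the test amounts to opening badger, performing no writes, closing it, and repeating, while watching the data directory from another terminal. Below is a minimal sketch of that idea (not the exact playground program), assuming the v2 import path; the "./data" path, loop count, and sleep are arbitrary.

```go
package main

import (
	"log"
	"time"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	// Open and close the database repeatedly to simulate restarts while
	// `watch -n 0.1 "ls -lh ./data"` runs in another terminal.
	for i := 0; i < 10; i++ {
		db, err := badger.Open(badger.DefaultOptions("./data"))
		if err != nil {
			log.Fatal(err)
		}
		// No writes are performed before closing.
		time.Sleep(2 * time.Second)
		if err := db.Close(); err != nil {
			log.Fatal(err)
		}
	}
}
```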

Yes, this could be the reason. I recommend using the latest patch release of badger.

Use the latest version of badger. Older versions would go into a loop if they crashed during replay.

@christian-roggia please do try the latest badger release and your replay problem should be fixed.

Just out of curiosity, can you show me the logs? I'm looking for the lines which say Replaying file id ... . If you can show me the logs across crashes, that would be perfect and would help confirm my suspicion.


Hello @ibrahim, thank you very much for the very thorough response! As mentioned in my previous reply, we isolated the issue as a design flaw in how Raft and BadgerDB are combined in mosuka/cete, and therefore this issue does not concern badger.

We will upgrade badger to the latest version, as it is good practice to stay up to date, but I believe some serious design changes need to be made in cete.

I’ll link the GitHub issue here as it might be useful to anyone who is facing the same issue: Design flaw in the Raft <-> BadgerDB implementation · Issue #49 · mosuka/cete · GitHub

Thank you again for the insight, I will keep that in mind while proposing a new architecture.
