Badgerdb manifest corruption issue _solution_

What version of Go are you using (go version)?

go version go1.15.14 linux/amd64

What operating system are you using?

linux

What version of Badger are you using?

7677fcb1e0c2 (the version dgraph v21.03.2 uses) → master

Does this issue reproduce with the latest master?

yes

Steps to Reproduce the issue

The program itself to reproduce is simple, just write a bunch of data to badger:

        kv, _ := badger.Open(DefaultOptions("/some/directory"))
        wb := kv.NewWriteBatch()
        for i:=0;i<1000;i++ {
                for j:=0;j<10000;j++ {
                        wb.Set([]byte(fmt.Sprintf("wakkawakka%d%d",i,j)),[]byte(fmt.Sprintf("value%d",j)))
                }
        }
        wb.Flush()

However, the reproduction of the error requires syscall failure in the os.File.Sync() on the last line of manifestFile.addChanges(). A single error from the Sync() call will put the server in a permanent error state that it cannot recover from. To test this I patched a temporary error into manifest.go:

diff --git a/manifest.go b/manifest.go
index 7520456..98fa63a 100644
--- a/manifest.go
+++ b/manifest.go
@@ -239,9 +240,20 @@ func (mf *manifestFile) addChanges(changesParam []*pb.ManifestChange) error {
        }

        mf.appendLock.Unlock()
+       {
+               c := make([]uintptr, 10)
+               runtime.Callers(2, c)
+               f, _ := runtime.CallersFrames(c).Next()
+               if f.Function == "github.com/dgraph-io/badger/v3.(*levelsController).runCompactDef"&&!oopsed {
+                       oopsed=true
+                       return errors.New("WONKY")
+               }
+       }
        return mf.fp.Sync()
 }

+var oopsed=false

yea its… very specific. But you get it.

I do not understand why that file is being closed in the first place, but I have plenty of evidence from many dgraph v21.03.2 corruptions that this does really happen.

My theory: manifestFile.rewrite() closes the manifest file, rewrites it, and opens the new file, and is causing the error to come though temporarily. Note here in the golang source that ErrFileClosing is rewritten to ErrClosed, this could be what is bubbling up (this wrapErr gets called in the posix implementation of Sync/FSync).

The answer is probably to run the in-memory alteration (applyChanges()) after a successful Sync(). Obviously you guys want these two things to happen atomically… but I don’t think we can do that.

What Badger options were set?

badger.DefaultOptions("/some/disk/location")

What did you do?

patched manifest.go to run Sync() before the in-memory changes

diff --git a/manifest.go b/manifest.go
index 7520456..3fbc30d 100644
--- a/manifest.go
+++ b/manifest.go
@@ -212,10 +213,6 @@ func (mf *manifestFile) addChanges(changesParam []*pb.ManifestChange) error {
        }
        // Maybe we could use O_APPEND instead (on certain file systems)
        mf.appendLock.Lock()
-       if err := applyChangeSet(&mf.manifest, &changes); err != nil {
-               mf.appendLock.Unlock()
-               return err
-       }
        if mf.inMemory {
                mf.appendLock.Unlock()
                return nil
@@ -239,9 +236,14 @@ func (mf *manifestFile) addChanges(changesParam []*pb.ManifestChange) error {
        }

        mf.appendLock.Unlock()
-       return mf.fp.Sync()
+       if err := mf.fp.Sync(); err != nil {
+               return err
+       }
+       return applyChangeSet(&mf.manifest, &changes)
 }

What did you expect to see?

during other calls to addChanges, retries are done:

 ERROR: Failure while flushing memtable to disk: sync p/MANIFEST: file already closed. Retrying...

What did you see instead?

in the doCompact() call, it does not retry - permanent failure corrupts the manifest for all time:

WARNING: [Compactor: 2] LOG Compact FAILED with error: sync p/MANIFEST: file already closed
WARNING: While running doCompact: sync p/MANIFEST: file already closed
WARNING: [Compactor: 2] LOG Compact FAILED with error: MANIFEST removes non-existing table 1....
WARNING: While running doCompact: MANIFEST removes non-existing table 1....
1 Like

for reference, these are the issues I am talking about that this would fix:

and forgot to include the log based proof of the error happening:

Hey just want to bump this with some more conversation. This changeset reproduces the problem well:

diff --git a/manifest.go b/manifest.go
index 7520456..ee8e6f2 100644
--- a/manifest.go
+++ b/manifest.go
@@ -239,9 +239,11 @@ func (mf *manifestFile) addChanges(changesParam []*pb.ManifestChange) error {
        }

        mf.appendLock.Unlock()
-       return mf.fp.Sync()
+       return syncfunc(mf.fp)
 }

+var syncfunc = func(f *os.File) error { return f.Sync() }
+
 // Has to be 4 bytes.  The value can never change, ever, anyway.
 var magicText = [4]byte{'B', 'd', 'g', 'r'}

diff --git a/manifest_test.go b/manifest_test.go
index 05814bc..15ab7b6 100644
--- a/manifest_test.go
+++ b/manifest_test.go
@@ -18,13 +18,16 @@ package badger

 import (
        "context"
+       "errors"
        "fmt"
        "io/ioutil"
        "math/rand"
+       "runtime"
        "os"
        "path/filepath"
        "sort"
        "testing"
+       "time"

        otrace "go.opencensus.io/trace"

@@ -245,3 +248,31 @@ func TestManifestRewrite(t *testing.T) {
                uint64(deletionsThreshold * 3): {Level: 0},
        }, m.Tables)
 }
+
+func TestCorruption(t *testing.T) {
+       kv, err := Open(DefaultOptions(t.TempDir()))
+       require.NoError(t, err)
+       wb := kv.NewWriteBatch()
+       i := 0
+       syncfunc = func(f *os.File) error {
+               c := make([]uintptr, 10)
+               runtime.Callers(3, c)
+               fr, _ := runtime.CallersFrames(c).Next()
+               if fr.Function == "github.com/dgraph-io/badger/v3.(*levelsController).runCompactDef" {
+                       i++
+                       t.Logf("called runCompactDef()->addChanges()->Sync() %d times", i)
+                       if i == 10 {
+                               i = 0
+                               return errors.New("WAKKA")
+                       }
+               }
+               return f.Sync()
+       }
+       for i := 0; i < 1000; i++ {
+               for j := 0; j < 100000; j++ {
+                       wb.Set([]byte(fmt.Sprintf("wakkawakka%d%d", i, j)), []byte(fmt.Sprintf("value%d", j)))
+               }
+       }
+       wb.Flush()
+       time.Sleep(10 * time.Second)
+}

As soon as a error is encountered in addChanges() from runCompactDef(), it will get in a infinite loop of delete changes that error in applyChangeSet(): MANIFEST removes non-existing table. This is the issue we have encountered so much in dgraph.

To summarize the issue: Sync() may fail, and changes are done to the memory manifest file before Sync() runs, which are not backed out on failure and cause a bad state.

So, a change that could fix this is to move the Sync() to before applyChangeSet(), and both within the appendLock mutex. I know that right now the lock is released before running Sync() on purpose to unblock this a bit faster - however since this is causing real world issues I think moving it to within the lock may be the only safe option there.

I can make a PR for this if you guys need me to and this is the way you want to handle it, but I will need to get approval from our legal department first to commit to badger. So, it would be easier for you guys to commit. (I am starting the legal process anyway just in case)

@Naman @aman-bansal you two have the most recent commits in badger so pinging you here.

actually… saying it aloud this morning has made me realize what is probably causing the Sync() failure.

Since Sync() is outside the lock, two goroutines may be calling addChanges() at once and the second will block on appendLock. However, if the lock is removed and one routine calls rewrite() before the syscall Sync()->syscall.Fsync() runs, then the ‘file is closing’ error would be found.

Ok, I got approval to submit the code - bumping this with the link to the PR: https://github.com/dgraph-io/badger/pull/1756

I split it into 2 commits if you guys do not want the unit test for the race condition in there forever.

Can someone respond and look at the PR?

1 Like

bump…

Thanks a lot @iluminae for raising the fix for the issue. We really appreciate your contribution. This is a holiday week for some folks (based out of India) in Dgraph, thats why the response got delayed for the PR.

I will look into this issue.

oh its diwali got it - I was trying to figure out if there was some reason for the delay.

Ok, thats fine - just wanted an ACK. I really hope we can get this into the next version of dgraph so we can stop seeing this corruption on IO-Throttled systems.

Thank you for merging my PR @Naman - As far as the plan for inclusion, did you want a cherry pick onto some other badger branch for release with a v21.03.X patch? Or will it only be included with v21.next? In which case, when are you planning v21.next’s release?

We are being really messed up by all these manifest corruptions and would like to run this code, but if it will not be released with v21.03 and v21.next is far out we may just spin our own image with this fix.

The next release would be v21.12. We’re trying to test it out this week, and hopefully cut a release next week.

The new release has a whole bunch of performance optimizations. So, excited about that.

2 Likes

How goes it? We need this code, if there is not going to be a release this week I need to backport to v21.03.2 and run that.