Database open time is really high


(Vincent Rischmann) #1

Hello,

I have a badger database (using v1.5.5) with a dataset of approximately 420GB spread across almost 1700 files.

Right now just opening the database takes between 45s and 1min and I’m wondering if there’s anything I can do to mitigate this.

Does the number of files affect the startup time in a significant way ? What other stuff should I take a look at ?


(Manish R Jain) #2

@vrischmann:

The number of files definitely makes a difference. Badger picks up an index from each SST file, which looks like it is being done serially, so it takes time.

Maybe we could look into reading these tables concurrently, and see if that makes a difference.

Could you share the output of badger info here, so I can try and replicate your setup? Also, a list of your options, unless you’re using DefaultOptions.


(Vincent Rischmann) #3

Sure, here’s the result of the badger info.

Right now my options are:

opts := badger.DefaultOptions
opts.TableLoadingMode = options.MemoryMap
opts.ValueLogLoadingMode = options.MemoryMap
opts.Dir = dataDir
opts.ValueDir = opts.Dir
opts.LevelOneSize = 1 * (1024 * 1024 * 1024)
opts.MaxTableSize = 512 * (1024 * 1024)
opts.NumCompactors = 8
opts.ValueLogFileSize = 2 * (1024 * 1024 * 1024)
opts.ValueLogMaxEntries = 10e6

But for some time I was using DefaultOptions only. I later changed LevelOneSize, MaxTableSize, ValueLogFileSize and ValueLogMaxEntries in an attempt to reduce the number of files.

If it helps, my use case here is storing log lines. Each line has a 16 bytes key (a ULID) while the value has arbitrary length.

What are the disadvantages of using bigger tables and/or value logs (if any) ?


(Manish R Jain) #4

So, I tried replicating this, thanks to @dmai, who painfully filled an external HDD with tons of data.

I’ve a branch for you: https://github.com/dgraph-io/badger/tree/mrjn/try-to-speed-open

This branch tries to make opening tables concurrent. However, in my tests with this external rotational HDD, the process was slower, because random seeks are slow. If you have an SSD, can you try with this branch and report back? @vrischmann

This branch also added logs to show how table opening is going on.


(Vincent Rischmann) #5

Thanks for working on this.

The server with the dataset runs with 12 HDD configured with a software raid 5.
I get this:

original opening time:

time="2019-01-10T14:40:23Z" level=info msg="opened database" elapsed=5m47.719094755s package=db

with your branch:

badger 2019/01/10 14:03:57 INFO: All 397 tables opened in 3m0.336s

it’s already a substantial gain in my case so that’s great.

While opening the database I’ve seen reads top at around 700mib/s but this array is capable of reading at more than 1gib/s (with a straight cat foo > /dev/null) so I’m not sure if I’m being i/o bound right now.

Here’s the graph of disk throughput, the blue line is the raid array (negative means reads).

First peak is badger on master, second is badger with your branch and last is just cat.

Note that since I opened the issue the dataset changed too, it’s bigger now. Here’s the new badger info output.


(Vincent Rischmann) #6

Forgot to mention, I profiled my app with pprof while opening so I have a pprof cpu profile available if you’d like. I’d prefer to share it privately however since I’m not sure what sort of private data is included in it.


(Daniel Mai) #7

These are two different log messages. Can you share the same first log message between the branches that shows the elapsed time?

Badger does random seeks within the files, so it’s not the same performance as a sequential read like cat.


(Vincent Rischmann) #8

Sorry, I should have specified that the first log message (which is mine) is right after the call to badger.Open.

I’ll try to get you the same logs but right now I’m having a hard time reliably opening the database due to some fatal errors from badger.


(Vincent Rischmann) #9

Unfortunately I’m blocked because of this so right now this is effectively in standby.

I’ll try to take some time to regenerate a dataset that big to test this branch specifically (and hopefully nothing will break again).


(Manish R Jain) #10

I think my latest changes improves opening performance significantly. Can you try again with my branch? mrjn/try-to-speed-open.

Weird error, not sure why it’s happening, i.e. why are tables having overlaps. Will have to look at the commit history to see if something changed recently.


(Vincent Rischmann) #11

Here are the results (it actually reports that the open is done before crashing with the error so I can still test).

with commit abf7850f90101864a798b653c9126df4da19bd9e

badger 2019/01/11 10:26:25 INFO: All 389 tables opened in 2m19.021s

with commit 421090099df7b3e3636502aeddf1e08e4249f8ea

badger 2019/01/11 10:46:30 INFO: All 389 tables opened in 1m57.773s

(Vincent Rischmann) #12

FYI, there’s a seek missing in readIndex with your latest commit otherwise it fails while flushing the memtable.

Here’s a repro:

package main

import (
        "fmt"
        "os"
        "os/signal"
        "strings"
        "syscall"

        "github.com/dgraph-io/badger"
        "github.com/sirupsen/logrus"
)

func main() {
        opts := badger.DefaultOptions
        opts.Dir = "/home/vincent/tmp/badger-bug-data"
        opts.ValueDir = opts.Dir

        db, err := badger.Open(opts)
        if err != nil {
                logrus.Fatal(err)
        }

        wb := db.NewWriteBatch()

        for i := 0; i < 100000; i++ {
                s := strings.Repeat("a", 10000)

                err := wb.Set([]byte(fmt.Sprintf("%d", i)), []byte(s), 0)
                if err != nil {
                        logrus.Fatal(err)
                }
        }

        signalCh := make(chan os.Signal)
        go signal.Notify(signalCh, syscall.SIGTERM, syscall.SIGINT)
        <-signalCh

        err = wb.Flush()
        if err != nil {
                logrus.Fatal(err)
        }
        wb.Cancel()

        err = db.Close()
        if err != nil {
                logrus.Fatal(err)
        }
}

It failed like this indefinitly:

badger 2019/01/11 14:50:14 INFO: Storing value log head: {Fid:2 Len:42 Offset:1003488890}
badger 2019/01/11 14:50:14 ERROR: Failure while flushing memtable to disk: : : EOF. Retrying...
badger 2019/01/11 14:50:15 INFO: Storing value log head: {Fid:2 Len:42 Offset:1003488890}
badger 2019/01/11 14:50:15 ERROR: Failure while flushing memtable to disk: : : EOF. Retrying...
badger 2019/01/11 14:50:16 INFO: Storing value log head: {Fid:2 Len:42 Offset:1003488890}
badger 2019/01/11 14:50:17 ERROR: Failure while flushing memtable to disk: : : EOF. Retrying...
badger 2019/01/11 14:50:18 INFO: Storing value log head: {Fid:2 Len:42 Offset:1003488890}
badger 2019/01/11 14:50:18 ERROR: Failure while flushing memtable to disk: : : EOF. Retrying...
badger 2019/01/11 14:50:19 INFO: Storing value log head: {Fid:2 Len:42 Offset:1003488890}

I fixed it with this:

index 8978c98..2fcab39 100644
--- a/table/table.go
+++ b/table/table.go
@@ -242,6 +242,11 @@ func (t *Table) readIndex() error {
                t.blockIndex = append(t.blockIndex, ko)
        }

+       _, err := t.fd.Seek(0, os.SEEK_SET)
+       if err != nil {
+               return err
+       }
+
        // Execute this index read serially, because all disks are orders of magnitude faster when read
        // serially compared to executing random reads.
        var h header

which seems to work.

I’m currently in the process of recreating a database from scratch with a big dataset, we’ll see how it goes.


(Manish R Jain) #13

Yeah, I found the seek issue too. Fixed it.

We did a lot of testing with this change on various devices: rotational HDD, slower SSD, and faster SATA SSD. We see disk utilization is pretty good with these changes.

The commit is now merged. Try it out!