Database open time is really high

vrischmann · January 2, 2019, 8:37pm

Hello,

I have a badger database (using v1.5.5) with a dataset of approximately 420GB spread across almost 1700 files.

Right now just opening the database takes between 45s and 1min and I’m wondering if there’s anything I can do to mitigate this.

Does the number of files affect the startup time in a significant way ? What other stuff should I take a look at ?

mrjn · January 3, 2019, 12:27am

@vrischmann:

The number of files definitely makes a difference. Badger picks up an index from each SST file, which looks like it is being done serially, so it takes time.

github.com

dgraph-io/badger/blob/a7db8195d90af472130f03e46a00ba0b5b9c2cf0/levels.go#L106


	}

	// Compare manifest against directory, check for existent/non-existent files, and remove.
	if err := revertToManifest(kv, mf, getIDMap(kv.opt.Dir)); err != nil {
		return nil, err
	}

	// Some files may be deleted. Let's reload.
	tables := make([][]*table.Table, kv.opt.MaxLevels)
	var maxFileID uint64
	for fileID, tableManifest := range mf.Tables {
		fname := table.NewFilename(fileID, kv.opt.Dir)
		var flags uint32 = y.Sync
		if kv.opt.ReadOnly {
			flags |= y.ReadOnly
		}
		fd, err := y.OpenExistingFile(fname, flags)
		if err != nil {
			closeAllTables(tables)
			return nil, errors.Wrapf(err, "Opening file: %q", fname)
		}

Maybe we could look into reading these tables concurrently, and see if that makes a difference.

Could you share the output of badger info here, so I can try and replicate your setup? Also, a list of your options, unless you’re using DefaultOptions.

vrischmann · January 3, 2019, 11:12am

Sure, here’s the result of the badger info.

Right now my options are:

opts := badger.DefaultOptions
opts.TableLoadingMode = options.MemoryMap
opts.ValueLogLoadingMode = options.MemoryMap
opts.Dir = dataDir
opts.ValueDir = opts.Dir
opts.LevelOneSize = 1 * (1024 * 1024 * 1024)
opts.MaxTableSize = 512 * (1024 * 1024)
opts.NumCompactors = 8
opts.ValueLogFileSize = 2 * (1024 * 1024 * 1024)
opts.ValueLogMaxEntries = 10e6

But for some time I was using DefaultOptions only. I later changed LevelOneSize, MaxTableSize, ValueLogFileSize and ValueLogMaxEntries in an attempt to reduce the number of files.

If it helps, my use case here is storing log lines. Each line has a 16 bytes key (a ULID) while the value has arbitrary length.

What are the disadvantages of using bigger tables and/or value logs (if any) ?

mrjn · January 10, 2019, 1:55pm

So, I tried replicating this, thanks to @dmai, who painfully filled an external HDD with tons of data.

I’ve a branch for you: https://github.com/dgraph-io/badger/tree/mrjn/try-to-speed-open

This branch tries to make opening tables concurrent. However, in my tests with this external rotational HDD, the process was slower, because random seeks are slow. If you have an SSD, can you try with this branch and report back? @vrischmann

This branch also added logs to show how table opening is going on.

vrischmann · January 10, 2019, 3:10pm

Thanks for working on this.

The server with the dataset runs with 12 HDD configured with a software raid 5.
I get this:

original opening time:

time="2019-01-10T14:40:23Z" level=info msg="opened database" elapsed=5m47.719094755s package=db

with your branch:

badger 2019/01/10 14:03:57 INFO: All 397 tables opened in 3m0.336s

it’s already a substantial gain in my case so that’s great.

While opening the database I’ve seen reads top at around 700mib/s but this array is capable of reading at more than 1gib/s (with a straight cat foo > /dev/null) so I’m not sure if I’m being i/o bound right now.

Here’s the graph of disk throughput, the blue line is the raid array (negative means reads).

First peak is badger on master, second is badger with your branch and last is just cat.

Note that since I opened the issue the dataset changed too, it’s bigger now. Here’s the new badger info output.

vrischmann · January 10, 2019, 3:12pm

Forgot to mention, I profiled my app with pprof while opening so I have a pprof cpu profile available if you’d like. I’d prefer to share it privately however since I’m not sure what sort of private data is included in it.

dmai · January 10, 2019, 7:44pm

vrischmann:

original opening time:

time="2019-01-10T14:40:23Z" level=info msg="opened database" elapsed=5m47.719094755s package=db

with your branch:

badger 2019/01/10 14:03:57 INFO: All 397 tables opened in 3m0.336s

These are two different log messages. Can you share the same first log message between the branches that shows the elapsed time?

Badger does random seeks within the files, so it’s not the same performance as a sequential read like cat.

vrischmann · January 10, 2019, 8:02pm

Sorry, I should have specified that the first log message (which is mine) is right after the call to badger.Open.

I’ll try to get you the same logs but right now I’m having a hard time reliably opening the database due to some fatal errors from badger.

vrischmann · January 10, 2019, 10:13pm

Unfortunately I’m blocked because of this so right now this is effectively in standby.

I’ll try to take some time to regenerate a dataset that big to test this branch specifically (and hopefully nothing will break again).

mrjn · January 11, 2019, 5:57am

I think my latest changes improves opening performance significantly. Can you try again with my branch? mrjn/try-to-speed-open.

Weird error, not sure why it’s happening, i.e. why are tables having overlaps. Will have to look at the commit history to see if something changed recently.

vrischmann · January 11, 2019, 10:56am

Here are the results (it actually reports that the open is done before crashing with the error so I can still test).

with commit abf7850f90101864a798b653c9126df4da19bd9e

badger 2019/01/11 10:26:25 INFO: All 389 tables opened in 2m19.021s

with commit 421090099df7b3e3636502aeddf1e08e4249f8ea

badger 2019/01/11 10:46:30 INFO: All 389 tables opened in 1m57.773s

vrischmann · January 11, 2019, 2:00pm

FYI, there’s a seek missing in readIndex with your latest commit otherwise it fails while flushing the memtable.

Here’s a repro:

package main

import (
        "fmt"
        "os"
        "os/signal"
        "strings"
        "syscall"

        "github.com/dgraph-io/badger"
        "github.com/sirupsen/logrus"
)

func main() {
        opts := badger.DefaultOptions
        opts.Dir = "/home/vincent/tmp/badger-bug-data"
        opts.ValueDir = opts.Dir

        db, err := badger.Open(opts)
        if err != nil {
                logrus.Fatal(err)
        }

        wb := db.NewWriteBatch()

        for i := 0; i < 100000; i++ {
                s := strings.Repeat("a", 10000)

                err := wb.Set([]byte(fmt.Sprintf("%d", i)), []byte(s), 0)
                if err != nil {
                        logrus.Fatal(err)
                }
        }

        signalCh := make(chan os.Signal)
        go signal.Notify(signalCh, syscall.SIGTERM, syscall.SIGINT)
        <-signalCh

        err = wb.Flush()
        if err != nil {
                logrus.Fatal(err)
        }
        wb.Cancel()

        err = db.Close()
        if err != nil {
                logrus.Fatal(err)
        }
}

It failed like this indefinitly:

badger 2019/01/11 14:50:14 INFO: Storing value log head: {Fid:2 Len:42 Offset:1003488890}
badger 2019/01/11 14:50:14 ERROR: Failure while flushing memtable to disk: : : EOF. Retrying...
badger 2019/01/11 14:50:15 INFO: Storing value log head: {Fid:2 Len:42 Offset:1003488890}
badger 2019/01/11 14:50:15 ERROR: Failure while flushing memtable to disk: : : EOF. Retrying...
badger 2019/01/11 14:50:16 INFO: Storing value log head: {Fid:2 Len:42 Offset:1003488890}
badger 2019/01/11 14:50:17 ERROR: Failure while flushing memtable to disk: : : EOF. Retrying...
badger 2019/01/11 14:50:18 INFO: Storing value log head: {Fid:2 Len:42 Offset:1003488890}
badger 2019/01/11 14:50:18 ERROR: Failure while flushing memtable to disk: : : EOF. Retrying...
badger 2019/01/11 14:50:19 INFO: Storing value log head: {Fid:2 Len:42 Offset:1003488890}

I fixed it with this:

index 8978c98..2fcab39 100644
--- a/table/table.go
+++ b/table/table.go
@@ -242,6 +242,11 @@ func (t *Table) readIndex() error {
                t.blockIndex = append(t.blockIndex, ko)
        }

+       _, err := t.fd.Seek(0, os.SEEK_SET)
+       if err != nil {
+               return err
+       }
+
        // Execute this index read serially, because all disks are orders of magnitude faster when read
        // serially compared to executing random reads.
        var h header

which seems to work.

I’m currently in the process of recreating a database from scratch with a big dataset, we’ll see how it goes.

mrjn · January 12, 2019, 9:49pm

Yeah, I found the seek issue too. Fixed it.

We did a lot of testing with this change on various devices: rotational HDD, slower SSD, and faster SATA SSD. We see disk utilization is pretty good with these changes.

The commit is now merged. Try it out!

https://github.com/dgraph-io/badger/commit/bb1d12feefcaa7102ead58a53a91e4ae47ae5453

Topic		Replies	Views
Badger seeks to prefix much slower than file reads (30x), on SSD Badger	5	876	August 12, 2020
Speeding up insert-or-update Badger	7	976	October 18, 2022
Jaeger can not start, Badger deleting file very slow, jaeger can not init storage Badger kind:question	7	1506	November 30, 2021
Figuring optimal options for usage pattern App Development badger	2	617	January 12, 2021
Badger getting slower as table gets bigger Badger	0	529	November 18, 2021

Database open time is really high

Related Topics