Badger seeks to a key prefix much slower than file reads (30x), on SSD

I’m seeing very slow badger reads (78 seconds) compared to my prior use of plain files for reads (2.2 seconds). This is on SSD. Here’s a profile:

This is at badger version github.com/dgraph-io/badger/v2 v2.0.1-rc1.0.20200709123515-8e896a7af361.

I’m setting up badger like this:

opt := badger.DefaultOptions(bpath).WithLogger(badgerDefaultLogger)
opt.Compression = badgeroptions.None // no table compression
opt.ZSTDCompressionLevel = 0         // irrelevant with compression off
opt.SyncWrites = true                // sync every write to disk
opt.MaxCacheSize = 0                 // block cache disabled
opt.LoadBloomsOnOpen = false         // should speed up start-up time.
db, err := badger.Open(opt)

Is there any tuning I should be doing to get the highest possible read performance? The seeks to the right key prefix seem to take a lot of the time.
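For reference, the read path is basically a prefix scan. It looks roughly like the sketch below (the function name and value handling are illustrative, not my exact code):

import (
    badger "github.com/dgraph-io/badger/v2"
)

// readByPrefix scans every key under prefix and copies out the values.
func readByPrefix(db *badger.DB, prefix []byte) ([][]byte, error) {
    var vals [][]byte
    err := db.View(func(txn *badger.Txn) error {
        itOpts := badger.DefaultIteratorOptions
        itOpts.Prefix = prefix // only iterate keys with this prefix
        it := txn.NewIterator(itOpts)
        defer it.Close()
        for it.Rewind(); it.Valid(); it.Next() {
            v, err := it.Item().ValueCopy(nil)
            if err != nil {
                return err
            }
            vals = append(vals, v)
        }
        return nil
    })
    return vals, err
}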

@ibrahim any suggestions?

@gargan would you be able to share your data directory and a sample query that I can use to reproduce this? You can send the data directory to Ibrahim[at]dgraph.io.

Also, can you run the badger info --dir xxx command and share the output? This command runs on an inactive badger directory, so you might have to stop the running application.

Great. Here is the output of the badger info command. The data to reproduce the query is about 2.3 GB at the moment, so it won’t go through email. I can share it if you have a place I can push it to.

$ badger info --dir honeyBadger-badgerdb
Listening for /debug HTTP requests at port: 8080

[     2020-08-10T18:05:08Z] MANIFEST      574 B MA
[        3 seconds earlier] 000033.sst    72 MB L1 
[        2 seconds earlier] 000034.sst    71 MB L1 
[         1 second earlier] 000035.sst    75 MB L1 
[                      now] 000036.sst    75 MB L1 
[                      now] 000037.sst   8.2 MB L1 
[       17 minutes earlier] 000028.sst    71 MB L2 
[       17 minutes earlier] 000029.sst    71 MB L2 
[       17 minutes earlier] 000004.vlog  743 MB VL
[        5 minutes earlier] 000005.vlog  1.0 GB VL
[        3 seconds earlier] 000006.vlog  224 MB VL

[EXTRA]
[2020-08-10T17:44:48Z] KEYREGISTRY    28 B

[Summary]
Level 0 size:          0 B
Level 1 size:       301 MB
Level 2 size:       142 MB
Total index size:   443 MB
Value log size:     2.0 GB

Abnormalities:
1 extra file.
0 missing files.
0 empty files.
0 truncated manifests.
badger 2020/08/10 18:05:11 INFO: All 7 tables opened in 137ms
badger 2020/08/10 18:05:11 INFO: Replaying file id: 6 at offset: 223816271
badger 2020/08/10 18:05:11 INFO: Replay took: 18.751µs

@gargan Can you try running your queries once again? Your data might have been in level 0, and closing the DB forced it down to level 1. The level 0 to level 1 compaction would’ve dropped all the stale keys, which should improve the read performance.
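If you want to force that compaction yourself instead of relying on a close and reopen, you can flatten the LSM tree before benchmarking. A minimal sketch (the worker count of 2 is just an example):

import (
    "log"

    badger "github.com/dgraph-io/badger/v2"
)

// flattenForReads compacts all tables down so reads don't have to check
// multiple levels for stale versions of a key.
func flattenForReads(db *badger.DB) {
    if err := db.Flatten(2); err != nil { // 2 compaction workers (example value)
        log.Printf("flatten failed: %v", err)
    }
}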

I realized I wasn’t doing an even comparison; there was some hidden caching happening on the file side. So I added the same caching on the badger side, and now it’s only about 3x slower.

Still, I’m trying to make reads go as fast as possible. Would these or any other options be expected to help?

opt.KeepBlocksInCache = true       // default: false
opt.KeepBlockIndicesInCache = true // default: false
opt.BlockSize = 8 * 1024           // default: 4 * 1024
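If I understand the options correctly, the two cache flags only matter when the block cache actually exists, so presumably I’d also have to switch MaxCacheSize back to something non-zero. A sketch of how I’d combine them (the 256 MB figure is just a guess, not a measured recommendation):

opt := badger.DefaultOptions(bpath).WithLogger(badgerDefaultLogger)
opt.Compression = badgeroptions.None
opt.SyncWrites = true
opt.MaxCacheSize = 256 << 20       // enable the block cache; size here is a guess
opt.KeepBlocksInCache = true       // keep data blocks cached across reads
opt.KeepBlockIndicesInCache = true // keep table indices cached as well
opt.BlockSize = 8 * 1024           // larger table blocks (default is 4 * 1024)
db, err := badger.Open(opt)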