Badger seeks to prefix much slower than file reads (30x), on SSD

gargan · August 7, 2020, 6:35pm

I’m seeing very slow Badger reads (78 seconds) compared to my previous file-based reads (2.2 seconds). This is on SSD. Here’s a profile:

I’m on Badger version `github.com/dgraph-io/badger/v2 v2.0.1-rc1.0.20200709123515-8e896a7af361`.

I’m setting up Badger like this:

```go
opt := badger.DefaultOptions(bpath).WithLogger(badgerDefaultLogger)
opt.Compression = badgeroptions.None // no block compression
opt.ZSTDCompressionLevel = 0         // only used when Compression is ZSTD
opt.SyncWrites = true                // sync each write to disk (a write-path cost)
opt.MaxCacheSize = 0
opt.LoadBloomsOnOpen = false         // should speed up start-up time.
db, err := badger.Open(opt)
```

Is there any tuning I should be doing to get the highest possible read performance? Seeking to the right key prefix seems to take a lot of time.
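In an LSM store, a prefix read is a seek (find the first key >= prefix) followed by a forward scan that stops once keys no longer share the prefix. In Badger v2 this corresponds to creating an iterator with `IteratorOptions.Prefix` set, calling `it.Seek(prefix)`, and looping while `it.ValidForPrefix(prefix)`. The stdlib-only sketch below models that pattern over one in-memory sorted slice. `seekPrefix` is a hypothetical helper, not Badger code. It deliberately ignores what makes the real seek expensive: multiple levels, per-table block indexes, and value-log lookups.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// seekPrefix returns all keys starting with prefix from a slice that is
// already sorted in byte order. It mimics an iterator's Seek(prefix)
// followed by a scan that stops as soon as keys stop matching.
func seekPrefix(sorted [][]byte, prefix []byte) [][]byte {
	// The "seek": binary search for the first key >= prefix.
	i := sort.Search(len(sorted), func(i int) bool {
		return bytes.Compare(sorted[i], prefix) >= 0
	})
	var out [][]byte
	// The "scan": continue while keys still carry the prefix.
	for ; i < len(sorted) && bytes.HasPrefix(sorted[i], prefix); i++ {
		out = append(out, sorted[i])
	}
	return out
}

func main() {
	keys := [][]byte{
		[]byte("a/1"), []byte("b/1"), []byte("b/2"), []byte("c/1"),
	}
	for _, k := range seekPrefix(keys, []byte("b/")) {
		fmt.Println(string(k))
	}
}
```

Because the keys are sorted, the seek costs one binary search and the scan touches only matching keys. In a real LSM tree, that search is repeated in every level that might hold the prefix.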

Neeraj · August 9, 2020, 3:51am

@ibrahim any suggestions?

ibrahim · August 9, 2020, 12:24pm

@gargan Would you be able to share your data directory and a sample query that I can use to reproduce this? You can send the data directory to ibrahim[at]dgraph.io.

Also, can you run `badger info --dir xxx` and share the output? This command needs an inactive Badger directory, so you might have to stop the running application first.

gargan · August 10, 2020, 6:07pm

Great. Here is the output of the `badger info` command. The data needed to reproduce the query is currently about 2.3 GB, which is too large for email. I can share it if you have somewhere I can push it.

```
$ badger info --dir honeyBadger-badgerdb
Listening for /debug HTTP requests at port: 8080

[     2020-08-10T18:05:08Z] MANIFEST      574 B MA
[        3 seconds earlier] 000033.sst    72 MB L1 
[        2 seconds earlier] 000034.sst    71 MB L1 
[         1 second earlier] 000035.sst    75 MB L1 
[                      now] 000036.sst    75 MB L1 
[                      now] 000037.sst   8.2 MB L1 
[       17 minutes earlier] 000028.sst    71 MB L2 
[       17 minutes earlier] 000029.sst    71 MB L2 
[       17 minutes earlier] 000004.vlog  743 MB VL
[        5 minutes earlier] 000005.vlog  1.0 GB VL
[        3 seconds earlier] 000006.vlog  224 MB VL

[EXTRA]
[2020-08-10T17:44:48Z] KEYREGISTRY    28 B

[Summary]
Level 0 size:          0 B
Level 1 size:       301 MB
Level 2 size:       142 MB
Total index size:   443 MB
Value log size:     2.0 GB

Abnormalities:
1 extra file.
0 missing files.
0 empty files.
0 truncated manifests.
badger 2020/08/10 18:05:11 INFO: All 7 tables opened in 137ms
badger 2020/08/10 18:05:11 INFO: Replaying file id: 6 at offset: 223816271
badger 2020/08/10 18:05:11 INFO: Replay took: 18.751µs
```

ibrahim · August 11, 2020, 11:27am

@gargan Can you try running your queries again? Your data might have been in level 0, and closing the DB forced it into level 1. The level 0 to level 1 compaction would have dropped all the stale keys, which should improve read performance.
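The compaction effect described here can be pictured with a toy model. When overlapping levels are merged, only the newest version of each key needs to survive, so later seeks have fewer stale entries to step over. The `entry` type and `compact` function are hypothetical simplifications, not Badger's implementation. Badger's real compaction also has to account for settings such as the number of versions to keep, and for deletion markers.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical, simplified LSM record: a key plus a version
// (higher means newer). Real Badger entries carry more metadata.
type entry struct {
	Key     string
	Version uint64
}

// compact merges the given levels, keeps only the newest version of each
// key, and returns the survivors sorted by key. The dropped older versions
// are the "stale" entries that reads no longer have to skip over.
func compact(levels ...[]entry) []entry {
	newest := make(map[string]entry)
	for _, lvl := range levels {
		for _, e := range lvl {
			if cur, ok := newest[e.Key]; !ok || e.Version > cur.Version {
				newest[e.Key] = e
			}
		}
	}
	out := make([]entry, 0, len(newest))
	for _, e := range newest {
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

func main() {
	l0 := []entry{{"k1", 3}, {"k2", 5}}
	l1 := []entry{{"k1", 1}, {"k1", 2}, {"k3", 1}}
	for _, e := range compact(l0, l1) {
		fmt.Println(e.Key, e.Version)
	}
}
```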

gargan · August 12, 2020, 12:04am

I realized I wasn’t making a fair comparison: there was some hidden caching on the file side. After adding the same caching on the Badger side, it’s now only about 3x slower.

Still, I’m trying to make reads as fast as possible. Would these options, or any others, be expected to help?
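The thread doesn't describe the caching that was added, so this is only a generic sketch. It shows a read-through cache that could sit in front of either the file reads or the Badger prefix scans, so both paths are measured under the same conditions. `prefixCache`, `newPrefixCache`, and the loader function are hypothetical names. The cache never evicts, and it holds its lock during a load. That keeps it simple but serializes cache misses.

```go
package main

import (
	"fmt"
	"sync"
)

// prefixCache is a hypothetical read-through cache in front of an expensive
// lookup (for example a prefix scan or a file read). Because the lock is held
// while loading, the loader runs at most once per prefix. Nothing is evicted.
type prefixCache struct {
	mu     sync.Mutex
	data   map[string][]string
	loader func(prefix string) []string
	misses int
}

func newPrefixCache(loader func(string) []string) *prefixCache {
	return &prefixCache{data: make(map[string][]string), loader: loader}
}

// Get returns the cached result for prefix, loading it on the first request.
func (c *prefixCache) Get(prefix string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.data[prefix]; ok {
		return v
	}
	c.misses++
	v := c.loader(prefix) // simple, but blocks other callers during the load
	c.data[prefix] = v
	return v
}

func main() {
	c := newPrefixCache(func(p string) []string {
		return []string{p + "1", p + "2"} // stand-in for a slow scan
	})
	first := c.Get("b/")
	second := c.Get("b/")
	fmt.Println(first, second, "loader calls:", c.misses)
}
```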

```go
opt.KeepBlocksInCache = true       // default false
opt.KeepBlockIndicesInCache = true // default false
opt.BlockSize = 8 * 1024           // default 4 * 1024
```
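For `BlockSize`, some back-of-the-envelope arithmetic helps frame the trade-off. In a typical LSM table design, each data block gets one entry in the table's block index. Doubling the block size therefore roughly halves the index entries per table, while every block fetched on a cache miss covers twice as many bytes. The hypothetical `blocksPerTable` helper computes this for a 72 MiB table, close to the L1 table sizes reported by `badger info` earlier in the thread. It ignores per-block overhead. It also says nothing about whether the change helps this particular workload, which would need measuring.

```go
package main

import "fmt"

// blocksPerTable estimates how many data blocks (and so block-index entries)
// an uncompressed table of tableBytes holds at a given block size, ignoring
// per-block overhead. Back-of-the-envelope arithmetic only.
func blocksPerTable(tableBytes, blockSize int64) int64 {
	return (tableBytes + blockSize - 1) / blockSize // round up
}

func main() {
	const table = 72 << 20 // 72 MiB, roughly one of the L1 tables listed
	for _, bs := range []int64{4 << 10, 8 << 10} {
		fmt.Printf("block size %5d B -> %d blocks\n", bs, blocksPerTable(table, bs))
	}
}
```

At 4 KiB this gives 18432 blocks per table and at 8 KiB it gives 9216: a smaller index, but larger reads per block.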
