I’m currently trying to insert ~1.5 billion key/value pairs into a badger database on an SSD. The keys aren’t unique. Roughly speaking, I need to increment a counter in the serialized struct whenever a key is encountered, so every operation is a Get followed by a Set. The struct also holds other data besides the counter, which remains unchanged after the first Set. What should I know or do to get the best performance out of badger?
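For context, each operation looks roughly like the sketch below, assuming badger v3 (github.com/dgraph-io/badger/v3); `Record`, `encodeRecord` and `decodeRecord` are hypothetical stand-ins for the actual serialized struct (an 8-byte counter plus the unchanging part; the real layout may differ):

```go
package ingest

import (
	"encoding/binary"

	badger "github.com/dgraph-io/badger/v3"
)

// Record is a hypothetical stand-in for the serialized struct: an 8-byte
// counter plus 10 bytes that never change after the first Set.
type Record struct {
	Count uint64
	Extra [10]byte
}

func encodeRecord(r Record) []byte {
	buf := make([]byte, 8+len(r.Extra))
	binary.BigEndian.PutUint64(buf, r.Count)
	copy(buf[8:], r.Extra[:])
	return buf
}

func decodeRecord(val []byte, r *Record) {
	r.Count = binary.BigEndian.Uint64(val)
	copy(r.Extra[:], val[8:])
}

// bump is one operation: Get the record if the key was seen before,
// increment the counter, Set it back, all inside a single transaction.
func bump(db *badger.DB, key []byte, fresh Record) error {
	return db.Update(func(txn *badger.Txn) error {
		rec := fresh
		item, err := txn.Get(key)
		switch {
		case err == badger.ErrKeyNotFound:
			// first occurrence of this key: keep the fresh record
		case err != nil:
			return err
		default:
			if err := item.Value(func(val []byte) error {
				decodeRecord(val, &rec)
				return nil
			}); err != nil {
				return err
			}
		}
		rec.Count++
		return txn.Set(key, encodeRecord(rec))
	})
}
```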
After the first Set, cache the key and counter in memory and increment only the cached counter. Once you are done ingesting, do a final Set per cached pair. If you run out of memory, cache only part of the keys.
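Roughly like this (a rough sketch reusing the hypothetical `Record`/`encodeRecord` helpers from the snippet above; the `KV` type and the input channel are made-up plumbing):

```go
// KV is a hypothetical input item: an 8-byte key plus the payload that
// never changes after the key is first seen.
type KV struct {
	Key   []byte
	Fresh Record
}

// ingestCached keeps every counter in memory and writes each distinct key
// to badger exactly once, after the input is exhausted.
func ingestCached(db *badger.DB, in <-chan KV) error {
	counts := make(map[string]*Record)
	for kv := range in {
		if r, ok := counts[string(kv.Key)]; ok {
			r.Count++ // in-memory increment, no disk I/O
			continue
		}
		r := kv.Fresh
		r.Count = 1
		counts[string(kv.Key)] = &r
	}
	// Final Set per cached pair.
	for k, r := range counts {
		err := db.Update(func(txn *badger.Txn) error {
			return txn.Set([]byte(k), encodeRecord(*r))
		})
		if err != nil {
			return err
		}
	}
	return nil
}
```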
Badger already has block and index caches. What’s the point in reinventing the wheel while trying to avoid OOM?
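For reference, those built-in caches are sized through the options when opening the DB; a sketch assuming badger v3 option names (the sizes are arbitrary examples):

```go
// openWithCaches is a sketch of sizing badger's built-in caches
// (assumed badger v3 option names; sizes are arbitrary examples).
func openWithCaches(path string) (*badger.DB, error) {
	opts := badger.DefaultOptions(path).
		WithBlockCacheSize(512 << 20). // cache for data blocks, in bytes
		WithIndexCacheSize(128 << 20)  // cache for table indexes and bloom filters, in bytes
	return badger.Open(opts)
}
```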
```
[Summary]
Level 0 size: 57 MiB
Level 1 size: 0 B
Level 2 size: 0 B
Level 3 size: 0 B
Level 4 size: 0 B
Level 5 size: 1.2 GiB
Level 6 size: 12 GiB
Total SST size: 13 GiB
Value log size: 2.0 GiB
```
Is it OK that the intermediate levels are empty? The keys are still being inserted.
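In case it helps, the same per-level numbers can also be pulled from a running process; a sketch assuming badger v3's `DB.Levels()` (plus `fmt` for printing):

```go
// printLevels dumps per-level sizes, similar to the summary above
// (assumes badger v3's DB.Levels(); field names may differ across versions).
func printLevels(db *badger.DB) {
	for _, l := range db.Levels() {
		fmt.Printf("Level %d: %d tables, %d bytes\n", l.Level, l.NumTables, l.Size)
	}
}
```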
By the way, it’s crazy IMO that you can’t set some keys (the ones with the “!badger!” prefix).
I suspect the lookup via the indexes is not the slow part. You can profile your application to find out.
Disk I/O is slow. Every update to the badger db is persisted to disk. Caching the counters in memory and writing them to the db only once you are finished will be faster.
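If the final write-out of the cached counters turns out to be slow, badger's WriteBatch can batch the Sets; a sketch, again with the hypothetical `Record`/`encodeRecord` helpers from earlier:

```go
// flushCounts replaces the one-Update-per-key flush: WriteBatch groups
// many Sets into large internal transactions and writes them in bulk.
func flushCounts(db *badger.DB, counts map[string]*Record) error {
	wb := db.NewWriteBatch()
	defer wb.Cancel()
	for k, r := range counts {
		if err := wb.Set([]byte(k), encodeRecord(*r)); err != nil {
			return err
		}
	}
	return wb.Flush()
}
```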
So the whole process took 1 day 6 hours 12 minutes 29 seconds. That’s about 72 μs/op (op = Get+Set). Not bad. Maybe with additional caching I could shave those 6 hours off.
I disabled swap, and badger started to get OOM-killed on me after eating 85+% of 24 GB of RAM. And that’s after inserting only about 70 million k/v pairs. The key size is 8 bytes, the value size is 18 bytes.
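The memory footprint can be tuned via the options when opening the DB; a hedged sketch assuming badger v3 option names (the values are illustrative, not recommendations):

```go
// openLowMemory is a sketch of trading throughput for a smaller RAM
// footprint (assumed badger v3 option names; values are illustrative).
func openLowMemory(path string) (*badger.DB, error) {
	opts := badger.DefaultOptions(path).
		WithNumMemtables(2).            // keep fewer memtables in RAM
		WithMemTableSize(32 << 20).     // smaller memtables
		WithBlockCacheSize(256 << 20).  // cap the block cache
		WithValueLogFileSize(256 << 20) // smaller value log files
	return badger.Open(opts)
}
```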
Well, with the in-memory 1 GB cache suggested by vnium, the data ingestion took a bit less than 24 hours, just as I had predicted.