Some good strategies to implement

Moved from GitHub badger/909

Posted by n-ozerov:

Recently, I found a report about a MongoDB engine based on RocksDB: https://pgday.ru/presentation/178/5964963da443b.pdf

Google Translated version is here: https://drive.google.com/file/d/1JL2XbYTQFm7favk77TuX9We36ZFAHU2p/view?usp=sharing

It mentions two useful ideas — how to reclaim space, and column families:

Deletion:

  1. Collect stats about tombstones while iterating over a range.
  2. If the count exceeds a threshold, start a compaction for that range.

This could help in scenarios with many deletions, or with batch range deletions (queues, etc.).

RocksDB also has a DeleteFilesInRange function, which deletes whole files whose keys all fall within a specified range.

Column families allow many LSM trees to share the same WAL, which makes cross-tree transactions possible. This is useful for operations like dropping a whole column by deleting its tree, and it should also take less space than key prefixes.

Sorry for my bad English. I’ll try to explain again if you didn’t understand me.

campoy commented :

Hi there @n-ozerov,

The second link is broken, and unfortunately, I do not speak Russian.
Could you provide more info about the features?

Thanks

n-ozerov commented :

Updated the link. I’ve just found that much of the text and many of the images draw on:
https://github.com/facebook/rocksdb/wiki/Manual-Compaction#compactrange
https://github.com/facebook/rocksdb/wiki/Delete-A-Range-Of-Keys
https://rocksdb.org/blog/2018/11/21/delete-range.html
https://github.com/facebook/rocksdb/wiki/Implement-Queue-Service-Using-RocksDB (CompactOnDeletionCollector)
https://github.com/facebook/rocksdb/wiki/Column-Families

The most interesting pages are 44-64, 75-87.

"Request cursor statistics after range iteration. If the amount of missing data has exceeded threshold to run compaction for a given range.

DeleteFilesInRange, allows you to delete files keys in which are completely within the specified range"

“In classic engines there is one tree per container (data or index). In MongoRocks, there is one LSM tree for everything” – picture on page 80

“RocksDB supports many LSM trees (column families) united by a shared WAL journal to ensure transactionality. It was originally built and applied with MySQL in mind. MongoRocks would have one LSM tree per prefix”

I’ve found a few feature requests where people asked about column families and were rejected, so it would be better to focus on LSM compaction.

As far as I understand, if the iterator notices a large range of tombstones, it forces a compaction on that range. Is this possible to implement in Badger?

n-ozerov commented :

Well, I thought about it myself and came up with a way to improve compactions:

  1. We have the Flatten function, which brings all tables/keys/versions to the same level. We apply it first.

  2. We iterate through the whole DB and decide which SST tables can be deleted immediately as whole files (every key in them is tombstoned).

  3. We merge tables within the same level (where technically possible) whose tombstone ranges exceed a threshold (a count or a percentage).

  4. Then we change the level number to the highest possible level whose max size fits the data.

Example:
Level: 0. 0 B Size. 0 B Max.
Level: 1. 0 B Size. 268 MB Max.
Level: 2. 233 MB Size. 2.7 GB Max.

Turns into:

Level: 0. 0 B Size. 0 B Max.
Level: 1. 233 MB Size. 268 MB Max.
Level: 2. 0 B Size. 2.7 GB Max.

Can we do this?

In the end, we should have the most compacted tree at the highest possible level.

stale commented :

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

manishrjain commented :

Let’s accept this issue. We should have a mechanism to compact tables based on some heuristics.

campoy commented :

I’d like to have an idea of how much work this entails.

Could you @manishrjain, or maybe @jarifibrahim, give an estimate?