Some good strategies to implement

Moved from GitHub badger/909

Posted by n-ozerov:

Recently, I found a report about a MongoDB engine based on RocksDB: https://pgday.ru/presentation/178/5964963da443b.pdf

Google Translated version is here: https://drive.google.com/file/d/1JL2XbYTQFm7favk77TuX9We36ZFAHU2p/view?usp=sharing

It mentions two useful ideas — how to reclaim space, and column families:

Deletion:

  1. Collect stats about tombstones while iterating over a range.
  2. If the count exceeds a threshold, start a compaction for that range.

This could help in scenarios with many deletions, or with batch range deletions (queues, etc.).

RocksDB also has a DeleteFilesInRange function, which deletes whole files whose keys all fall within a specified range.

Column families allow many LSM trees to share the same WAL, which makes cross-tree transactions possible. This is useful for operations like dropping a whole column by deleting its tree, and it should also take less space than key prefixes.

Sorry for my bad English. I’ll try to explain again if you didn’t understand me.

campoy commented :

Hi there @n-ozerov,

The second link is broken, and unfortunately, I do not speak Russian.
Could you provide more info about the features?

Thanks

n-ozerov commented :

Updated the link. I’ve just found that much of the text and many of the images draw on:
https://github.com/facebook/rocksdb/wiki/Manual-Compaction#compactrange
https://github.com/facebook/rocksdb/wiki/Delete-A-Range-Of-Keys
https://rocksdb.org/blog/2018/11/21/delete-range.html
https://github.com/facebook/rocksdb/wiki/Implement-Queue-Service-Using-RocksDB (CompactOnDeletionCollector)
https://github.com/facebook/rocksdb/wiki/Column-Families

The most interesting pages are 44-64, 75-87.

"Request cursor statistics after range iteration. If the amount of missing data has exceeded threshold to run compaction for a given range.

DeleteFilesInRange, allows you to delete files keys in which are completely within the specified range"

“In classic engines there is one tree per container (data or index). In MongoRocks, there is one LSM tree for everything” – picture on page 80

“RocksDB supports many LSM trees (column families) united by a shared WAL journal to ensure transactionality. It was originally built and applied with MySQL in mind. MongoRocks would have one LSM tree per prefix”

I’ve found a few feature requests where people asked about column families and were rejected, so it would be better to focus on LSM compaction.

As far as I understand, if the iterator notices a large range of tombstones, it forces a compaction on that range. Is this possible to implement in Badger?

n-ozerov commented :

Well, I thought about it myself and came up with a way to improve compactions:

  1. We have the Flatten function, which brings all tables/keys/versions to the same level. We apply it first.

  2. We iterate through the whole DB and decide which SST tables can be deleted immediately as whole files (every key in them is tombstoned).

  3. We merge tables within the same level (where technically possible) whose tombstone ranges exceed a threshold (a count or a percentage).

  4. Then we change the level number to the highest possible level whose max size fits the data.

Example:
Level: 0. 0 B Size. 0 B Max.
Level: 1. 0 B Size. 268 MB Max.
Level: 2. 233 MB Size. 2.7 GB Max.

Turns into:

Level: 0. 0 B Size. 0 B Max.
Level: 1. 233 MB Size. 268 MB Max.
Level: 2. 0 B Size. 2.7 GB Max.

Can we do this?

In the end, we should have the most compacted tree at the highest possible level.

stale commented :

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

manishrjain commented :

Let’s accept this issue. We should have a mechanism to compact tables based on some heuristics.

campoy commented :

I’d like to have an idea of how much work this entails.

Could you @manishrjain, or maybe @jarifibrahim, give an estimate?