Recommended config for fixed key/value lengths

Hi!

I’m trying to learn a bit more about the internals of Badger and the LSM tree so that I can fine-tune my database. It is a simple sha1->sha256 map, which means all keys have the same size and all values have the same size.

  • Should I tweak ValueThreshold (e.g. using LSMOnlyOptions) so all my data fits into the tree?
  • Using the default options, badger info shows:
    [     2020-11-13T21:40:04Z] MANIFEST      932 B MA
    [                      now] 000087.sst    71 MB L1
    [                      now] 000088.sst    30 MB L1
    [         4 months earlier] 000000.vlog   72 MB VL
    [         1 second earlier] 000001.vlog   22 MB VL
    
    [EXTRA]
    [2020-07-08T08:32:59Z] KEYREGISTRY    28 B
    
    [Summary]
    Level 0 size:          0 B
    Level 1 size:       101 MB
    Total index size:   101 MB
    Value log size:      94 MB
    
    Abnormalities:
    1 extra file.
    0 missing files.
    0 empty files.
    0 truncated manifests.
    
    • Given that the MemoryMap loading mode will pre-allocate 2*ValueLogFileSize = 2GB, most of this memory is unused, if I understand correctly. Is ValueLogFileSize=256MB a reasonable value?
    • My data will only increase over time, what are the implications of using a too-small ValueLogFileSize?
  • Are there any other options I should look at, e.g. to tune Badger for the expected key/value sizes?

Hey @aruiz14

  • Should I tweak ValueThreshold (e.g. using LSMOnlyOptions) so all my data fits into the tree?

If your values are smaller than 1 KB (the default value threshold), they are stored in the LSM tree. A sha256 value takes up 32 bytes, so it fits in the LSM tree.
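To make the threshold rule concrete, here is a minimal sketch using only the Go standard library. It assumes the comparison is strictly "smaller than the threshold", as the wording above suggests. storedInLSM is a hypothetical helper written for illustration, not part of Badger's API, and the threshold of 32 is only there to show the boundary case.

```go
package main

import (
	"crypto/sha1"
	"crypto/sha256"
	"fmt"
)

// storedInLSM reports whether a value of valueLen bytes would stay in the
// LSM tree, ASSUMING a strict "smaller than the threshold" comparison.
// Hypothetical helper for illustration, not a Badger function.
func storedInLSM(valueLen, valueThreshold int) bool {
	return valueLen < valueThreshold
}

func main() {
	// A sha1 digest is 20 bytes and a sha256 digest is 32 bytes.
	key := sha1.Sum([]byte("example"))
	value := sha256.Sum256([]byte("example"))
	fmt.Println("key bytes:", len(key), "value bytes:", len(value))

	// With a 1 KB threshold the 32-byte value is well below the limit.
	fmt.Println("threshold 1024:", storedInLSM(len(value), 1024))

	// Illustrative boundary case: a threshold equal to the value length
	// does not qualify under a strict comparison.
	fmt.Println("threshold 32:", storedInLSM(len(value), 32))
}
```

If the comparison really is strict, the threshold has to be at least one byte larger than the fixed value size for every value to stay in the tree.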

  • Given that the MemoryMap loading mode will pre-allocate 2*ValueLogFileSize = 2GB, most of this memory is unused, if I understand correctly. Is ValueLogFileSize=256MB a reasonable value?

That should be okay.
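For a sense of scale, the arithmetic from the question can be sketched as below. It takes the question's premise (a reservation of 2*ValueLogFileSize in MemoryMap mode) at face value rather than confirming it, and the 1 GiB figure is inferred from the quoted 2GB. mmapReservation is a hypothetical helper, not a Badger function.

```go
package main

import "fmt"

// mmapReservation returns the space reserved up front, ASSUMING the
// 2 * ValueLogFileSize rule quoted in the question. Hypothetical helper.
func mmapReservation(valueLogFileSize int64) int64 {
	return 2 * valueLogFileSize
}

func main() {
	const mib = int64(1) << 20
	// Compare a roughly 1 GiB file size with the proposed 256 MiB.
	for _, size := range []int64{1024 * mib, 256 * mib} {
		fmt.Printf("ValueLogFileSize=%d MiB -> reserves %d MiB\n",
			size/mib, mmapReservation(size)/mib)
	}
}
```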

  • My data will only increase over time, what are the implications of using a too-small ValueLogFileSize?

You’ll have many file descriptors open (1 for each file). I don’t think anything else would be affected.
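As a rough illustration of how the descriptor count scales, the sketch below estimates the number of vlog files for a given amount of value log data. It assumes each file fills up completely before a new one is started, so it is a lower bound: the listing in the question already shows two vlog files of 72 MB and 22 MB. vlogFileCount and the 10 GiB total are hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// vlogFileCount estimates how many value log files (one open descriptor
// each, per the reply above) totalBytes of data would occupy, ASSUMING
// every file is filled to fileSize before rotation. Hypothetical helper.
func vlogFileCount(totalBytes, fileSize int64) int64 {
	if totalBytes <= 0 || fileSize <= 0 {
		return 0
	}
	// Ceiling division: a partially filled last file still counts.
	return (totalBytes + fileSize - 1) / fileSize
}

func main() {
	const mib = int64(1) << 20
	const gib = int64(1) << 30

	// Hypothetical future value log size, for illustration only.
	total := 10 * gib
	for _, fileSize := range []int64{64 * mib, 256 * mib, gib} {
		fmt.Printf("ValueLogFileSize=%d MiB -> at least %d files\n",
			fileSize/mib, vlogFileCount(total, fileSize))
	}
}
```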

You may also run into disk usage issues, since vlog files are not cleaned up quickly in the current release of Badger. These disk issues will be fixed in the next release of Badger, which is planned for next month (the code is already in master).


Thanks for your reply!
