Compaction, MaxLevels, and other parameters


(Ben de Graaff) #1

I ran into an issue where my application would crash during compaction: Badger panics in doCompact because the operation would exceed MaxLevels:

y.AssertTrue(l+1 < s.kv.opt.MaxLevels) // Sanity check.

Considering the response to this issue: https://github.com/dgraph-io/badger/issues/620
I was wondering whether this is really intended (“unpredictable” panics aren’t very nice) or if I’m doing something wrong.

Use-case: I have fairly small keys (~32 bytes) with fairly small values (<256 bytes) for a task queue system. My operations are limited to insert, move/rename (from “queued” to “active”), followed by delete (“completed”). The number of entries in this queue is all over the place, but generally it would be between 10k-100k. Apart from some tasks having higher priority than others, the queue is mostly FIFO.
I noticed that with the default parameters, seeking/iteration would start taking longer, so I unscientifically tweaked some of the parameters to what I thought would better fit my workload.

E.g.:

        opts.MaxTableSize = 256 << 15
        opts.LevelOneSize = 256 << 9
        opts.ValueLogMaxEntries = 32000
        opts.LevelSizeMultiplier = 2
        opts.NumMemtables = 2
        opts.NumLevelZeroTables = 2
        opts.NumLevelZeroTablesStall = 4
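For what it's worth, the bit-shifted sizes above work out as follows (a stdlib-only sketch, not Badger code; the variable names are mine):

```go
package main

import "fmt"

func main() {
	// These mirror the option values set above.
	maxTableSize := int64(256 << 15) // 8,388,608 bytes = 8 MiB
	levelOneSize := int64(256 << 9)  // 131,072 bytes = 128 KiB

	fmt.Printf("MaxTableSize = %d bytes (%d KiB)\n", maxTableSize, maxTableSize>>10)
	fmt.Printf("LevelOneSize = %d bytes (%d KiB)\n", levelOneSize, levelOneSize>>10)
}
```

Both are far below badger's stock defaults, which are in the tens and hundreds of megabytes.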

Which leads me to the following questions:

  • Does the MaxLevels assertion make sense on compaction? Do I have to pick a value for MaxLevels myself, and based on what metric should I do this?
  • Is tweaking these options on an existing database supported, or will this cause unforeseen issues?
  • Are there any other tips, e.g. for reducing memory usage and picking parameters that better discard deleted data for my use-case?

(Ashish) #2

@Bun The default value of MaxLevels is 7. Since LevelSizeMultiplier is 2 and LevelOneSize is 256<<9 (128 KB), the maximum size of level 7 works out to 8 MB. That limit is being crossed (because of the small sizes of the lower levels), so compaction runs on the last level. Since we cannot have more levels, the assertion fails.
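The arithmetic can be checked with a short stdlib-only sketch (this just mirrors the geometric growth of the per-level size limits; it is not Badger's internal code):

```go
package main

import "fmt"

func main() {
	const (
		levelOneSize = 256 << 9 // 128 KiB, as configured
		multiplier   = 2
		maxLevels    = 7
	)
	// Each level's size limit is the previous level's limit times the multiplier.
	size := int64(levelOneSize)
	for level := 1; level <= maxLevels; level++ {
		fmt.Printf("level %d: max %d bytes\n", level, size)
		size *= multiplier
	}
	// level 7: 131072 * 2^6 = 8388608 bytes = 8 MiB
}
```

Once level 7 hits that 8 MB cap, a compaction that would need to push data into level 8 trips the l+1 < MaxLevels assertion.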

Please find the answers below:

  • Yes, the assertion checks that we are not crossing MaxLevels. You can keep the default. Normally this value is chosen based on how much data you are going to store in Badger (the LSM tree).
  • This should not be a problem.
  • What is the system memory you are trying to run Badger on? Based on your requirements (32-byte keys, 256-byte values, max 100K entries), memory usage should be small (in the MBs).
    Compaction runs every second, which should be fast enough to delete discarded data. You have also set MaxTableSize to a small value, which helps compaction run faster at level zero.
    To avoid the assertion failure above, you can use the default LevelOneSize value (or at least increase it from its current setting).
    For faster seeks/iteration you can try setting ValueThreshold to the maximum value size you will have. This colocates keys and values in the LSM tree. The next thing you can try is keeping all the tables in RAM (set TableLoadingMode to LoadToRAM), but this can increase memory usage.
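Putting those suggestions together, a hedged sketch of what the options might look like (badger v1-era API; the paths are placeholders and the exact values are illustrative, not tuned):

```go
package main

import (
	"log"

	"github.com/dgraph-io/badger"
	"github.com/dgraph-io/badger/options"
)

func main() {
	opts := badger.DefaultOptions
	opts.Dir = "/path/to/db"      // placeholder path
	opts.ValueDir = "/path/to/db" // placeholder path
	// Keep the default LevelOneSize so compaction does not run out of levels.
	// Values here are < 256 bytes, so keep them in the LSM tree.
	opts.ValueThreshold = 256
	// Optional: hold all tables in RAM for faster seeks, at a memory cost.
	opts.TableLoadingMode = options.LoadToRAM

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```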