Badger: Optimize bloomfilter parameters

neilchennn · December 10, 2021, 9:21am

According to the formula shown in the WIKI：https://en.wikipedia.org/wiki/Bloom_filter#Optimal_number_of_hash_functions， number of hash functions k is
, and m is bitsPerKey

so the calculation of bitsPerKey in badger is not right,

func BloomBitsPerKey(numEntries int, fp float64) int {
	size := -1 * float64(numEntries) * math.Log(fp) / math.Pow(float64(0.69314718056), 2)
	locs := math.Ceil(float64(0.69314718056) * size / float64(numEntries))
	return int(locs)
}

func appendFilter(buf []byte, keys []uint32, bitsPerKey int) []byte {
	if bitsPerKey < 0 {
		bitsPerKey = 0
	}
	// 0.69 is approximately ln(2).
	k := uint32(float64(bitsPerKey) * 0.69)
	if k < 1 {
		k = 1
	}
	if k > 30 {
		k = 30
	}

	nBits := len(keys) * int(bitsPerKey)
	// For small len(keys), we can see a very high false positive rate. Fix it
	// by enforcing a minimum bloom filter length.
	if nBits < 64 {
		nBits = 64
	}
	......

In BloomBitsPerKey(numEntries int, fp float64),

func BloomBitsPerKey(numEntries int, fp float64) int {
	size := -1 * float64(numEntries) * math.Log(fp) / math.Pow(float64(0.69314718056), 2)
	locs := math.Ceil(float64(0.69314718056) * size / float64(numEntries))
	return int(locs)
}

The locs is ln2 * m / n， but in func appendFilter(), calculate the number of hash functions，ln2 was calculated again

It should be

func BloomBitsPerKey(numEntries int, fp float64) int {
	size := -1 * float64(numEntries) * math.Log(fp) / math.Pow(float64(0.69314718056), 2)
	locs := math.Ceil(size / float64(numEntries))
	return int(locs)
}

Here is the pr: https://github.com/dgraph-io/badger/pull/1763

Topic		Replies	Views
Performance issue in prefix iteration Badger kind:question	0	860	March 17, 2022
Iterators with bloom filter Badger	5	1014	May 9, 2019
Recommended config for fixed key/value lengths Badger kind:question	2	898	November 18, 2020
Bloomfilters are enabled in dgraph Dgraph	2	409	June 25, 2020
Figuring optimal options for usage pattern App Development badger	2	625	January 12, 2021

Badger: Optimize bloomfilter parameters

Related topics