Custom comparison function

diggy · April 22, 2019, 6:00am

Moved from GitHub badger/776

Hello,

Today the key comparison function used by the database is bytes.Compare. When keys are fixed-size integers or ASCII strings this makes a lot of sense, but I have a use case for variable-size integer keys (similar to big.Int), and the order that bytes.Compare defines for such keys is not the natural order of the integer values.
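To illustrate the mismatch, here is a minimal sketch. It assumes the keys are the minimal big-endian bytes returned by big.Int.Bytes, an encoding picked only for this illustration; the helper name sortedByBytes is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"math/big"
	"sort"
)

// sortedByBytes encodes each value with big.Int.Bytes (minimal big-endian,
// an encoding assumed only for this illustration), sorts the encodings with
// bytes.Compare, and decodes them back into integers.
func sortedByBytes(values []int64) []int64 {
	keys := make([][]byte, len(values))
	for i, v := range values {
		keys[i] = big.NewInt(v).Bytes()
	}
	sort.Slice(keys, func(i, j int) bool {
		return bytes.Compare(keys[i], keys[j]) < 0
	})
	out := make([]int64, len(keys))
	for i, k := range keys {
		out[i] = new(big.Int).SetBytes(k).Int64()
	}
	return out
}

func main() {
	// Lexicographic byte order puts 256 (0x01 0x00) before 2 (0x02).
	fmt.Println(sortedByBytes([]int64{256, 2, 65536, 255})) // [256 65536 2 255]
}
```

Shorter encodings are not guaranteed to sort before longer ones, which is why the numeric order is lost as soon as key lengths differ.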

I’d like to explore modifying the package to make the comparison function configurable at the database level, but I wanted to get feedback from someone more familiar with the implementation in case there were any major concerns.

diggy · May 10, 2019, 4:04pm

achille-roussel commented :

@jarifibrahim thanks for labelling the issue.

Do you have any pointers regarding what I should be watching out for to make this change?

diggy · May 11, 2019, 8:53am

jarifibrahim commented :

@achille-roussel I’m not entirely sure, but I see that we’re using the CompareKeys function everywhere:

https://github.com/dgraph-io/badger/blob/82e4521c6cca4f7094446f22f94b8d2bb5cee7ac/y/y.go#L124-L135

// CompareKeys checks the key without timestamp and checks the timestamp if keyNoTs
// is same.
// a<timestamp> would be sorted higher than aa<timestamp> if we use bytes.compare
// All keys should have timestamp.
func CompareKeys(key1, key2 []byte) int {
	AssertTrue(len(key1) > 8 && len(key2) > 8)
	if cmp := bytes.Compare(key1[:len(key1)-8], key2[:len(key2)-8]); cmp != 0 {
		return cmp
	}
	return bytes.Compare(key1[len(key1)-8:], key2[len(key2)-8:])
}

Try replacing this with your custom comparator and see if it works. I think it would work, since the underlying representation of the keys remains the same.
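As a rough sketch of that suggestion, the function below keeps the structure of CompareKeys but delegates the user-key comparison to a pluggable function. The Comparator type, compareKeysWith, byLength, and withTs are hypothetical names introduced here for illustration; none of them are part of Badger's API, and withTs simply appends 8 big-endian bytes rather than reproducing Badger's real timestamp encoding.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Comparator is a hypothetical user-supplied ordering over keys without
// their timestamp suffix. It is not a type that Badger defines.
type Comparator func(a, b []byte) int

// compareKeysWith mirrors CompareKeys: the user key is ordered by cmp and
// ties are broken by comparing the trailing 8 bytes with bytes.Compare.
func compareKeysWith(cmp Comparator, key1, key2 []byte) int {
	if len(key1) <= 8 || len(key2) <= 8 {
		panic("keys must carry an 8-byte suffix")
	}
	if c := cmp(key1[:len(key1)-8], key2[:len(key2)-8]); c != 0 {
		return c
	}
	return bytes.Compare(key1[len(key1)-8:], key2[len(key2)-8:])
}

// byLength is an example comparator: shorter keys sort first, and keys of
// equal length fall back to lexicographic order.
func byLength(a, b []byte) int {
	if len(a) != len(b) {
		if len(a) < len(b) {
			return -1
		}
		return 1
	}
	return bytes.Compare(a, b)
}

// withTs appends an 8-byte big-endian suffix to a copy of key. This is an
// illustrative stand-in for the timestamp, not Badger's actual encoding.
func withTs(key []byte, ts uint64) []byte {
	out := make([]byte, len(key)+8)
	copy(out, key)
	binary.BigEndian.PutUint64(out[len(key):], ts)
	return out
}

func main() {
	b, aa := withTs([]byte("b"), 1), withTs([]byte("aa"), 1)
	fmt.Println(compareKeysWith(bytes.Compare, b, aa)) // 1
	fmt.Println(compareKeysWith(byLength, b, aa))      // -1
}
```

Only the comparison of the user-key portion changes; the suffix comparison is left alone, so the stored key layout stays the same.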

diggy · June 28, 2019, 6:24pm

campoy commented :

Hey @achille-roussel,

Since this would indeed make the database slower for all cases, I want to better understand the reasoning behind the feature itself.

Do you think it’d be possible to somehow convert your keys instead of modifying the way we compare them? What exactly is the use case you’re implementing?

Thanks!

diggy · June 29, 2019, 7:50pm

achille-roussel commented :

The use case I have is using variable length integers as keys (big.Int encoded as a little-endian byte sequence). To have the sequence of keys be ordered by the value of the big.Int I need to provide a different comparison function.
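A comparator for such keys might look like the following minimal sketch. It assumes the integers are non-negative and that trailing zero bytes are insignificant padding; the names trimmedLen and compareLittleEndian are hypothetical.

```go
package main

import "fmt"

// trimmedLen returns the length of b without trailing zero bytes, which are
// the insignificant high-order bytes of a little-endian unsigned integer.
func trimmedLen(b []byte) int {
	n := len(b)
	for n > 0 && b[n-1] == 0 {
		n--
	}
	return n
}

// compareLittleEndian orders two non-negative little-endian integers by
// value: more significant bytes means a larger number, and equal lengths
// are compared starting from the most significant (last) byte.
func compareLittleEndian(a, b []byte) int {
	na, nb := trimmedLen(a), trimmedLen(b)
	if na != nb {
		if na < nb {
			return -1
		}
		return 1
	}
	for i := na - 1; i >= 0; i-- {
		if a[i] != b[i] {
			if a[i] < b[i] {
				return -1
			}
			return 1
		}
	}
	return 0
}

func main() {
	two := []byte{0x02}          // 2
	val256 := []byte{0x00, 0x01} // 256
	fmt.Println(compareLittleEndian(two, val256)) // -1
}
```

With bytes.Compare the same pair would be ordered the other way round, because 0x02 is greater than the leading 0x00 of the little-endian encoding of 256.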

Currently the program uses RocksDB, which supports configuring a custom comparison function, but it comes with its own set of issues, which is why I’m exploring Badger as an option to replace it.

diggy · July 1, 2019, 6:14pm

campoy commented :

I understand. Would encoding your keys in a different format penalize your performance too much, or are there other issues you’re concerned about?
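One possible shape for such a re-encoding, sketched under assumptions not stated in the thread: the integers are non-negative, their magnitude fits in 65535 bytes, and a 2-byte big-endian length prefix followed by the big-endian magnitude is acceptable as a key format. The names encodeOrdered and encodedCompare are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"math/big"
)

// encodeOrdered writes a 2-byte big-endian length prefix followed by the
// minimal big-endian magnitude of n. For non-negative values, comparing the
// results with bytes.Compare then matches numeric order: a longer magnitude
// means a larger number, and equal lengths compare most significant byte first.
func encodeOrdered(n *big.Int) ([]byte, error) {
	if n.Sign() < 0 {
		return nil, errors.New("negative values are not supported in this sketch")
	}
	mag := n.Bytes()
	if len(mag) > 0xffff {
		return nil, errors.New("value too large for a 2-byte length prefix")
	}
	key := make([]byte, 2+len(mag))
	binary.BigEndian.PutUint16(key, uint16(len(mag)))
	copy(key[2:], mag)
	return key, nil
}

// encodedCompare is a small demo helper that encodes two non-negative
// values and compares the encodings with plain bytes.Compare.
func encodedCompare(a, b int64) int {
	ka, err := encodeOrdered(big.NewInt(a))
	if err != nil {
		panic(err)
	}
	kb, err := encodeOrdered(big.NewInt(b))
	if err != nil {
		panic(err)
	}
	return bytes.Compare(ka, kb)
}

func main() {
	fmt.Println(encodedCompare(2, 256))     // -1
	fmt.Println(encodedCompare(65536, 255)) // 1
}
```

The trade-off is two extra bytes per key plus a conversion on every read and write, in exchange for leaving the default comparison untouched.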

Let us get v2.0 released and then we can reconsider this PR - I think it does have merit but I want to make sure we don’t introduce any performance hit for our existing users.

Do you have a deadline in mind for this project?

diggy · July 1, 2019, 7:45pm

achille-roussel commented :

For now we can continue with RocksDB. The switch to Badger would be part of research we’re doing for optimization purposes, so we don’t have a strict deadline for this part of the work; if it brings improvements, we can ship it later.

In the meantime, I’ll dig into the benchmarks that show performance regressions to see if there are ways to optimize the change and make it more acceptable.

diggy · July 2, 2019, 6:39pm

campoy commented :

Thanks for understanding, @achille-roussel
