Implement approximate counts using HyperLogLog++ algorithm

Moved from GitHub dgraph/3054

Posted by makitka2007:

is it possible to implement approximate counts? in our case we don’t need to know exact number of linked resources, if they are 324905 or 319043. in ElasticSearch we can specify precision for counts, like, give me exact count if results are less than 100, and approximate count if results are bigger, so such count performs really faster and consume less memory than exact count which we actually don’t need for big values

as i understand, HyperLogLog++ is not super hard to implement but it can significantly improve performance for counts when we don’t need exact value when it’s big.

having something like count_approximate(relation, 100) @filter(...) where 100 is a precision (value under which exact value is guaranteed) would be very useful

1 Like