Counting predicate cardinality should be fast

Moved from GitHub dgraph/5813

Posted by EnricoMi:

Experience Report

What you wanted to do

I want to know how many uids have a specific predicate.

What you actually did

The query

  { result (func: has(pred)) { count(uid) } }

gives me the number of uids that have predicate pred.

Why that wasn’t great, with examples

This query is very slow for large predicates. There must be some index or cardinality information available for a predicate that could answer this query in constant time. In Ratel, the “Schema” tab offers a “Samples & Statistics” view for each predicate, which seems to fire the same query and takes forever. Improving this query would therefore also improve the UX of Ratel.

danielmai commented:

There’s an existing feature request in #3054 to add quick approximate counts. Would this work for you?

EnricoMi commented:

Approximation as in “Implement approximate counts using HyperLogLog++ algorithm” would work well for me. However, I would need this for multiple predicates, not just a single one:

  {
    pred1 as var(func: has(<dgraph.graphql.schema>))
    pred2 as var(func: has(<dgraph.graphql.xid>))
    pred3 as var(func: has(<dgraph.type>))

    result (func: uid(pred1,pred2,pred3)) {
      count(uid)
    }
  }

Would that work as well?
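As background on the union question: HyperLogLog-style sketches built with the same precision can be merged by taking the register-wise maximum, and the merged sketch estimates the size of the union of the underlying sets. Below is a minimal Python sketch of plain HyperLogLog (not the HLL++ refinements such as empirical bias correction and sparse encoding), assuming one hypothetical sketch per predicate fed with that predicate's uids. The class and method names are illustrative only, not a Dgraph API, and how #3054 would expose approximate counts is not stated in this thread.

```python
import hashlib
import math


class HyperLogLog:
    """Plain HyperLogLog; hypothetical illustration, not Dgraph code."""

    def __init__(self, p: int = 12):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers (4096 for p=12)
        self.registers = [0] * self.m

    def _hash(self, item) -> int:
        # 64-bit hash taken from SHA-1; any well-mixed 64-bit hash would do.
        digest = hashlib.sha1(str(item).encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item) -> None:
        x = self._hash(item)
        width = 64 - self.p
        idx = x >> width                    # top p bits select a register
        rest = x & ((1 << width) - 1)       # remaining 64-p bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Register-wise max yields the sketch of the union of both inputs.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)    # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


# Hypothetical usage: one sketch per predicate, fed with the uids that have it.
pred1, pred2 = HyperLogLog(), HyperLogLog()
for uid in range(0, 6000):
    pred1.add(uid)
for uid in range(4000, 10000):
    pred2.add(uid)
union = pred1.merge(pred2)
print(round(pred1.count()), round(union.count()))  # true sizes: 6000 and 10000
```

The merge step is what could make a multi-predicate union cheap in such a design: the union estimate is computed from per-predicate registers instead of rescanning every uid, at the cost of a small relative error (about 1.04/√m for plain HyperLogLog).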