Filtering before aggregation not working

What version of Dgraph are you using?

master, December the 7th 2020

Have you tried reproducing the issue with the latest release?

yes

Steps to reproduce the issue (command/config used to run Dgraph).

When using aggregation functions, pre-filtering a variable has no effect.
I tried both @filter and first, without success.

Steps:

  • Gather uid/values in a variable
  • Create a block that filters out some values
  • Use an aggregation function

Example on https://play.dgraph.io/ :

{   
  var1(func:allofterms(name@en, "Peter Jackson")) {
    ct as count(director.film)
  }
  var2(func: uid(ct)) @filter(gt(val(ct), 0)) {
    ct_pos as math(ct)
  }
  me() {
    avg(val(ct_pos))
  }
}

Expected behaviour and actual result.

I expect an average of 19.

I get 3.8:

{
    "var1": [
      {
        "count(director.film)": 19
      },
      {
        "count(director.film)": 0
      },
      {
        "count(director.film)": 0
      },
      {
        "count(director.film)": 0
      },
      {
        "count(director.film)": 0
      }
    ],
    "var2": [
      {
        "val(ct_pos)": 19
      }
    ],
    "me": [
      {
        "avg(val(ct_pos))": 3.8
      }
    ]
  }
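
For reference, 3.8 is exactly what you get if the average is taken over all five counts from var1 instead of the single value that passes the filter. A quick check in Python (plain arithmetic on the numbers above, not a claim about how Dgraph computes it internally):

```python
# Counts returned by var1 in the result above.
all_counts = [19, 0, 0, 0, 0]

# Values that pass gt(val(ct), 0), matching what var2 returns.
positive_counts = [c for c in all_counts if c > 0]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


avg_all = mean(all_counts)            # average over every node from var1
avg_positive = mean(positive_counts)  # average over the filtered nodes only

print(avg_all)       # the value actually returned by me()
print(avg_positive)  # the value I expected
```

So the zeros filtered out in var2 still seem to be counted in the aggregation.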

Try this?

{   
  var1(func:allofterms(name@en, "Peter Jackson")) {
    ct as count(director.film)
  }
  var2(func: uid(ct)) @filter(gt(val(ct), 0)) {
    uid
    ct2 as count(director.film)
  }
  me() {
    avg(val(ct2))
  }
}

It looks like we need a new variable.

Just move the filter to the first var block and you get the expected result.

{   
  var1(func:allofterms(name@en, "Peter Jackson")) @filter(gt(count(director.film), 0)){
    ct as count(director.film)
  }

  me() {
    avg(val(ct))
  }
}


Thanks @BlankRain, indeed it works that way.
However, we have to count the films twice.

In my case, I do this on two big counts with complex filters,
and the performance is not acceptable.

Yes, big counts are slow in Dgraph.
Sometimes they even throw errors.

You can pre-compute the count you want and store it in Dgraph.
Use an upsert block whenever you change the data, so that the data and the count are updated together.

  1. Count all
upsert {
  query {
    q(func: has(student)) {
      c as count(student)
    }
  }
  mutation {
    set {
      <0x01> <studentCount> val(c) .
    }
  }
}

  2. Update the count when inserting a new student
upsert {
  query {
    q(func: uid(0x01)) {
      c as studentCount
      # val(c)+1 is not valid in a mutation, so compute the new value here
      n as math(c + 1)
    }
  }
  mutation {
    set {
      _:a <student> "somevalue" .
      <0x01> <studentCount> val(n) .
    }
  }
}

See the upsert block documentation for more details; it supports filters and @if conditions.

  3. When you need the count, just query it
{
  q(func: uid(0x01)) {
    studentCount
  }
}

Thanks, but the count cannot be pre-computed for my use case: it has to be dynamic and can change at any time. Updating such an indexed value on every change would be far too slow.