Filtering before aggregation not working

myo · December 8, 2020, 8:01am

What version of Dgraph are you using?

master, as of December 7, 2020

Have you tried reproducing the issue with the latest release?

yes

Steps to reproduce the issue (command/config used to run Dgraph).

When using aggregation functions, pre-filtering a variable has no effect.
I tried both @filter and first, without success.

Steps:

Collect uids and values in a variable
Create a block that filters out some of those values
Apply an aggregation function to the filtered variable

Example on https://play.dgraph.io/ :

{   
  var1(func:allofterms(name@en, "Peter Jackson")) {
    ct as count(director.film)
  }
  var2(func: uid(ct)) @filter(gt(val(ct), 0)) {
    ct_pos as math(ct)
  }
  me() {
    avg(val(ct_pos))
  }
}

Expected behaviour and actual result.

I expect an average of 19, since only one of the matched nodes has a positive film count.

Instead I get 3.8:

{
  "var1": [
    {
      "count(director.film)": 19
    },
    {
      "count(director.film)": 0
    },
    {
      "count(director.film)": 0
    },
    {
      "count(director.film)": 0
    },
    {
      "count(director.film)": 0
    }
  ],
  "var2": [
    {
      "val(ct_pos)": 19
    }
  ],
  "me": [
    {
      "avg(val(ct_pos))": 3.8
    }
  ]
}
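The reported 3.8 is exactly 19 / 5, the mean over all five counts in var1, while 19 is the mean over only the one positive count. That is consistent with avg() dividing by every uid in ct rather than only the uids that passed the filter in var2. A quick arithmetic check in plain Python (lists standing in for the variable values; this is not the Dgraph API):

```python
# Counts returned by var1 for the five "Peter Jackson" matches (from the response above).
counts = [19, 0, 0, 0, 0]

# Mean over every uid in ct: matches what avg() actually returned.
unfiltered_avg = sum(counts) / len(counts)

# Mean over only the uids that pass gt(val(ct), 0): what was expected.
positive = [c for c in counts if c > 0]
filtered_avg = sum(positive) / len(positive)

print(unfiltered_avg)  # 3.8
print(filtered_avg)    # 19.0
```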

BlankRain · December 8, 2020, 8:28am

Try this?

{
  var1(func: allofterms(name@en, "Peter Jackson")) {
    ct as count(director.film)
  }
  var2(func: uid(ct)) @filter(gt(val(ct), 0)) {
    uid
    ct2 as count(director.film)
  }
  me() {
    avg(val(ct2))
  }
}

It looks like we need a new variable.

Or just move the filter into the first block, and you get it:

{
  var1(func: allofterms(name@en, "Peter Jackson")) @filter(gt(count(director.film), 0)) {
    ct as count(director.film)
  }

  me() {
    avg(val(ct))
  }
}

myo · December 8, 2020, 9:33am

Thanks @BlankRain, it does work that way.
However, we have to count the films twice.

In my case, I do this on two large counts with complex filters,
and the performance is not acceptable.

BlankRain · December 9, 2020, 3:02am

Yes, large counts are slow in Dgraph; sometimes they even throw errors.

You can pre-compute the count you want and store it in Dgraph.
Use an upsert block whenever you change the data, so that the data and the count are updated together.

Count all:

upsert {
  query {
    q(func: has(student)) {
      c as count(student)
    }
  }
  mutation {
    set {
      <0x01> <studentCount> val(c) .
    }
  }
}

Update it when inserting a new student:

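upsert {
  query {
    q(func: uid(0x01)) {
      c as studentCount
      # N-Quads cannot do arithmetic, so compute the incremented value here
      newCount as math(c + 1)
    }
  }
  mutation {
    set {
      _:a <student> "somevalue" .
      <0x01> <studentCount> val(newCount) .
    }
  }
}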

See the documentation on conditional upserts for more details; they support filters and @if conditions.

When you need the count, just query it:

{
  q(func: uid(0x01)) {
    studentCount
  }
}
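The trade-off behind this suggestion (pay a little on every write so that reads are cheap) can be sketched in plain Python. This is a hypothetical in-memory model, not Dgraph or any client library; StudentStore, add_student, read_count and recount are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class StudentStore:
    """Hypothetical in-memory stand-in for the graph; not a Dgraph API."""

    students: list = field(default_factory=list)
    # Plays the role of the stored <0x01> <studentCount> value.
    student_count: int = 0

    def add_student(self, value: str) -> None:
        # Both writes happen in the same call, mirroring the two N-Quads
        # (new student + updated count) sent in one upsert block.
        self.students.append(value)
        self.student_count += 1

    def read_count(self) -> int:
        # Cheap read: return the stored value, like querying studentCount.
        return self.student_count

    def recount(self) -> int:
        # Expensive read: touch every item, like count(...) at query time.
        return len(self.students)


store = StudentStore()
for name in ["alice", "bob", "carol"]:
    store.add_student(name)

print(store.read_count(), store.recount())  # 3 3
```

The stored counter stays correct only if every write goes through the path that also updates it, which is why this approach suits a fixed count like studentCount better than a count whose filters change at query time.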

myo · December 9, 2020, 5:07pm

Thanks, but it cannot be pre-computed in my use case: it has to be a dynamic count that can change at any time, and it would be far too slow to keep such an extra indexed value up to date.
