Filtering before aggregation not working

myo · December 8, 2020, 8:01am

What version of Dgraph are you using?

master, December the 7th 2020

Have you tried reproducing the issue with the latest release?

yes

Steps to reproduce the issue (command/config used to run Dgraph).

When using aggregation functions, pre-filtering the variables has no effect.
I tried both @filter and first, without success.

Steps:

1. Gather uid/values in a variable.
2. Create a block that filters out some values.
3. Use an aggregation function.

Example on https://play.dgraph.io/ :

{   
  var1(func:allofterms(name@en, "Peter Jackson")) {
    ct as count(director.film)
  }
  var2(func: uid(ct)) @filter(gt(val(ct), 0)) {
    ct_pos as math(ct)
  }
  me() {
    avg(val(ct_pos))
  }
}

Expected behaviour and actual result.

I expect an average of 19.

I get 3.8:

{
    "var1": [
      {
        "count(director.film)": 19
      },
      {
        "count(director.film)": 0
      },
      {
        "count(director.film)": 0
      },
      {
        "count(director.film)": 0
      },
      {
        "count(director.film)": 0
      }
    ],
    "var2": [
      {
        "val(ct_pos)": 19
      }
    ],
    "me": [
      {
        "avg(val(ct_pos))": 3.8
      }
    ]
  }
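
The 3.8 is consistent with the average being taken over all five nodes matched by var1 (19 / 5) rather than over the single node that passes the filter. The sketch below only redoes that arithmetic on the numbers in the response above; it is not a statement about how Dgraph evaluates the query internally.

```python
# Per-node counts copied from the "var1" block of the response above.
counts = [19, 0, 0, 0, 0]

# Mean over every node matched by var1, zeros included.
mean_all = sum(counts) / len(counts)

# Mean over only the nodes that would pass gt(val(ct), 0).
positive = [c for c in counts if c > 0]
mean_positive = sum(positive) / len(positive)

print(mean_all)       # matches the value returned by the query
print(mean_positive)  # matches the value I expected
```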

BlankRain · December 8, 2020, 8:28am

Try this?

{   
  var1(func:allofterms(name@en, "Peter Jackson")) {
    ct as count(director.film)
  }
  var2(func: uid(ct)) @filter(gt(val(ct), 0)) {
    uid
    ct2 as count(director.film)
  }
  me() {
    avg(val(ct2))
  }
}

It looks like we need a new variable.

Alternatively, move the filter into the first var block, and you get the expected result.

{   
  var1(func:allofterms(name@en, "Peter Jackson")) @filter(gt(count(director.film), 0)){
    ct as count(director.film)
  }

  me() {
    avg(val(ct))
  }
}

myo · December 8, 2020, 9:33am

Thanks @BlankRain, indeed it works that way.
However, we have to count the films twice.

In my case, I do it on two big counts with complex filters.
The performance is not acceptable.
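
One possible way to avoid the second count, sketched here under assumptions, is to run only the var1 block and average the positive counts on the client. This assumes the per-node counts are few enough to return to the client and that the response has the same shape as the JSON earlier in this thread. The function name `average_positive_counts` and its parameters are hypothetical, not part of any Dgraph client.

```python
import json


def average_positive_counts(response_text, block="var1", key="count(director.film)"):
    """Average the positive per-node counts found in one block of a response.

    Assumes the block is a list of objects that each carry the count under `key`.
    Returns None when no node has a positive count.
    """
    data = json.loads(response_text)
    positive = [row[key] for row in data.get(block, []) if row.get(key, 0) > 0]
    return sum(positive) / len(positive) if positive else None


# Sample shaped like the "var1" block of the response earlier in this thread.
sample = json.dumps(
    {"var1": [{"count(director.film)": n} for n in (19, 0, 0, 0, 0)]}
)
print(average_positive_counts(sample))
```

The count is then computed once on the server, at the cost of sending every per-node count over the wire, so this is only a fit when the matched set is reasonably small.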

BlankRain · December 9, 2020, 3:02am

Yes, big counts are slow in Dgraph.
Sometimes they even throw errors.

You can pre-compute the count you want and store it in Dgraph.
Use an upsert block when you change the data, so that the data and the count are updated together.

Count all:

upsert {
  query {
    q(func: has(student)) {
      c as count(student)
    }
  }
  mutation {
    set {
      <0x01> <studentCount> val(c) .
    }
  }
}

Update when inserting a new student:

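upsert {
  query {
    q(func: uid(0x01)) {
      c as studentCount
      # arithmetic is not allowed in the RDF value, so compute the new count here
      c1 as math(c + 1)
    }
  }
  mutation {
    set {
      _:a <student> "somevalue" .
      <0x01> <studentCount> val(c1) .
    }
  }
}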

See the upsert block documentation for more details; it supports filters and @if conditions.

When you need the count, just query it:

{
  q(func: uid(0x01)) {
    studentCount
  }
}

myo · December 9, 2020, 5:07pm

Thanks, but it cannot be pre-computed for my use case, as it has to be a dynamic count that can change at any time. It would be far too slow to keep updating such an indexed value.
