Groupby operator

ashwin95r · April 21, 2017, 8:26am

Groupby operation

Genreally, groupby is used to group a set of entities based on one or more of their properties and aggregate some other (one) property within this group.

A simple example is count of people from different countries.

{
me(func: gt(count(country), 1)) @groupby(country) {
 count( _uid_ )
}
}

Whatever is inside the @groupby level can only be aggregates.

Note: The argument to groupby can be values or ids

If we want to group by last name and country

{
me(func: gt(count(country), 1)) @groupby(country, last.name) {
 count( _uid_ )
}
}

The output of this groupby would contain the fields supplied to groupby and the aggregate value. (so we find the unique countries and lastnames and for each combination of them, count the number of people)

{
 me: [{"lastname": "alice", "country": "XYZ", "count(_uid_)": 5}]
}

# to select the max dob among my friends with same name
{
me(id: a) {
 friends @groupby(name) {
  max(dob)
 }
}
}

# Find the number of films of a director each actor has acted in.
# (Useful to find the most frequent actor.
{
 director(id: a) {
  films @groupby(actorid) {
    count(_uid_)
   }
  }
 }
}

# Find the latest film of all the actors who have acted with a director.
{
 director(id: a) {
  films @groupby(actorid) {
    max(initial_release_date)
   }
  }
 }
}

Please do leave your thoughts on this design, use-cases you might have for groupby and any modifications you’d like to see.

Ref:
https://www.techonthenet.com/sql/group_by.php

willcj33 · April 21, 2017, 2:26pm

I would love to have this functionality. I have many use cases for it, but primarily, it would allow me to do aggregations of my data and then group those aggregates on the type of “connection” I am comparing. What I am aiming to do is to give a high level insight to the user about specific aspects of their company and/or life. This is all driven by a tagging system that I have built with dgraph. Being able to do aggregates based on these “tags” would be great. Compound groupings would be preferable as well: @groupby(actorlocation, actorspecialty, actorage) → group by location, then subgroup buy movie specialty, then subgroup that by age, just as an example.

mrjn · April 22, 2017, 12:36am

@ashwin95r: Your proposal only allows groupby to do aggregation. It won’t allow subgrouping. Is that right?
@willcj33: Can you give some example data and example result of what you’d expect from the groupby with sub-groupbys ?

ashwin95r · April 23, 2017, 11:32pm

Yes only aggregation on the grouped entities. If you mean grouping by multiple fields, yes that’d be allowed as specified in this example.[quote=“ashwin95r, post:1, topic:1435”]
{
me(func: gt(count(country), 1))
@groupby(country, last.name) {
count( uid )
}
}
[/quote]

willcj33 · April 25, 2017, 2:46pm

So just to be sure @ashwin95r, it would group by country, then make “sub” groups in that group correct? Now with that, is there sorting per group? The simple group by would be enough for me now, but sub groups sound useful if I have (let’s say) ‘Departments’ in a company, and the ‘Teams’ within departments and I want to do some analytics around work output and efficiency.

ashwin95r · April 26, 2017, 5:13am

If you have name and age as the groupby fields.

{ 
me(id: a) {
friend @groupby(name, age) {
count(_uid_)
}
}

The result would have units like

{
 "name": "a", 
 "age": "10",
 "count": 5,
},
{
 "name": "a", 
 "age": "20",
 "count": 10,
},

and so on

Thats the plan. I’ll do what makes most sense while implemeting and get back here.

tzdybal · April 26, 2017, 5:17pm

@willcj33 do I understand you correctly, that you would like to have results like:

 {
 "name": "a", 
 "age": "10",
 "count": 5,
},
{
 "name": "a", 
 "age": "20",
 "count": 10,
},
{
 "name": "a",
 "count": 15,
}
 {
 "name": "b", 
 "age": "20",
 "count": 7,
},
{
 "name": "c", 
 "age": "25",
 "count": 3,
},
{
 "name": "b",
 "count": 10,
}
...

willcj33 · April 27, 2017, 1:05pm

@tzdybal @ashwin95r ^^^ those two examples are correct in respect to what I am looking for.

ashwin95r · May 4, 2017, 11:58pm

This has been merged to master. Can be tried out form the dev-release Release nightly [deprecated] · dgraph-io/dgraph · GitHub

ashwin95r · November 28, 2017, 1:00am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
DQL: How to group by inside group by? GraphQL kind:question , dgraph	1	720	March 29, 2022
Groupby and sum predicate Dgraph	1	414	June 8, 2022
Groupby inside block without displaying the result Users	2	335	August 17, 2019
GroupBy - Query language Documentation	0	410	August 28, 2020
What is the technical reason we can't store variables of @groupby unless it uses "uid"? Dgraph	3	844	October 5, 2018

Groupby operator

Related Topics