Groupby operator

Groupby operation

Genreally, groupby is used to group a set of entities based on one or more of their properties and aggregate some other (one) property within this group.

A simple example is count of people from different countries.

{
me(func: gt(count(country), 1)) @groupby(country) {
 count( _uid_ )
}
}

Whatever is inside the @groupby level can only be aggregates.

Note: The argument to groupby can be values or ids

If we want to group by last name and country

{
me(func: gt(count(country), 1)) @groupby(country, last.name) {
 count( _uid_ )
}
}

The output of this groupby would contain the fields supplied to groupby and the aggregate value. (so we find the unique countries and lastnames and for each combination of them, count the number of people)

{
 me: [{"lastname": "alice", "country": "XYZ", "count(_uid_)": 5}]
}
# to select the max dob among my friends with same name
{
me(id: a) {
 friends @groupby(name) {
  max(dob)
 }
}
}
# Find the number of films of a director each actor has acted in.
# (Useful to find the most frequent actor.
{
 director(id: a) {
  films @groupby(actorid) {
    count(_uid_)
   }
  }
 }
}
# Find the latest film of all the actors who have acted with a director.
{
 director(id: a) {
  films @groupby(actorid) {
    max(initial_release_date)
   }
  }
 }
}

Please do leave your thoughts on this design, use-cases you might have for groupby and any modifications you’d like to see.

Ref:
https://www.techonthenet.com/sql/group_by.php

I would love to have this functionality. I have many use cases for it, but primarily, it would allow me to do aggregations of my data and then group those aggregates on the type of “connection” I am comparing. What I am aiming to do is to give a high level insight to the user about specific aspects of their company and/or life. This is all driven by a tagging system that I have built with dgraph. Being able to do aggregates based on these “tags” would be great. Compound groupings would be preferable as well: @groupby(actorlocation, actorspecialty, actorage) --> group by location, then subgroup buy movie specialty, then subgroup that by age, just as an example.

@ashwin95r: Your proposal only allows groupby to do aggregation. It won’t allow subgrouping. Is that right?
@willcj33: Can you give some example data and example result of what you’d expect from the groupby with sub-groupbys ?

Yes only aggregation on the grouped entities. If you mean grouping by multiple fields, yes that’d be allowed as specified in this example.[quote=“ashwin95r, post:1, topic:1435”]
{
me(func: gt(count(country), 1))
@groupby(country, last.name) {
count( uid )
}
}
[/quote]

So just to be sure @ashwin95r, it would group by country, then make “sub” groups in that group correct? Now with that, is there sorting per group? The simple group by would be enough for me now, but sub groups sound useful if I have (let’s say) ‘Departments’ in a company, and the ‘Teams’ within departments and I want to do some analytics around work output and efficiency.

If you have name and age as the groupby fields.

{ 
me(id: a) {
friend @groupby(name, age) {
count(_uid_)
}
}

The result would have units like

{
 "name": "a", 
 "age": "10",
 "count": 5,
},
{
 "name": "a", 
 "age": "20",
 "count": 10,
},

and so on

Thats the plan. I’ll do what makes most sense while implemeting and get back here.

1 Like

@willcj33 do I understand you correctly, that you would like to have results like:

 {
 "name": "a", 
 "age": "10",
 "count": 5,
},
{
 "name": "a", 
 "age": "20",
 "count": 10,
},
{
 "name": "a",
 "count": 15,
}
 {
 "name": "b", 
 "age": "20",
 "count": 7,
},
{
 "name": "c", 
 "age": "25",
 "count": 3,
},
{
 "name": "b",
 "count": 10,
}
...

@tzdybal @ashwin95r ^^^ those two examples are correct in respect to what I am looking for.

This has been merged to master. Can be tried out form the dev-release https://github.com/dgraph-io/dgraph/releases/tag/nightly

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.