Count Queries in GraphQL

Motivation

Although, there exists a count functionality to count number of matched nodes or predicates in DQL, Dgraph GraphQL currently does not support count.
This feature of supporting count in Dgraph GraphQL has been requested by Users many times. Here are some of the related discuss posts:

  1. Add @count to GraphQL schemas and queries
  2. Fetch all counters with GraphQL API
  3. Count lists
  4. Will Dgraph GraphQL be supporting GraphQL+- capability such as count() and sum()?
  5. Group nodes by a literal and count the number of categories

This RFC is about supporting count natively with Dgraph GraphQL .

User Impact

Users will now be able to use count functionality of DQL in Dgraph GraphQL.

Implementation

There are two ways in which count is used in DQL.

  1. count(uid) counts the number of UIDs matched in the enclosing block.
  2. count(predicate) counts how many predicate edges lead out of a node.

Count in GraphQL will also support this two ways.

Count matched UIDs

Example GraphQL Schema:

type Data {
	id: ID!
    name: String!
	intList: [Int]
	stringList: [String]
 	metaData: [Data]
}

The above input GraphQL schema will generate an additional countData query in the output schema as follows:

query {
    countData(filter: DataFilter): Int
}

The input type to the countFilter query, DataFilter will be the same as input type of queryFilter. The countFilter query will return the number of matched UIDs satisfying the filter condtition.

The above count GraphQL query would be rewritten to the following DQL query.

query {
    queryData(func: type(Data)) @filter(/* rewritten filter condtition */)) {
        count: count(uid)
    }
}

Count predicate edges

For every field in input schema which is of type list, an extra field will be generated in output schema to store its count. These extra fields will be utilized to return the count of predicate edges.

The example input schema:

type Data {
	id: ID!
	intList: [Int]
	stringList: [String]
 	metaData: [Data]
}

will generated the following extra fields in output schema.

countintList: Int
countstringList: Int
countmetaData: Int

These fields could be used with queryData and getData queries to return count of predicate edges as follows.

query {
    queryData {
        name
        countintList
        countmetaData
    }
}

The above query would be rewritten to DQL as follows.

query {
    queryData(func: type(Data))  {
        Data.name
        count(Data.intList)    /* returns number of items in intList*/
        count(Data.metaData)   /* returns number of items in metaData list*/
    }
}

Open Questions

  1. Should count be enabled by default for all Graphql Schemas or should there be an option to enable count queries and count fields for the types and predicates which are specified by the User.
  2. Should we be supporting filtering on count predicate edges. For example, should there be an option to filter out only some of the Data from metaData and return a count of it.

Count Queries on other GraphQL platforms

  1. Hasura has support for counting predicate edges using their aggregation queries on aggregate fields like sum, count, avg.
  2. GraphQL does not natively support count. There are some other platforms which recommend storing of a count attribute manually and updating it every time data is updated to support count functionality.

References

  1. https://dgraph.io/docs/query-language/count
  2. https://hasura.io/docs/1.0/graphql/core/queries/aggregation-queries.html
2 Likes

I believe the number one use case will be counting for paginate data. What would be nice if possible is a way to use the same filter and results with cascade applied.

Two separate suggestions:

  1. Use the same filter:

If I pass in values and build a pretty complex filter and do some pagination I would want to get a total count of the filtered results that way I can quickly do a complete pagination UI display with a first, ...current..., last option. Right now there is no good way to do this without getting all of the ids and counting them in a response which is making the client do extra work. But bottom line, I would want a way to signify to use the same filter|results from a block above. I know graphql really makes this complicated, but I think it would be possible with the graphql parser and DQL generator script that is working behind the scenes. Maybe something like this:

# generated queries
type Query {
  "Placeholder for Counted Nodes, used by @count directive"
  Count: Int
}
# example Query
{
  getData(
    filter: {
      # really complex filter
    }
    first: 10
  ) @count(query: Count) {
    id
  }
  Count # should ignore the first and offset filter properties
  allData:getData @count(query: allDataCount) {
    id
  }
  allDataCount:Count
}

the Count query could then just be a placeholder type that returns 0 unless a count directive utilizes it with or without an alias. And maybe even some smart logic in here that if only a single @count directive is given and only a single Count query is given, then place the Count in that return. I think this is making sense, at least to me it does.

EDIT: I am not sure in my after thought if the aliases are available at the deeper level or not? Does the GraphQL client strip off the aliases and then reattach them with the returned results, or are the aliases available to the resolvers as well? I havenā€™t wrote a resolver for many months now which makes me quite happy!

  1. Work somehow with applying cascade. To explain my point here, examine this query:
{
  queryData(first:5) @cascade {
    id
    metaData(filter: {id:["0x1"]}) {
      id
    }
  }
}

This query gets only the data that are the parents of the metaData with the id <0x1>. I would want to be able to count this as a total result disregarding the first directive while honoring the cascade directive. But keep in mind that @cascade still does not work with pagination, so this may be a limiting factor.

3 Likes

Thanks for your comments @amaster507 .

Yes, it will be possible to apply count along with cascade.

With the current design of count queries, this could be handled by making two queries, first a count query to get the number of UIDs with given conditions and then a query with pagination to obtain the result. These queries could also be run simultaneously.

Why would you want to disregard the first directive. With the current design, it will be possible to count with cascade directive. Querying for other fields will have to be done separately.

A more general question is to whether to have separate count queries or to somehow accomodate the count part in the query itself as it is done in DQL.

1 Like

Possible Alternative for countData type queries

The method to have separate queries of the type countData has the following drawbacks:

  1. The query has only one field in their return type which is count of UIDs matching filter. It cannot have any other fields of the Data type.
  2. For most purposes, a countData query will have to be accompanied by queryData query. As countData query does not return anything more than count of UIDs. It may not prove that useful as it is intended to be.

To solve this problem of countData type query, the alternative is to make all queryData type queries return an extra field count . This field will contain the count of UIDs satisfying the given filter condition.

This approach has the following advantages:

  1. Any other fields of Data type could also be queried along with this count variable. Making it easier to get all needed data using a single queryData query rather than using 2 queries, countData and queryData.
  2. This way of having a count variable in queryData is more consistent with the way count is handled in DQL. There is no separate countData query in DQL. Rather, a count(UID) variable is present which could be used to get a count of matching UIDs. This will make rewriting to DQL easier.

This approach has the following disadvantage:

  1. This will be a breaking change as the return type of queries of the form, queryData will change from [Data] to a structure containing int (to store count) and Data. This will have to be implemented and tested with greater care to avoid any problems with current structure.

Sample Example:

type Data {
	id: ID!
    name: String!
	intList: [Int]
	stringList: [String]
 	metaData: [Data]
}

The following queryData query will be generated in output schema from the above input schema.

query {
    queryData(filter: DataFilter): (Int, [Data])
}

Example GraphQL count query:

query {
    queryData(filter: \*Some Filter*\) {
        count
        name
        stringList
    }
}

The above GraphQL query would be rewritten to the following DQL query.

query {
    queryData(func: type(Data)) @filter(/* rewritten filter condtition */)) {
        count: count(uid)
        name
        stringList
    }
}

@amaster507, I will like to hear your comments about this and whether this solves some of the concerns mentioned by you.

1 Like

I think this really simplifies a lot. However in regards to:

I donā€™t think there has been much decisions made about how breaking changes will be handled and in this early fragile stage, it is probably best to stay away from breaking changes.

Would it be possible to put the count inside of the generated type itself?

Instead of:

type Data {
  id: ID!
  name: String!
  intList: [Int]
  stringList: [String]
   metaData: [Data]
}
query {
  queryData(filter: DataFilter): (Int, [Data]) 
  # I think the correct annotation would be {Int, [Data]}
}

It could easily be:

type Data {
  count: Int # a generated field with a new reserved keyword
  id: ID!
  name: String!
  intList: [Int]
  stringList: [String]
  metaData: [Data]
}
query {
    queryData(filter: DataFilter): [Data]
}

Sorry, bad terminology, I did not mean the 1st directive but rather the first: Int filter. If I limit my results to the first page of 10, I would want the count for all and not just the first page, because I already know that is <=10.

3 Likes

That would not really work or if it works then it would be similar to a DQL output which I believe is a bit unintuitive where the first object returns the count and next ones the actual data. As an example, the result of the DQL query

{
  directors(func: gt(count(director.film), 5)) {
    totalDirectors : count(uid)
    name@en
  }
}

is

{
  "data": {
    "directors": [
      {
        "totalDirectors": 7712
      },
      {
        "name@en": "Buster Keaton"
      },
      {
        "name@en": "Andrei Tarkovsky"
      }
...

I believe we should not go with this design.

In the design that @rajas is suggesting, the schema would be like below. It is a breaking change yes but has the pros that mentioned.

type QueryDataResponse {
  count: Int64
  data: [Data]
}

query {
  queryData(filter: DataFilter): QueryDataResponse
}

That is right but I donā€™t think we should disregard the first argument if count is given. That is also incompatible with how DQL works. If the user just wanted the complete count, they can always do a separate query without the first argument.

2 Likes

Another Possible Alternative to count queries

The approach of having a count variable inside queryData queries has the following disadvantages:

  1. It is a breaking change and will change queryData behaviour drastically.
  2. It also brings in other restrictions of not having any fields with the prefix of count , as count<FieldName> will be used to denote the count of predicate edges for <FieldName> field.

To avoid these two issues, as discussed with @pawan yesterday, we propose another approach to count queries.

Count at root

We can have another query of the name aggregate<type name>. This query will include fields for count and any other aggregation fields like sum or avg. For our current use-case, the updated schema would look like

input DataAggregateResult {
  count: Int
  ... 
  ...
  // Other fields for doing things like avg, sum, min, max etc.
}

query {
    aggregateData(filter: DataFilter): DataAggregateResult
}

Then the following GraphQL query

query {
  aggregateData(filter: {...}) {
    count
  }
}

would be rewritten to the following DQL query.

query {
    aggregateData(func: type(Data)) @filter(/* rewritten filter condtition */)) {
        count: count(uid)
    }
}

All query aggregation fields like sum, avg to be added later would be a part of return type of aggregateData query.

Count for a child

Along with the aggregateData query, to handle count of predicate edges and other aggregation fields, we propose having a <predicate_name>_aggregate field inside queryData query. This will solve the problem of having too many count<predicate_name> type fields inside the type.

For the following GraphQL schema

type Data {
  name: String!
  metaData: [Metadata]
}

Example:

query {
    queryData {
        name
        metaData_aggregate {
            count
        }
    }
}

The above query would be rewritten to DQL as follows.

query {
    queryData(func: type(Data))  {
        Data.name
        count(Data.metaData)   /* returns number of items in metaData list*/
    }
}

Note that the count(Data.metaData) field would be made a part of metaData_aggregate field while returning GraphQL response.

Other aggregation fields like min, max, sum could then be added later on to metaData_aggregate.
This approach is similar to how Aggregation queries are handled in other GraphQL providers.

@michaelcompton, I will like to hear your comments about this and any concerns you may have about the proposed solution.

2 Likes

Yeah, I think, in general we probably should have wrapped our queries in a response type (like we ā€˜wiselyā€™ did for mutations), that would allow not having to have the aggregateData query. Doesnā€™t really hurt anything, just means a bit of interface bloat.

This looks like a decent approach to me.

input DataAggregateResult {
  count: Int
  ...  // Other fields for doing things like avg, sum, min, max etc.
}

query {
    aggregateData(filter: DataFilter): DataAggregateResult
}

for those ā€œOther fields for doing things like avg, sum, min, max etc.ā€, weā€™ll have to do something like either have a nested sum { ... } that contains all the fields you can sum, or do sum_fieldName to get all the alternatives. Same for count for a child. Nested seems more in keeping with what weā€™ve done for filters, etc.

The other way to think about it might be with some sort of @custom that allowed to add dql snippets that would be compiled into the query. Thatā€™s kinda attractive cause itā€™s completely generic and could support math, but would, I expect, be both harder to implement and harder to use for non dql experts.

Iā€™l also just add my usual refrain that I see GraphQL as about an appā€™s data interface and not really about a query language, so Iā€™d rather have a way to say ā€˜hey, add a count for this bit hereā€™, rather than always compiling in all the possibilities everywhere.

3 Likes

Extending Count to Aggregate queries

As mentioned in the above posts that the aggregateData query would later on be extended to other aggregate functions like avg, min, max, sum , here are some examples of GraphQL to DQL queries on how that will work.

sum, avg functions would be allowed on field which are of the type Int or Float . sum would have the same return type as the value on which it operates. avg would always be of the type Float to cater to cases in which the sum of values is not divisible by number of predicates.

min , max functions would operate on fields which are of the type Int , Float, String, Datetime and would have the same return type as the field on which it operates.

Aggregate queries at Root level

For the following GraphQL schema

type Data {
  name: String!
  metaData: [Metadata]
}

The following DataAggregateResult input type will be generated

input DataAggregateResult {
  count: Int
  min_name: String
  max_name: String
}

The following GraphQL query

query {
    aggregateData(filter: DataFilter) {
        min_name
    }
}

would translate to the following DQL query

{
  var(func: type(Data)) @filter(/* rewritten filter condition */) {
    a as name
  }

  me() {
    min(val(a))
  }
}

Note that as aggregate functions and count are treated differently in DQL queries. GraphQL queries with both count and aggregate functions would be rewritten to multiple DQL queries and combined together before returning.

Aggregate queries at other levels

query {
    queryData {
        name
        metaData_aggregate {
            min_metadata_field  // metadata_field is some field in type metadata of type string
        }
    }
}

The above GraphQL query would translate to the following DQL query

{
  data as var(func: type(Data))  {
    metaData {
        field as metadata_field
    }
    minField as min(val(field))
  }

  me(func: uid(data)) {
    name
    val(minField)
  }
}

A Note about count queries

It was mentioned in the first post that scalar fields like intList: [Int] would also support count operations. But, with the new model, it wonā€™t be possible to support this as the AggregateData input field would only have a single count variable and all other count functions of fields like metaData will take place at a different level (inside metaData_aggregate field).

As counting of such scalar arrays is supported in DQL, one would still be able to get count of scalar arrays using custom DQL.

1 Like

Count Queries for GraphQL have been pushed into master today. Count Queries will be a part of 20.11 release. Note that work on other aggregate queries like average, maximum, minimum is still under progress.

Related PRs:

  1. Feat(GraphQL): Add count queries feature at non-root levels by vmrajas Ā· Pull Request #6834 Ā· dgraph-io/dgraph Ā· GitHub
  2. Feat(GraphQL): Add count query feature at root to GraphQL by vmrajas Ā· Pull Request #6786 Ā· dgraph-io/dgraph Ā· GitHub
2 Likes

Thatā€™s great news!

However Iā€™m missing an __aggregate field like in the following example:

queryPosts {
   __aggregate {
      count
   }
}

With the current implementation I can only do:

getUser(id: $userId) {
   postsAggregate {
      count
   }
}

which is problematic because it makes counting filtered queries hard. E.g. counting only posts with certain properties would include use of @cascade and forces me to return unwanted data (e.g. id of posts).

getUser(id: $userId) @cascade {
   posts(filter: {
      likes: {gt: 10}
   }) {
      id
   }
   postsAggregate {
      count
   }
}

After all, the goal of dedicated count queries is speed. Otherwise I could just query all IDs by myself and count on client-side.

2 Likes

Count Queries have two types, counting at root and counting at other levels. If you want to count posts with certain features, you could use count(aggregate) queries at root.
In your case, it will look like this:

aggregatePosts(filter:/*Apply some filter*/){
  count
}

This will report count of all Posts satisfying the given filter.

2 Likes

Ah great, I overlooked this query! Thank you very much! :slight_smile:

Will @auth permissions apply to these queries?

Yes, they do.

1 Like

Wonderful!

Now let me filter for child-properties and Iā€™m beyond satisfied! :slight_smile: Is there an Issue for child-property filtering? I canā€™t seem to find one.

Filtering for child properties on Count Queries is also possible.

Example:

getUser(id: $userId) {
   postsAggregate(filter: {
      likes: {gt: 10}
   }) {
      count
   }
}

This will give the count of posts which have likes greater than 10 for the User .

3 Likes

Thatā€™s awesome.

But I canā€™t count Posts that have childs that have certain properties, correct?
Like:

getUser(id: $userId) {
   postsAggregate(filter: {
      statistics: {
         likes: {gt: 10}
      }
   }) {
      count
   }
}

By the way, thank you for answering all this stuff while no documentation is available!

1 Like

Yes, That wonā€™t be possible.
Filtering on properties of child is not possible for other queries as well (not just count) .

Yes, I noticed that. Do we have an issue for this already? I couldnā€™t find one.

2 Likes