RFC: Nested Filters in GraphQL

Motivation

There are many requests from the community to add nested filters in GraphQL and it’s also one of the feature which can add more flexibility and value in our GraphQL implementation.
Nested Filters like below are not supported in GraphQL .

query {
  queryAuthor(
    filter: { name: { eq: "Alice" }, or: { posts: { postID: ["0x1", "0x2"] } } }
  ) {
    name
    posts {
      title
    }
  }
}

Although in DQL this Query is possible using var blocks like below

query {
 post1 as var(func:type(author))@cascade {
           Author.posts : Author.posts @filter(uid(0x1, 0x2)){
                     uid
              }
         }

 queryAuthor(func: type(Author)) @filter((eq(author.name,"Alice") or uid(post1))) {
        Author.name : Author.name
        Author.posts : Author.posts {
          Post.title : Post.title
          dgraph.uid : uid
        }
        dgraph.uid : uid
      }
    }

Rewriting GraphQL queries with nested filters is a bit complex that require considering lots of cases which we are going to explore one by one in this RFC.

User Impact

This feature is requested many number of times and is applicable to different use cases. Adding this will greatly enhance the user experience.

Implementation

Currently we are allowing only scalar fields in the filter , but we are planning to also add the object type fields in it so that we can reference the nested fields.

For example , consider the below schema

type Post {
  id: ID!
  author: Author! @hasInverse(field: "posts")
}

type Author {
  id: ID!
  posts: [Post!]! @hasInverse(field: "author")
}

Currently, we generate below author and post filters for it.

input AuthorFilter {
	id: [ID!]
	has: [AuthorHasFilter]
	and: [AuthorFilter]
	or: [AuthorFilter]
	not: AuthorFilter
}
input PostFilter {
	id: [ID!]
	has: [PostHasFilter]
	and: [PostFilter]
	or: [PostFilter]
	not: PostFilter
}

Now, we will add posts: PostFilter in input AuthorFilter , so that we can accesss fields of post in the AuthorFilter.

input AuthorFilter {
	id: [ID!]
	posts: PostFilter
	has: [AuthorHasFilter]
	and: [AuthorFilter]
	or: [AuthorFilter]
	not: AuthorFilter
}
input PostFilter {
	id: [ID!]
	author: AuthorFilter
	has: [PostHasFilter]
	and: [PostFilter]
	or: [PostFilter]
	not: PostFilter
}

Query rewriting for the different cases

Here we will explore different cases of nested filters in GraphQL and their corresponding DQL query.
We will use below schema in the examples

type Post {
    id: ID!
    title: String! @search
    text: String
    comments: [comment]
    author: Author!
}

type Author {
    id: ID!
    name: String! @search(by:["exact"])
    posts: [Post!]
    friends: [Author]
}

type Comment {
    id: ID!
    type: String @search (by: ["hash"])
    likes:Int @search
}

Current behavior:

Currently, we can filter a nested object field, but can’t filter a parent using the nested object field. In some of the use cases, where only AND operator is required between parent and nested filter, We can have a nested filter with cascade. For example,

Query: Get all posts that have the title “GraphQL” AND at least one comment of type thumbs up AND likes greater than 5.

GraphQL Query:

query{
     queryPost(filter:{title:{eq:"GraphQL"}) @cascade{
         id
         title
         comments (filter:{type:{anyofterms:"thumbs up"},and:{likes:{gt:5}}}) {
             id
             type
             likes
         }
    }
   }

@cascade here will enforce the AND condition between nested and parent filter.
Corresponding DQL query will be:

 query {
  queryPost(func: type(Post)) @filter(eq(Post.title, "GraphQL")) @cascade {
    Post.id : uid
    Post.title : Post.title
    Post.comments : Post.comments @filter((gt(Comment.likes, 5) AND anyofterms(Comment.type, "thumbs up"))) {
      Comment.id : uid
      Comment.type : Comment.type
      Comment.likes : Comment.likes
    }
  }
}

But when we need OR condition between the parent and nested filter, we can’t query using the above method.
For example, the below query is not possible in GraphQL using cascade, because it can only do AND between parent and nested filter, not OR.

Query: Get all posts that have the title “GraphQL” OR (at least one comment of type thumbs up AND likes greater than 5).

Rewriting

Now we will explore different queries with nested filters in GraphQL and their corresponding DQL queries. In DQL, we write multiple queries and link them using DQL variables. This feature is not available in GraphQL, but while rewriting GraphQL queries to DQL we will make use of it.

Although there are multiple ways to write a single GraphQL query with nested filters to DQL, we will as of now go with the simplest approach and then discuss optimizations later in this RFC.

Basically, we are going to generate a var block for every nested object in the filter and then filter the main query result based on the query in the var block. We can have different var blocks corresponding to different nested objects, which we will combine in the final query using connectives in the order given in the GraphQL query.

1. Two-level filter: OR

Query: Query authors and their posts such that either author is “Alice” or post is about “Dgraph”

GraphQL Query :

query {
  queryAuthor(
    filter: {
      name: { eq: "Alice" },
      or: { posts: { title: { eq: "Dgraph" } } }
    }
  ) {
    name
    posts {
      title
      text
    }
  }
}

Here, we are generating a separate query for the 2nd level field posts in the filter and using the result of that in the main query queryAuthor.

DQL Query:

query {  
   post1 as var(func:type(Author)) @cascade {
      Author.posts : Author.posts @filter(eq(Post.title, "Dgraph")){
         uid
      }
   }

   queryAuthor(func: type(Author)) @filter(eq(Author.name,"Alice") or uid(post1)){
        Author.name : Author.name
        Author.posts  : Author.posts {
        Post.title : Post.title
        Post.text : Post.text 
          dgraph.uid : uid
        }
        dgraph.uid : uid
      }
    }

2.Two-level filter: AND

Query: Query posts of “Alice” if she has at least one post about “Dgraph”`

query {
  queryAuthor(
    filter: {
      name: { eq: "Alice" },
      and: { posts: { title: { eq: "Dgraph" } } }
    }
  ) {
    name
    posts {
      title
      text
    }
  }
}

DQL Query:

query {  
     post1 as var(func:type(Author)) @cascade {
        Author.posts : Author.posts @filter(eq(Post.title, "Dgraph")){
           uid
        }
     }

     queryAuthor(func: type(Author)) @filter(eq(Author.name,"Alice") and uid(post1)){
        Author.name : Author.name
        Author.posts  : Author.posts {
        Post.title : Post.title
        Post.text : Post.text 
          dgraph.uid : uid
        }
        dgraph.uid : uid
      }
    }

This query can also be written in GraphQL without a nested filter at the parent as discussed in the previous section.
And similarly in DQL also we can write this without a separate var block like below:

query {  
     queryAuthor(func: type(Author)) @filter(eq(Author.name,"Alice")) @cascade {
        Author.name : Author.name
        Author.posts  : Author.posts @filter(eq(Post.title, "Dgraph")){
        Post.title : Post.title
        Post.text : Post.text 
          dgraph.uid : uid
        }
        dgraph.uid : uid
      }
    }

But, note that if you weren’t querying the posts field in the GraphQL query, then the above-specialized query can’t be used, and the generic way of rewriting the filter using a separate var block is the only way.

3.Two-level filter: NOT

Query: Query authors such that their name is “Alice” and they don’t have any post with the title as “Dgraph”
GraphQL Query:

query {
  queryAuthor(
    filter: {
      name: { eq: "Alice" }
      not: { posts: { title: { eq: "Dgraph" } } }
    }
  ) {
    name
  }
}

DQL Query:

query {  
  post1 as var(func:type(Author)) @cascade {
    Author.posts : Author.posts @filter(eq(Post.title, "Dgraph")) {
      uid
    }
  }

  queryAuthor(func: type(Author)) @filter(eq(Author.name,"Alice") AND NOT(uid(post1))){
    Author.name : Author.name
      dgraph.uid : uid
    }
  }

4.Two Level Filter: OR, AND

GraphQL query:

query {
  queryAuthor(
    filter: {
      or: [
        { friends: { name: { eq: "Bob" } } },
        {
          and: [
            { name: { eq: "Alice" } },
            { posts: { title: { eq: "Dgraph" }, text: { eq: "Intro to DQL" } } }
          ]
        }
      ]
    }
  ) {
    name
  }
}

DQL query:

query {  
  post1 as var(func:type(Author)) @cascade {
     Author.posts : Author.posts @filter((eq(Post.title, "Dgraph")) and (eq(Post.text,"Intro to DQL"))){
                     uid
              }
         }

  friends1 as var(func:type(Author)) @cascade {
     Author.friends : Author.friends @filter((eq(Author.name, "Bob"))){
                     uid
              }
         }

 queryAuthor(func: type(Author)) @filter((uid(friends1) OR (eq(Author.name,"Alice") AND uid(post1)))){
        Author.name : Author.name
        dgraph.uid : uid
      }
    }

5. Three Level Filter: OR, AND, OR

GraphQL query:

query {
  queryAuthor(
    filter: {
      or: [
        { friends: { name: { eq: "Bob" } } },
        {
          and: [
            { name: { eq: "Alice" } },
            {
              posts: { 
                title: { eq: "Dgraph" },
                or: { comments: { type: { eq: "excellent" }, likes: { gt: 5 } } }
              }
            }
          ]
        }
      ]
    }
  ) {
    name
  }
}


DQL query:

query {
   comment1 as var(func:type(Post)) @cascade {
      Post.comment : Post.comment @filter( eq(comment.type, "excellent") AND gt(comment.likes,5) ) {
         uid
      }
   }

   post1 as var(func:type(Author)) @cascade {
      Author.posts : Author.posts @filter(eq(Post.title, "Dgraph") OR uid(comment1)) {
         uid
      }
   }

   friends1 as var(func:type(Author)) @cascade {
      Author.friends : Author.friends @filter((eq(Author.name, "Bob"))) {
         uid
      }
   }

   queryAuthor(func: type(Author)) @filter((uid(friends1) OR (eq(Author.name,"Alice") AND uid(post1)))) {
         Author.name : Author.name
         dgraph.uid : uid
      }
   }

Future Optimizations

  • If the schema uses @hasInverse or the @reverse DQL index, then the query rewriting can be optimized to start traversal from the reverse edge.

References

7 Likes

You use cascade too much! These need optimized to not use cascade as it will kill performance right out of the box.

1 Like

Yeah! Surely it will not be performant. But, that is the only way at present to guarantee correctness while allowing this feature. I see it as paying the cost of deep filters.

We can later figure out optimizations, if they are possible.

This concerns me, and I am glad that I have worked around this for the most part with multiple GraphQL queries on client side that I chain together.

I have 27K+ contacts. Each contact will normally have 1 address linking node, but could have an unlimited amount. Each address linking node links to an address node, each address node links to a state node. These state nodes are deduplicated to keep reverse lookups easier to all addresses in a state, but to continue on with this example. If I want to find contacts given a state in their address, this would query 27K + >27K + >27K + ~50. And this would return the ~540 in the state I am looking for. This costs querying 81,050+ nodes to get to this point. Using the inverse relationships in my work around I query 1 state + 540 addresses + 540 address linking nodes + 540 contacts. 1,621 nodes touched vs. 81,050. Just my opinion, but if this is the only way to do it right now, then maybe this should be held off for now. Better not done, then done with poor performance.

Yeah, this design will cost more in terms of performance as Abhimanyu already mention. We will be discussing other possible solutions internally with the team. One good solution is definitely to make use of the inverse edges and reduce the universe at the root as you already mentioned.

But in that, we need to inverse edges in the schema, and rewriting that seems much difficult than this approach. We will be exploring it. Currently, there are many requests for this feature and there are many use cases that otherwise are not possible with GraphQL. So to allow those use cases, I guess we can go with this approach but we will be discussing and exploring all the possible optimizations before implementing it.

2 Likes

Thanks so much for this.

I don’t really have an issue with using cascade and relying on developers to have a stronger conception of the shape of their graph when defining queries in the near term. A lot of us really need this functionality. I do think intelligently minimizing the node universe by tracking node counts and measuring inverse edges is really important, but it seems to me like more of a feature upgrade than a different approach :man_shrugging:.

I also think it’s important to provide some functional filters on a set of connected nodes…things like contains, every, and count/length. For instance, being able to efficiently filter for entities that have at least one connected node with property1 and at least one connected node with property2 is very core to my needs (and blocking at scale).

One question - for layered filter queries that can be handled by DQL today, does the logic stop traversal on ‘dead-end’ paths once a node fails a filter?

2 Likes

Currently I end up combining multiple queries using @cascade as a kind of workaround. So e.g. to get all tasks that are either assigned to no user or to a specified user having either specified tags or no tags this is the query I’d end up with. Please let me know if this can be done differently:

query tasks($filter: AssignableFilter! = { has: name }, $users: UserFilter = { has: name }, $tags: TagFilter! = { has: name }) {
  assignedTasks: queryAssignable(filter: {and: [{ has: assignedTo }, {has: tags}, $filter]}) @cascade(fields: ["assignedTo", "tags"]) {
    id
    name
    assignedTo(filter: $users) {
      id
      name
    }
    tags(filter: $tags) {
      name
    }
    pastExecutions {
      user {
        id
        name
      }
      timestamp
    }
    effort
    ... on Chore {
      interval
    }
    ... on Task {
      due
    }
  }
  assignedTasksWithoutTags: queryAssignable(filter: {and: [{ has: assignedTo }, { not: { has: tags }}, $filter]}) @cascade(fields: ["assignedTo"]) {
    id
    name
    assignedTo(filter: $users) {
      id
      name
    }
    pastExecutions {
      user {
        id
        name
      }
      timestamp
    }
    effort
    ... on Chore {
      interval
    }
    ... on Task {
      due
    }
  }
  unassignedTasks: queryAssignable(filter: {and: [{not: {has: assignedTo}}, {has: tags}, $filter]}) @cascade(fields: ["tags"]) {
    id
    name
    tags(filter: $tags) {
      name
    }
    pastExecutions {
      user {
        id
        name
      }
      timestamp
    }
    effort
    ... on Chore {
      interval
    }
    ... on Task {
      due
    }
  }
  unassignedTasksWithoutTags: queryAssignable(filter: {not: {or: [{ has: assignedTo }, { has: tags }]}}) {
    id
    name
    pastExecutions {
      user {
        id
        name
      }
      timestamp
    }
    effort
    ... on Chore {
      interval
    }
    ... on Task {
      due
    }
  }
}
1 Like

I think this might be a deal-breaker for many. At least I think it will be for us.

The major use case for us is to make the front-end developers able to work separate from the back-end and get a flexible API with real-time capabilities that don’t require us to do any custom development for loading data e.g. for tables. Without proper nested filtering capabilities that will guaranteed come back to bite us since we have pretty many relations making a graph database great. The experience so far in the POC we are doing has been really great, I just took for granted that the filtering would be there since it seems like such a crucial feature.

Really hoping this will be added (or planned) before we have to make a decision as the product seems amazing except for this pretty big (IMO) shortcoming. For some reason it works in the @auth directive, if the filter on a child returns 0 rows it will filter out the parent, not sure why.

3 Likes

Hi @CosmicPangolin1, Thanks for your opinion on this. Currently, aggregate queries like count are not available in filters, once we have them then it will be easy to write the filters for connected nodes as you mentioned. Currently, that can be achieved by custom DQl though. see this
Filter by counts in GraphQL

I didn’t fully get what you mean by logic stop traversal on ‘dead-end’ paths once a node fails a filter?

But I guess you mean that if there are no nested nodes and we have a filter on them then what is the behaviour of it. So for example take below DQL query. The first query just went through all nodes which have dgraph.type Author and then select those Authors from them who have post title Dgraph. But here we also got the Author who doesn’t have any posts with Author. posts equal null. So here @cascade come for our rescue, it filters out those leaf nodes and gives us the author who have posts and whose title is Dgraph.

And then in 2nd query, we use those Authors and then filters their names. So in short for the nodes which don’t have nested node for them we filter them out using @cascade at root.

query {  
     post1 as var(func:type(Author)) @cascade {
        Author.posts : Author.posts @filter(eq(Post.title, "Dgraph")){
           uid
        }
     }
queryAuthor(func: type(Author)) @filter(eq(Author.name,"Alice") and uid(post1)){
        Author.name : Author.name
        Author.posts  : Author.posts {
        Post.title : Post.title
        Post.text : Post.text 
          dgraph.uid : uid
        }
        dgraph.uid : uid
      }
    }

i hope it answers, you doubt. Thanks.

Yeah, you can use the above workaround if it works for your use case. But I see some limitations of it

  1. Anything parameter inside the @cascade will work as AND connective. For example in your first query , we can’t do OR of assignedTo and Tags.

  2. We can only use original field names in cascade, we can’t use Aliases. If we want to use two or more filter on same nested field then it’s not possible unless Aliases are allowed inside cascade.
    see this GraphQL: Connected filter on non-scalar list element fields - #11 by pawan

Thanks, @sebwalle for highlighting the importance of nested filters. We will surely prioritize it and try to add it soon.

Can you give example schema and query, I want to see how it’s working with @auth.

3 Likes

Sure, below is the schema. Just started testing the directive and will add some more conditions, such as $ORDER_SUPER_USER etc, hence the or filter. Haven’t done the JWT parsing yet. But this correctly filters out any orders that doesn’t have fund_id 1 or 67 in the allocations block. The logic is that you can only see orders that are at least partially allocated to funds you have permission to.

type Order @withSubscription @auth(
    query: { rule: """
        query { 
            queryOrder { 
                id
				allocations(filter: { or:[ {fund_id: { in: [67,1] } }] } ) {
					id
				}
            } 
        }"""
    }
) {
	id: String! @id
	quantity: Float 
	instrument: Instrument
	order_type: String @search(by:[exact]) 
    allocations: [Allocation] @hasInverse(field:order)
}

type Instrument @withSubscription {
	id: String! @id
	name: String!
	instrument_type: String @search
}

type Allocation {
  id: String! @id
  quantity: Float
  fund_id: Int64 @search
  order: Order
}

this is a query

query {
  queryOrder {
    id
    quantity
    instrument {
      id
      name
    }
    allocations {
      id
      fund_id
      quantity
      order{
        id
      }
    }
  }
}

Thanks, @sebwalle, it will work. If you just want to filter the parent node based on query result because Auth rules are run in a mode that requires all fields in the rule to find a value in order to succeed as it’s mentioned in docs. https://dgraph.io/docs/graphql/authorization/directive/
We automatically add the @cascade directory at the root while converting graphql query to dql query.
Same effect you can get without auth rules

query {
  queryOrder @cascade {
    id
    quantity
    instrument {
      id
      name
    }
    allocations {
      id
      fund_id
      quantity
      order{
        id
      }
    }
  }
}

Auth rules are designed that way but they don’t get any other extra filtering functionality than what we currently have.

Fantastic news!! Thank you!

@abhimanyusinghgaur

I know you guys are working on this for 21.07. Is this possible with var insterad of @cascade as @amaster507 mentioned?

I have several uses for this a well (like many people), and the performance concerns me too.

Thanks,

J

Hello, is there a time frame on when this will be available in Dgraph Cloud?

2 Likes

@minhaj is anybody currently working on this or is this stalled?