RFC: Nested Filters in GraphQL

Motivation

There have been many requests from the community to add nested filters in GraphQL, and it is one of the features that would add flexibility and value to our GraphQL implementation.
Nested filters like the one below are not currently supported in GraphQL.

query {
  queryAuthor(
    filter: { name: { eq: "Alice" }, or: { posts: { postID: ["0x1", "0x2"] } } }
  ) {
    name
    posts {
      title
    }
  }
}

In DQL, however, this query is possible using var blocks, as shown below:

query {
  post1 as var(func: type(Author)) @cascade {
    Author.posts : Author.posts @filter(uid(0x1, 0x2)) {
      uid
    }
  }

  queryAuthor(func: type(Author)) @filter((eq(Author.name, "Alice") or uid(post1))) {
    Author.name : Author.name
    Author.posts : Author.posts {
      Post.title : Post.title
      dgraph.uid : uid
    }
    dgraph.uid : uid
  }
}

Rewriting GraphQL queries with nested filters is somewhat complex and requires considering many cases, which we explore one by one in this RFC.

User Impact

This feature has been requested many times and applies to a variety of use cases. Adding it will greatly improve the user experience.

Implementation

Currently, we allow only scalar fields in the filter. We plan to also add object-type fields so that a filter can reference nested fields.

For example, consider the schema below:

type Post {
  id: ID!
  author: Author! @hasInverse(field: "posts")
}

type Author {
  id: ID!
  posts: [Post!]! @hasInverse(field: "author")
}

Currently, we generate the following Author and Post filters for it:

input AuthorFilter {
	id: [ID!]
	has: [AuthorHasFilter]
	and: [AuthorFilter]
	or: [AuthorFilter]
	not: AuthorFilter
}
input PostFilter {
	id: [ID!]
	has: [PostHasFilter]
	and: [PostFilter]
	or: [PostFilter]
	not: PostFilter
}

Now, we will add posts: PostFilter to the AuthorFilter input (and, likewise, author: AuthorFilter to PostFilter), so that we can access the fields of Post in the AuthorFilter.

input AuthorFilter {
	id: [ID!]
	posts: PostFilter
	has: [AuthorHasFilter]
	and: [AuthorFilter]
	or: [AuthorFilter]
	not: AuthorFilter
}
input PostFilter {
	id: [ID!]
	author: AuthorFilter
	has: [PostHasFilter]
	and: [PostFilter]
	or: [PostFilter]
	not: PostFilter
}
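
The generation step above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the actual schema generator: the `SCHEMA` dictionary and the `filter_input` helper are hypothetical, and searchable scalar fields are left out because the example schema has none.

```python
# Hypothetical, simplified schema model: type name -> {field name: field type}.
SCHEMA = {
    "Post": {"id": "ID", "author": "Author"},
    "Author": {"id": "ID", "posts": "Post"},
}


def filter_input(type_name, schema):
    """Render the <Type>Filter input, with one entry per object-typed field."""
    lines = [f"input {type_name}Filter {{"]
    for field, field_type in schema[type_name].items():
        if field_type == "ID":
            lines.append(f"\t{field}: [ID!]")
        elif field_type in schema:
            # Proposed addition: point at the nested type's own filter input.
            lines.append(f"\t{field}: {field_type}Filter")
        # Searchable scalar fields are intentionally omitted from this sketch.
    lines += [
        f"\thas: [{type_name}HasFilter]",
        f"\tand: [{type_name}Filter]",
        f"\tor: [{type_name}Filter]",
        f"\tnot: {type_name}Filter",
        "}",
    ]
    return "\n".join(lines)


print(filter_input("Author", SCHEMA))
print(filter_input("Post", SCHEMA))
```

Because each nested entry refers to another generated filter input by name, mutually referencing types such as Author and Post need no special handling in this sketch.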

Query rewriting for the different cases

Here we explore different cases of nested filters in GraphQL and their corresponding DQL queries.
We will use the schema below in the examples:

type Post {
    id: ID!
    title: String! @search
    text: String
    comments: [Comment]
    author: Author!
}

type Author {
    id: ID!
    name: String! @search(by: [exact])
    posts: [Post!]
    friends: [Author]
}

type Comment {
    id: ID!
    type: String @search(by: [hash])
    likes: Int @search
}

Current behavior:

Currently, we can filter a nested object field, but we can’t filter a parent using the nested object field. In use cases where only an AND operator is required between the parent and nested filters, we can use a nested filter with @cascade. For example:

Query: Get all posts that have the title “GraphQL” AND at least one comment of type thumbs up AND likes greater than 5.

GraphQL Query:

query {
  queryPost(filter: { title: { eq: "GraphQL" } }) @cascade {
    id
    title
    comments(filter: { type: { anyofterms: "thumbs up" }, and: { likes: { gt: 5 } } }) {
      id
      type
      likes
    }
  }
}

Here, @cascade enforces the AND condition between the nested and parent filters.
The corresponding DQL query is:

query {
  queryPost(func: type(Post)) @filter(eq(Post.title, "GraphQL")) @cascade {
    Post.id : uid
    Post.title : Post.title
    Post.comments : Post.comments @filter((gt(Comment.likes, 5) AND anyofterms(Comment.type, "thumbs up"))) {
      Comment.id : uid
      Comment.type : Comment.type
      Comment.likes : Comment.likes
    }
  }
}

But when we need an OR condition between the parent and nested filters, we can’t use the above method.
For example, the query below is not possible in GraphQL using @cascade, because @cascade can only express AND between the parent and nested filters, not OR.

Query: Get all posts that have the title “GraphQL” OR (at least one comment of type thumbs up AND likes greater than 5).

Rewriting

Now we will explore different queries with nested filters in GraphQL and their corresponding DQL queries. In DQL, we write multiple queries and link them using DQL variables. This feature is not available in GraphQL, but while rewriting GraphQL queries to DQL we will make use of it.

Although there are multiple ways to rewrite a single GraphQL query with nested filters into DQL, for now we will go with the simplest approach and discuss optimizations later in this RFC.

Basically, we are going to generate a var block for every nested object in the filter and then filter the main query result based on the query in the var block. We can have different var blocks corresponding to different nested objects, which we will combine in the final query using connectives in the order given in the GraphQL query.
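
The approach described above can be sketched as a small recursive rewriter in Python. This is a rough illustration, not the actual rewriter: `OBJECT_FIELDS`, `rewrite_filter`, and the helper functions are hypothetical names, the filter is modelled as a plain dictionary, and the grouping applied when `and`, `or`, and `not` all appear at the same level is an assumption. Generated variable names (`posts1`, `friends1`, ...) are also an arbitrary choice.

```python
import json

# Hypothetical schema summary: for each type, its object-typed fields
# and the type each one points to (taken from the example schema).
OBJECT_FIELDS = {
    "Author": {"posts": "Post", "friends": "Author"},
    "Post": {"comments": "Comment", "author": "Author"},
    "Comment": {},
}


def as_list(value):
    return value if isinstance(value, list) else [value]


def join(op, parts):
    """Combine non-empty sub-expressions with a connective."""
    parts = [p for p in parts if p]
    if len(parts) <= 1:
        return parts[0] if parts else ""
    return "(" + f" {op} ".join(parts) + ")"


def rewrite_filter(type_name, flt, blocks, counts):
    """Return a DQL filter expression for `flt`; var blocks are appended to `blocks`."""
    own = []  # conditions at this level, implicitly ANDed
    for key, value in flt.items():
        if key in ("and", "or", "not"):
            continue
        child_type = OBJECT_FIELDS[type_name].get(key)
        if child_type is None and isinstance(value, list):
            # Assumed to be an ID filter such as ["0x1", "0x2"].
            own.append(f"uid({', '.join(value)})")
        elif child_type is None:
            # Scalar filter, e.g. {"eq": "Alice"} -> eq(Author.name, "Alice")
            for func, arg in value.items():
                own.append(f"{func}({type_name}.{key}, {json.dumps(arg)})")
        else:
            # Nested object: emit one var block and reference it via uid(...).
            child_expr = rewrite_filter(child_type, value, blocks, counts)
            counts[key] = counts.get(key, 0) + 1
            var = f"{key}{counts[key]}"
            pred = f"{type_name}.{key}"
            blocks.append(
                f"{var} as var(func: type({type_name})) @cascade {{\n"
                f"  {pred} : {pred} @filter({child_expr}) {{\n"
                f"    uid\n"
                f"  }}\n"
                f"}}"
            )
            own.append(f"uid({var})")

    expr = join("AND", own)
    if "and" in flt:
        subs = [rewrite_filter(type_name, f, blocks, counts) for f in as_list(flt["and"])]
        expr = join("AND", [expr] + subs)
    if "or" in flt:
        subs = [rewrite_filter(type_name, f, blocks, counts) for f in as_list(flt["or"])]
        expr = join("OR", [expr] + subs)
    if "not" in flt:
        inner = rewrite_filter(type_name, flt["not"], blocks, counts)
        expr = join("AND", [expr, f"NOT ({inner})"])
    return expr


# Usage: authors named "Alice" OR having a post titled "Dgraph".
example = {"name": {"eq": "Alice"}, "or": {"posts": {"title": {"eq": "Dgraph"}}}}
var_blocks = []
root_filter = rewrite_filter("Author", example, var_blocks, {})
print("\n\n".join(var_blocks))
print(f"queryAuthor(func: type(Author)) @filter({root_filter})")
```

Nested objects are handled depth-first, so a var block for a deeper level is always appended before the block that references it. The order of the blocks may still differ from hand-written queries.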

1. Two-level filter: OR

Query: Query authors and their posts such that either author is “Alice” or post is about “Dgraph”

GraphQL Query:

query {
  queryAuthor(
    filter: {
      name: { eq: "Alice" },
      or: { posts: { title: { eq: "Dgraph" } } }
    }
  ) {
    name
    posts {
      title
      text
    }
  }
}

Here, we are generating a separate query for the 2nd level field posts in the filter and using the result of that in the main query queryAuthor.

DQL Query:

query {
  post1 as var(func: type(Author)) @cascade {
    Author.posts : Author.posts @filter(eq(Post.title, "Dgraph")) {
      uid
    }
  }

  queryAuthor(func: type(Author)) @filter(eq(Author.name, "Alice") or uid(post1)) {
    Author.name : Author.name
    Author.posts : Author.posts {
      Post.title : Post.title
      Post.text : Post.text
      dgraph.uid : uid
    }
    dgraph.uid : uid
  }
}

2. Two-level filter: AND

Query: Query posts of “Alice” if she has at least one post about “Dgraph”

GraphQL Query:

query {
  queryAuthor(
    filter: {
      name: { eq: "Alice" },
      and: { posts: { title: { eq: "Dgraph" } } }
    }
  ) {
    name
    posts {
      title
      text
    }
  }
}

DQL Query:

query {
  post1 as var(func: type(Author)) @cascade {
    Author.posts : Author.posts @filter(eq(Post.title, "Dgraph")) {
      uid
    }
  }

  queryAuthor(func: type(Author)) @filter(eq(Author.name, "Alice") and uid(post1)) {
    Author.name : Author.name
    Author.posts : Author.posts {
      Post.title : Post.title
      Post.text : Post.text
      dgraph.uid : uid
    }
    dgraph.uid : uid
  }
}

This query can also be written in GraphQL without a nested filter at the parent, using @cascade as discussed in the previous section.
Similarly, in DQL we can write it without a separate var block, as shown below:

query {
  queryAuthor(func: type(Author)) @filter(eq(Author.name, "Alice")) @cascade {
    Author.name : Author.name
    Author.posts : Author.posts @filter(eq(Post.title, "Dgraph")) {
      Post.title : Post.title
      Post.text : Post.text
      dgraph.uid : uid
    }
    dgraph.uid : uid
  }
}

But note that if the posts field isn’t queried in the GraphQL query, the specialized query above can’t be used, and the generic rewriting of the filter using a separate var block is the only way.

3. Two-level filter: NOT

Query: Query authors such that their name is “Alice” and they don’t have any post with the title “Dgraph”

GraphQL Query:

query {
  queryAuthor(
    filter: {
      name: { eq: "Alice" }
      not: { posts: { title: { eq: "Dgraph" } } }
    }
  ) {
    name
  }
}

DQL Query:

query {
  post1 as var(func: type(Author)) @cascade {
    Author.posts : Author.posts @filter(eq(Post.title, "Dgraph")) {
      uid
    }
  }

  queryAuthor(func: type(Author)) @filter(eq(Author.name, "Alice") AND NOT(uid(post1))) {
    Author.name : Author.name
    dgraph.uid : uid
  }
}

4. Two-level filter: OR, AND

GraphQL query:

query {
  queryAuthor(
    filter: {
      or: [
        { friends: { name: { eq: "Bob" } } },
        {
          and: [
            { name: { eq: "Alice" } },
            { posts: { title: { eq: "Dgraph" }, text: { eq: "Intro to DQL" } } }
          ]
        }
      ]
    }
  ) {
    name
  }
}

DQL query:

query {
  post1 as var(func: type(Author)) @cascade {
    Author.posts : Author.posts @filter((eq(Post.title, "Dgraph")) and (eq(Post.text, "Intro to DQL"))) {
      uid
    }
  }

  friends1 as var(func: type(Author)) @cascade {
    Author.friends : Author.friends @filter((eq(Author.name, "Bob"))) {
      uid
    }
  }

  queryAuthor(func: type(Author)) @filter((uid(friends1) OR (eq(Author.name, "Alice") AND uid(post1)))) {
    Author.name : Author.name
    dgraph.uid : uid
  }
}

5. Three-level filter: OR, AND, OR

GraphQL query:

query {
  queryAuthor(
    filter: {
      or: [
        { friends: { name: { eq: "Bob" } } },
        {
          and: [
            { name: { eq: "Alice" } },
            {
              posts: { 
                title: { eq: "Dgraph" },
                or: { comments: { type: { eq: "excellent" }, likes: { gt: 5 } } }
              }
            }
          ]
        }
      ]
    }
  ) {
    name
  }
}


DQL query:

query {
  comment1 as var(func: type(Post)) @cascade {
    Post.comments : Post.comments @filter(eq(Comment.type, "excellent") AND gt(Comment.likes, 5)) {
      uid
    }
  }

  post1 as var(func: type(Author)) @cascade {
    Author.posts : Author.posts @filter(eq(Post.title, "Dgraph") OR uid(comment1)) {
      uid
    }
  }

  friends1 as var(func: type(Author)) @cascade {
    Author.friends : Author.friends @filter((eq(Author.name, "Bob"))) {
      uid
    }
  }

  queryAuthor(func: type(Author)) @filter((uid(friends1) OR (eq(Author.name, "Alice") AND uid(post1)))) {
    Author.name : Author.name
    dgraph.uid : uid
  }
}

Future Optimizations

  • If the schema uses @hasInverse or the @reverse DQL index, then the query rewriting can be optimized to start traversal from the reverse edge.
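
As a rough illustration of this optimization, the var block could start from the filtered child nodes and follow the inverse edge back to the parents, instead of scanning every parent with @cascade. The Python helper below only renders such a block as text. `inverse_var_block` and its parameters are hypothetical, and the generated DQL is an untested sketch that assumes Post.author is the inverse of Author.posts.

```python
def inverse_var_block(var, child_type, child_expr, inverse_pred):
    """Render a var block that starts at the filtered child nodes and walks
    the inverse edge back to the parents, collecting the parent uids in `var`."""
    return (
        f"var(func: type({child_type})) @filter({child_expr}) {{\n"
        f"  {var} as {inverse_pred}\n"
        f"}}"
    )


# Authors having at least one post titled "Dgraph", reached from the Post side.
print(inverse_var_block("post1", "Post", 'eq(Post.title, "Dgraph")', "Post.author"))
```

With a @reverse index instead of @hasInverse, the same helper could be given `~Author.posts` as the inverse predicate. The main query would keep using uid(post1) as before.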


You use cascade too much! These need to be optimized to not use cascade, as it will kill performance right out of the box.

Yeah! Surely it will not be performant. But, that is the only way at present to guarantee correctness while allowing this feature. I see it as paying the cost of deep filters.

We can later figure out optimizations, if they are possible.

This concerns me, and I am glad that I have worked around this for the most part with multiple GraphQL queries on client side that I chain together.

I have 27K+ contacts. Each contact will normally have 1 address linking node, but could have an unlimited number. Each address linking node links to an address node, and each address node links to a state node. These state nodes are deduplicated to make reverse lookups to all addresses in a state easier, but to continue with this example: if I want to find contacts given a state in their address, this would query 27K + >27K + >27K + ~50. And this would return the ~540 in the state I am looking for. This costs querying 81,050+ nodes to get to this point. Using the inverse relationships in my workaround, I query 1 state + 540 addresses + 540 address linking nodes + 540 contacts. 1,621 nodes touched vs. 81,050. Just my opinion, but if this is the only way to do it right now, then maybe this should be held off for now. Better not done than done with poor performance.

Yeah, this design will cost more in terms of performance, as Abhimanyu already mentioned. We will be discussing other possible solutions internally with the team. One good solution is definitely to make use of the inverse edges and reduce the universe at the root, as you already mentioned.

But for that, we need inverse edges in the schema, and rewriting that seems much more difficult than this approach. We will be exploring it. Currently, there are many requests for this feature, and there are many use cases that are otherwise not possible with GraphQL. So to allow those use cases, I guess we can go with this approach, but we will be discussing and exploring all the possible optimizations before implementing it.


Thanks so much for this.

I don’t really have an issue with using cascade and relying on developers to have a stronger conception of the shape of their graph when defining queries in the near term. A lot of us really need this functionality. I do think intelligently minimizing the node universe by tracking node counts and measuring inverse edges is really important, but it seems to me like more of a feature upgrade than a different approach.

I also think it’s important to provide some functional filters on a set of connected nodes…things like contains, every, and count/length. For instance, being able to efficiently filter for entities that have at least one connected node with property1 and at least one connected node with property2 is very core to my needs (and blocking at scale).

One question - for layered filter queries that can be handled by DQL today, does the logic stop traversal on ‘dead-end’ paths once a node fails a filter?