RFC: Nested Filters in GraphQL

jdgamble555 · September 3, 2021, 12:27pm

I believe this was the last official mention of this incredibly important feature:

But Dgraph has since had some turnover and employee changes. We don’t know where their priorities lie.

This feature is obviously big and a necessity for many.

Then we got this:

Which obviously didn’t happen, nor has there been any sign of GraphQL commits to the repository with the small exception of:

So, it does not look good at this point.

I would ask that someone from Dgraph respond to this, but getting a response is like pulling teeth at this point.

J

aman-bansal · September 4, 2021, 4:19pm

During our testing for v21.09 (we decided to call it v21.09, not v21.08), we found a few issues that were blocking the release. This led us to shift our timelines by a few weeks. We recently released the RC-2 tag for v21.09. We are constantly working on making this release stable, and we will release across the board soon.

As for the feature, I can’t commit to any timeline for this. I am adding this to my to-do list and will try to come up with something concrete soon.

jdgamble555 · September 4, 2021, 5:32pm

@aman-bansal - Thank you for giving us an update on this. We understand things take time to be done right, but the updates let us know Dgraph is continually making the product better.

@amaster507

amaster507:

I have 27K+ contacts. Each contact will normally have 1 address linking node, but could have an unlimited number. Each address linking node links to an address node, and each address node links to a state node. These state nodes are deduplicated to make reverse lookups to all addresses in a state easier. To continue with this example: if I want to find contacts given a state in their address, this would query 27K + >27K + >27K + ~50 nodes and return the ~540 contacts in the state I am looking for. That costs querying 81,050+ nodes to get to this point. Using the inverse relationships in my workaround, I query 1 state + 540 addresses + 540 address linking nodes + 540 contacts: 1,621 nodes touched vs. 81,050. Just my opinion, but if this is the only way to do it right now, then maybe this should be held off for now. Better not done than done with poor performance.
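The arithmetic in the quoted post can be checked with a quick sketch. It treats each ">27K" figure as exactly 27,000 and the state count as 50, so the forward total is a floor rather than an exact count; the variable names are illustrative only.

```python
# Node counts taken from the quoted post; ">27K" is treated as exactly 27,000,
# so the forward total is a lower bound.
contacts = 27_000
address_links = 27_000   # at least one linking node per contact
addresses = 27_000       # at least one address per linking node
states = 50

# Forward traversal: every contact, link, address and state is touched.
forward_touched = contacts + address_links + addresses + states

# Reverse traversal: start at one state and follow the inverse edges.
matches = 540
reverse_touched = 1 + matches + matches + matches

print(forward_touched)   # 81050
print(reverse_touched)   # 1621
print(forward_touched / reverse_touched)  # 50.0
```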

I have been thinking about this for a while, but if (or when) nested filters were implemented, it would be on you to create the inverse relationship. With the inverse relationship in place, even with @cascade, it would still fix your problem by cutting the nodes touched from 81,050 to 1,621:

type Contact {
  address: [Address]
  name: String
  ...
}
type Address {
  state: State
  contacts: [Contact] @hasInverse(field: address)
  ...
}
type State {
  code: ID!
  name: String @id
  addresses: [Address] @hasInverse(field: state)
  ...
}

Forward:

query {
  queryContact(filter: { address: { state: { code: "TX" } } }) {
    id
    name
    address {
      state {
        name
        ...
      }
      ...
    }
  }
}

Note: Your example is also two nested levels deep, which we hope will be an option as well.

VERSUS Reverse:

query {
  getState(code: "TX") @cascade {
    addresses {
      contacts {
        name
        address {
          state {
            name
            ...
          }
        }
      }
    }
  }
}

This looks like the worse of the two, but it is actually faster, since the other version would use @cascade under the hood anyway.

I assume nested filters will be similar to Hasura’s nested objects.

So I don’t see being able to go the reverse direction under the hood since the types won’t be there with all of the reverse triples, but you should be able to get this faster query working now.

Correct me if I’m wrong?

J

amaster507 · September 4, 2021, 10:46pm

There is a lot to digest here, and there are a lot of unknown dependencies. There is a big if: whether hasInverse and reverse will be, or even can be, factored into the algorithm, which really makes a big difference.

I had to get this working at the surface level of my app. So what I did is pull the filters out into one GraphQL query that returns only the ids of the base type I am looking for, and then run another GraphQL query using those ids as a filter to get the actual fields I wanted to query.

So I am basically doing the nested var block filtering just all in the UI instead of the db. It is a LOT OF CODE that I would like to get rid of though. I had to do it in GraphQL and not DQL because I need to honor auth rules along the way.

jdgamble555 · September 4, 2021, 10:57pm

That makes sense. You’re also having to filter on the client side, which is never good.

What I was saying, though, is that the second query may be able to work for you with Dgraph GraphQL as it is now, unless I am not thinking about this correctly.

(Obviously you may be using @id type instead of ID, and have more filters, but premise is the same).

J

amaster507 · September 5, 2021, 2:34am

Oh yes, I have a whole suite of filter builders. A user selects any direct child or mapped relationship at a predefined reverse path, selects one of the available filter types (eq, lt, gt, fulltext, regex, etc.) and provides a value. We then generate multiple queries that we combine into a single query statement as effectively as we can to get blocks of ids that match those filters. Then we take those blocks of ids and join them together using the user’s and/or/not logic, up to a somewhat limitless logical depth, and on the client side get a single list of either positive or negative (not) ids. We then do the second query with this single list of ids.
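A minimal sketch of the joining step described above, assuming each filter block has already been reduced to a set of ids. The tuple-based tree shape and the function name `evaluate` are hypothetical and not taken from the actual UI code. Note also that `not` is resolved here against a known universe of ids, whereas the post describes keeping a negative list instead.

```python
from typing import Any, Dict, Set

def evaluate(node: Any, blocks: Dict[str, Set[str]], universe: Set[str]) -> Set[str]:
    """Recursively evaluate an and/or/not tree over sets of ids."""
    op, *args = node
    if op == "block":
        # Leaf: look up the ids that matched one filter block.
        return set(blocks[args[0]])
    if op == "not":
        # Complement relative to the assumed universe of ids.
        return universe - evaluate(args[0], blocks, universe)
    results = [evaluate(child, blocks, universe) for child in args]
    if op == "and":
        return set.intersection(*results)
    if op == "or":
        return set.union(*results)
    raise ValueError(f"unknown operator: {op}")

# Made-up ids for illustration.
blocks = {
    "a": {"0x1", "0x2", "0x3"},
    "b": {"0x2", "0x4"},
    "c": {"0x3"},
}
universe = {"0x1", "0x2", "0x3", "0x4", "0x5"}

# a AND (b OR NOT c)
tree = ("and", ("block", "a"), ("or", ("block", "b"), ("not", ("block", "c"))))
print(sorted(evaluate(tree, blocks, universe)))  # ['0x1', '0x2']
```

Because the evaluator recurses on the tree, the logical depth is limited only by how deeply the user nests the operators.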

Inside of this second query the UI provides the user with a dropdown to select fields that they want to use. These fields are either direct children, or you guessed it… any mapped deep relationship we preconfigured as an option to use for the user.

This process is as close as we could get right now to the GraphQL purpose of “get what you want how you want it” with the UI driving the queries instead of boxing the user into a small list of here is what you are limited to do.

In our MySQL backed app we have done all of this too but sometimes run into a limit of max tables joined to achieve the end results the user wants. And then of course we had to process the data client side to shape it into what was needed as everything from there gets returned in one massive flattened table.

DM me if you are interested and we can set up a time when I can give you a test drive of our UI.

jdgamble555 · September 5, 2021, 2:44am

I don’t think nested filters are going to help you much.

I will probably DM you when I get some time, sounds interesting.

J

amaster507 · September 5, 2021, 3:09am

But I am basically building nested filters in the UI the long way around. If it was all in the query filter input object, I could build the filter blocks solely based upon the schema instead of having to do schema + a ton of custom logic and algorithms.

What I want to do (simplified)

queryContact(filter: {
  and: [
    {
      lastName: { eq: "Master" }
      or: [
        { address: { state: { abbrev: { eq: "AR" } } } }
        { firstName: { eq: "Anthony" } }
      ]
    }
  ]
}) {
   ...selectFields
}

What I have to do

Build filter blocks

  block1: queryContact(filter: { lastName: { eq: "Master" } }) { id }
  block2: queryContact(filter: { firstName: { eq: "Anthony" } }) { id }
  block3: queryState(filter: { abbrev: { eq: "AR" } }) { hasAddresses { usedBy { id } } }

Take each block and flatten it down to a list of ids. For example, block3 needs to be flattened from a multidimensional object down to a simple array
Take the blocks and join them together using logic

block1 && (block2 || block3) = newList

Make final query

queryContact(filter: { id: $newList }) {
  ...selectFields
}
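The flatten-and-join steps above can be sketched as follows. The response shape is assumed to mirror the three aliased blocks shown earlier (`hasAddresses` and `usedBy` come from the post, the sample ids are made up), and the helper name `collect_ids` is hypothetical.

```python
from typing import Any, Set

def collect_ids(value: Any) -> Set[str]:
    """Walk a nested GraphQL result and collect every 'id' value found.

    Assumes only the target type selects `id`; if intermediate nodes also
    selected it, their ids would be collected too.
    """
    ids: Set[str] = set()
    if isinstance(value, list):
        for item in value:
            ids |= collect_ids(item)
    elif isinstance(value, dict):
        for key, child in value.items():
            if key == "id":
                ids.add(child)
            else:
                ids |= collect_ids(child)
    return ids

# Made-up response data in the shape of block1..block3.
response = {
    "block1": [{"id": "0x1"}, {"id": "0x2"}, {"id": "0x3"}],
    "block2": [{"id": "0x2"}, {"id": "0x9"}],
    "block3": [
        {"hasAddresses": [{"usedBy": [{"id": "0x3"}, {"id": "0x7"}]}]},
    ],
}

block1 = collect_ids(response["block1"])
block2 = collect_ids(response["block2"])
block3 = collect_ids(response["block3"])

# block1 && (block2 || block3)
new_list = sorted(block1 & (block2 | block3))
print(new_list)  # ['0x2', '0x3']
```

The resulting `new_list` is what would be passed as the `$newList` variable of the final query.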

End goal is to have a 100% schema driven UI. When I change my schema, the UI adapts to the new schema offering more/less filterable fields and ways to filter.

jdgamble555 · September 5, 2021, 1:04pm

I’m wondering how much slower it is to have say 10,000 ids on the front end you have to search for (without heavy filtering), versus using a lot of @cascade directives internally.

Since you have an OR block, you could still simplify your use case to two code blocks instead of three now using:

query {
  queryState(filter: { abbrev: { eq: "AR" } }) @cascade {
    addresses {
      contacts(filter: { firstName: { eq: "Anthony" } }) {
        firstName
        address {
          state {
            name
            ...
          }
        }
      }
    }
  }
}

Without knowing all the filters beforehand, it could get complicated to generate the filters automatically, but it could theoretically be automated for a limited number of blocks.

Either way, your workaround is pretty interesting. You could put all your queries in one transaction query with aliases (which I assume you do).
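As a rough sketch of the alias suggestion, several root-level selections can be combined into one GraphQL document by prefixing each with an alias. The function name `build_aliased_query` is hypothetical, and the selections are copied from the filter blocks earlier in the thread; this only builds the query text and assumes nothing about how it is sent.

```python
from typing import Dict

def build_aliased_query(blocks: Dict[str, str]) -> str:
    """Combine several root-level selections into one query using aliases."""
    lines = ["query {"]
    for alias, selection in blocks.items():
        # "alias: field(...)" lets the same root field appear more than once.
        lines.append(f"  {alias}: {selection}")
    lines.append("}")
    return "\n".join(lines)

document = build_aliased_query({
    "block1": 'queryContact(filter: { lastName: { eq: "Master" } }) { id }',
    "block2": 'queryContact(filter: { firstName: { eq: "Anthony" } }) { id }',
})
print(document)
```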

Coolio.

J

Tyler_D · October 12, 2021, 8:23pm

Following up here. Any updates on if this is being worked on?

zmajew · October 30, 2021, 4:49am

As I see it, @hasInverse is the obstacle here. In other words, the problem is that the schema allows edges with no inverse. If @hasInverse were the default, it would be easier to solve this filtering more efficiently.

charklewis · October 31, 2021, 1:08am

Checking in to see if there are any updates for this?

jdgamble555 · November 10, 2021, 12:58am

I think the simple fix for this performance problem is to require @hasInverse on the nested field type you want to filter. If there is no inverse index, you can’t filter it. Done.

That way, when you guys are writing this feature, you can just use the inverse node to avoid over-fetching the way @cascade would.

Hopefully I am not over-simplifying it, but that seems logical to me!

J

loic · November 24, 2021, 1:44am

Hello All!
Joining the requesters: after only a couple of days playing with Dgraph, I realized I really need this.

For one-to-one relations, I was wondering if having some kind of virtual fields, like the ones described in Mongoose v6.0.13: Mongoose Tutorials: Mongoose Virtuals, would help.

If we could create virtual fields at the parent level, then we could just filter by them. I have no idea how to do that behind the curtains, though, but I imagine it like this:

type Post {
  id: ID!
  author: Author! @hasInverse(field: "posts") 
  authorLastName: Herited @virtual(type: Author, field: lastName)
  title: String
}

type Author {
  id: ID!
  posts: [Post!]! @hasInverse(field: "author")
  lastName: String @search ...
}

so now we can just do like this:

query {
  queryPost(
    filter: { authorLastName: { eq: "Tesla" }, or: { id: ......}}
  ) {
    title
  }
}

This would not work for one-to-many relations, though. I hope it is not a completely stupid idea. Also, virtuals being a thing in Mongoose… I guess they have other use cases!

amaster507 · November 24, 2021, 2:39am

Hmm, I sort of like this idea. Not the “virtual fields” part (we tried that, calling them Z-fields for filtering/sorting with lambdas to keep them in sync, and it didn’t work), because 1:1 relationships are not always 1:1 relationships under the hood. But I like the idea of only bringing up certain nested fields instead of all of them. So instead of surfacing all of the search fields to every possible filter parent type, you could bring up just some fields to certain parents. The problem is still how to implement it efficiently under the hood.

I would be interested in starting a topic around theorizing how to make hasInverse and reverse better and more efficient. That should be a whole new topic so that this one stays on topic, but it may be necessary before this feature can be finalized efficiently.

An MVP of this feature would be nice to have even if it wasn’t efficient for the first round, but it should include a notice of such.

loic · November 24, 2021, 10:21am

I knew I was probably saying some nonsense as I’m new to Dgraph and GraphQL, but I told myself “maybe my idea gives the experts some inspiration for other ideas”. Happy that it worked!

As a user I would not need to get all nested fields, so something like:

type Post {
  id: ID!
  author: Author! @hasInverse(field: "posts") 
  title: String
}

type Author {
  id: ID!
  posts: [Post!]! @hasInverse(field: "author")
  lastName: String @nestedSearch(searchFrom: Post, by: [hash])
}

//so we can do this or a similar thing:

query {
  queryPost(
    filter: { author: { lastName: { eq: "Tesla" } }, or: { id: ......}}
  ) {
    title
  }
}

To specify the fields we want to use for nested filters would make sense and be intuitive for me as a developer using dgraph

I would love to try an MVP of this feature. Also working backwards we could at least agree on a right structure for the query and the schema even before coding the MVP feature?

amaster507 · November 24, 2021, 2:40pm

I wouldn’t do the , by: [hash] because that is adjusting the index. So it would be two directives: @search(...) and something else like @nestedSearch | @virtual | ?. I don’t really think an MVP of this needs the second option; it would just surface all filter search fields. The second option would be something to limit the scope of the GraphQL inputs, but realistically it may broaden the scope of this specific feature.

jdgamble555 · January 1, 2022, 1:03am

Was just thinking, what if all GraphQL nested nodes used @reverse…? It would be costly as far as database size, but it would guarantee the fastest nested filter…

Just a thought…

PS… I bet @reverse is equally as fast/slow as @hasInverse since it is just an extra triple. I just like @reverse better because it is guaranteed to be there (I don’t trust my database while we still need access to DQL mutations)

J

amaster507 · January 1, 2022, 2:41am

I had that same thought somewhere, might have been in a different private thread. The theory is: why does hasInverse exist in the first place? Why doesn’t everything just use the reverse directive, where in the schema you would specify which is the forward edge and which is the reverse? I don’t see anything that hasInverse provides that reverse can’t do. Wherever I had this thought, it was around the idea of why I can’t use @dgraph to map an edge to a reverse edge, like @dgraph(pred: "~myEdge"), but that of course does not work right now.

iyinoluwaayoola · March 31, 2024, 11:48am

Hoping to revive this thread!
