RFC: Nested Filters in GraphQL

I believe this was the last official mention of this incredibly important feature:

But Dgraph has since had some turnover and employee changes. We don’t know where their priorities lie.

This feature is obviously big and a necessity for many.

Then we got this:

Which obviously didn’t happen, nor has there been any sign of GraphQL commits to the repository with the small exception of:

So, it does not look good at this point.

I would ask that someone from Dgraph respond to this, but getting a response feels like pulling teeth at this point.

J


During our testing for v21.09 (we decided to call it v21.09, not v21.08), we found a few issues that were blocking the release, which led us to shift our timelines by a few weeks. We recently released the RC-2 tag for v21.09. We are constantly working on making this release stable, and we will release across the board soon.

As for the feature, I can’t commit to any timeline for this. I am adding this to my to-do list and will try to come up with something concrete soon.


@aman-bansal - Thank you for giving us an update on this. We understand things take time to be done right, but the updates let us know Dgraph is continually making the product better.


@amaster507

I have been thinking about this for a while. If (or when) nested filters are implemented, it would still be on you to create the inverse relationship. And with the inverse relationship in place, even with @cascade, it would still fix your problem by being faster (81,050 → 1,621 nodes):

type Contact {
  address: [Address]
  name: String
  ...
}
type Address {
  state: State
  contacts: [Contact] @hasInverse(field: address)
  ...
}
type State {
  code: ID!
  name: String @id
  addresses: [Address] @hasInverse(field: state)
  ...
}

Forward:

query {
  queryContact(filter: { address: { state: { code: "TX" } } }) {
    id
    name
    address {
      state {
        name
        ...
      }
      ...
    }
  }
}

Note: Your example is also two nested levels deep, which we hope will be an option as well.

VERSUS Reverse:

query {
  getState(code: "TX") @cascade {
    addresses {
      contacts {
        name
        address {
          state {
            name
            ...
          }
        }
      }
    }
  }
}

This seems like it would be the worse of the two, but it is actually faster, since the other version would use @cascade under the hood anyway.

I assume nested filters will be similar to Hasura’s nested objects.

So I don’t see it being able to go in the reverse direction under the hood, since the types won’t be there with all of the reverse triples, but you should be able to get this faster query working now.

Correct me if I’m wrong?

J

There is a lot to digest here, and there are a lot of unknown dependencies. There is a big if: whether @hasInverse and reverse edges will be, or even can be, factored into the algorithm, which really makes a big difference.

I had to get this working at the surface level of my app. So what I did is move the filters into one GraphQL query that returns only the ids of the base type I am looking for, and then run another GraphQL query that uses those ids as a filter to get the actual fields I wanted to query.

So I am basically doing the nested var block filtering just all in the UI instead of the db. It is a LOT OF CODE that I would like to get rid of though. I had to do it in GraphQL and not DQL because I need to honor auth rules along the way.

That makes sense. You’re also having to filter on the client side, which is never good.

What I was saying, though, is that the second query may be able to work for you with Dgraph’s GraphQL in its current state, unless I am not thinking about this correctly.

(Obviously you may be using @id type instead of ID, and have more filters, but premise is the same).

J

Oh yes, I have a whole suite of filter builders. A user selects any direct child or mapped relationship at a predefined reverse path, selects one of the available filter types (eq, lt, gt, fulltext, regex, etc.), and provides a value. We then generate multiple queries, which we combine into a single query statement as effectively as we can, to get blocks of ids that match those filters. Then we take those blocks of ids and join them together on the client side using the user’s and/or/not logic, up to a somewhat limitless logical depth, to get a single list of either positive or negative (not) ids. We then run the second query with this single list of ids.
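The id-joining step described above could be sketched roughly like this. It is only an illustration, not the actual implementation: the tree shape (the "and" / "or" / "not" / "block" keys), the evaluate function, and the sample ids are all assumptions made up for the example. It also assumes the full set of candidate ids (universe) is known so that "not" can be resolved into a positive list, whereas the post describes sometimes keeping a negative list instead.

```python
from typing import Dict, Set

# Hypothetical id blocks, as if each came back from one id-only GraphQL query.
blocks: Dict[str, Set[str]] = {
    "block1": {"0x1", "0x2", "0x3"},
    "block2": {"0x2", "0x4"},
    "block3": {"0x3", "0x5"},
}


def evaluate(node: dict, blocks: Dict[str, Set[str]], universe: Set[str]) -> Set[str]:
    """Recursively combine id blocks according to an and/or/not logic tree."""
    if "block" in node:
        return set(blocks[node["block"]])
    if "and" in node:
        result: Set[str] = set(universe)
        for child in node["and"]:
            result &= evaluate(child, blocks, universe)
        return result
    if "or" in node:
        result = set()
        for child in node["or"]:
            result |= evaluate(child, blocks, universe)
        return result
    if "not" in node:
        # Assumption: every candidate id is known, so "not" is a set difference.
        return universe - evaluate(node["not"], blocks, universe)
    raise ValueError(f"unknown node: {node!r}")


# Assumption for the demo: the universe is simply every id seen in any block.
universe: Set[str] = set().union(*blocks.values())

# block1 AND NOT block2
tree = {"and": [{"block": "block1"}, {"not": {"block": "block2"}}]}
ids = evaluate(tree, blocks, universe)
print(sorted(ids))  # ['0x1', '0x3']
```

Because the tree is evaluated recursively, the nesting depth is limited only by what the UI lets the user build.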

Inside of this second query the UI provides the user with a dropdown to select fields that they want to use. These fields are either direct children, or you guessed it… any mapped deep relationship we preconfigured as an option to use for the user.

This process is as close as we could get right now to the GraphQL purpose of “get what you want how you want it” with the UI driving the queries instead of boxing the user into a small list of here is what you are limited to do.

In our MySQL-backed app we have done all of this too, but we sometimes run into the limit on the maximum number of joined tables before reaching the end result the user wants. And then, of course, we had to process the data client side to shape it into what was needed, as everything from there gets returned in one massive flattened table.

DM me if you are interested and we can set up a time when I can give you a test drive of our UI.

I don’t think nested filters is going to help you much.

:sweat: :man_facepalming:

I will probably DM you when I get some time, sounds interesting.

J

But I am basically building nested filters in the UI the long way around. If it was all in the query filter input object I could build the filter blocks solely based upon the schema instead of having to do schema + a ton of custom logic and algorithm.

What I want to do (simplified)

queryContact(filter: {
  and: [
    {
      lastName: { eq: "Master" }
      or: [
        { address: { state: { abbrev: { eq: "AR" } } } }
        { firstName: { eq: "Anthony" } }
      ]
    }
  ]
}) {
   ...selectFields
}

What I have to do

  1. Build filter blocks
     block1: queryContact(filter: { lastName: { eq: "Master" } }) { id }
     block2: queryContact(filter: { firstName: { eq: "Anthony" } }) { id }
     block3: queryState(filter: { abbrev: { eq: "AR" } }) { hasAddresses { usedBy { id } } }
  2. Take each block and flatten it down to a list of ids. For example, block3 needs to be flattened from a multidimensional object down to a simple array.
  3. Join the blocks together using logic
     block1 && (block2 || block3) = newList
  4. Make the final query
     queryContact(filter: { id: $newList }) {
       ...selectFields
     }
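The flatten-and-join steps above might look something like the following. This is a minimal sketch, not the code from my app: the response dict, the ids in it, and the collect_ids helper are hypothetical, and it assumes the ids we care about always sit on the innermost objects of each block.

```python
from typing import Any, Dict, List, Set

# Hypothetical response shaped like the three filter blocks above.
response: Dict[str, List[Dict[str, Any]]] = {
    "block1": [{"id": "0x1"}, {"id": "0x2"}, {"id": "0x3"}],
    "block2": [{"id": "0x2"}, {"id": "0x9"}],
    "block3": [
        {"hasAddresses": [
            {"usedBy": [{"id": "0x3"}, {"id": "0x7"}]},
            {"usedBy": [{"id": "0x1"}]},
        ]}
    ],
}


def collect_ids(value: Any) -> Set[str]:
    """Walk a nested response and collect the ids of the innermost objects."""
    ids: Set[str] = set()
    if isinstance(value, list):
        for item in value:
            ids |= collect_ids(item)
    elif isinstance(value, dict):
        children = [v for v in value.values() if isinstance(v, (list, dict))]
        if children:
            # Not a leaf yet: keep descending and ignore any id at this level.
            for child in children:
                ids |= collect_ids(child)
        elif "id" in value:
            ids.add(value["id"])
    return ids


block1 = collect_ids(response["block1"])
block2 = collect_ids(response["block2"])
block3 = collect_ids(response["block3"])

# block1 && (block2 || block3)
new_list = sorted(block1 & (block2 | block3))
print(new_list)  # ['0x1', '0x2', '0x3']
```

The resulting list is what would be passed as $newList in the final query.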

End goal is to have a 100% schema driven UI. When I change my schema, the UI adapts to the new schema offering more/less filterable fields and ways to filter.

I’m wondering how much slower it is to have say 10,000 ids on the front end you have to search for (without heavy filtering), versus using a lot of @cascade directives internally.


Since you have an OR block, you could still simplify your use case to two code blocks instead of three now using:

query {
  queryState(filter: { abbrev: { eq: "AR" } }) @cascade {
    addresses {
      contacts(filter: { firstName: { eq: "Anthony" } }) {
        firstName
        address {
          state {
            name
            ...
          }
        }
      }
    }
  }
}

Without knowing all the filters beforehand, it could get complicated to generate these queries yourself, but in theory it could be automated for a limited number of blocks.

Either way, your workaround is pretty interesting. You could put all your queries in one request using aliases (which I assume you do).
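For what it is worth, putting several blocks into one request with aliases could be sketched like this. The combine_with_aliases helper and the block strings are hypothetical, and this only builds the query text; it does not send anything to Dgraph.

```python
from typing import Dict


def combine_with_aliases(blocks: Dict[str, str]) -> str:
    """Wrap several query fields in one `query { ... }` document, one alias each."""
    lines = ["query {"]
    for alias, field in blocks.items():
        lines.append(f"  {alias}: {field}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical filter blocks, keyed by the alias each should get.
blocks = {
    "block1": 'queryContact(filter: { lastName: { eq: "Master" } }) { id }',
    "block2": 'queryContact(filter: { firstName: { eq: "Anthony" } }) { id }',
}
combined = combine_with_aliases(blocks)
print(combined)
```

Each alias then shows up as its own key in the response, so the blocks can be pulled apart again on the client.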

Coolio.

J


Following up here. Any updates on if this is being worked on?


As I see it, @hasInverse is the obstacle here. In other words, the problem is that the schema is allowed to have no inverse edges. If @hasInverse were applied by default, it would be easier to implement this filtering more efficiently.

Checking in to see if there are any updates for this?


I think the simple fix for this performance problem is to require @hasInverse on the nested field type you want to filter. If there is no inverse index, you can’t filter it. Done.

That way, when you guys are writing this feature, you can just use the inverse node to avoid over-fetching the way @cascade would.

Hopefully I am not over-simplifying it, but that seems logical to me!
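To make the rule concrete, here is a rough sketch of the check I have in mind. Everything in it is hypothetical: the simplified schema model, the nested_filterable function, and the capital / City field, which is invented only to show an edge without an inverse. It is not how Dgraph represents schemas internally.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified schema model:
# type name -> field name -> (target type, inverse field name or None).
schema: Dict[str, Dict[str, Tuple[str, Optional[str]]]] = {
    "Contact": {"address": ("Address", "contacts")},
    "Address": {"contacts": ("Contact", "address"), "state": ("State", "addresses")},
    "State": {"addresses": ("Address", "state"), "capital": ("City", None)},
}


def nested_filterable(
    schema: Dict[str, Dict[str, Tuple[str, Optional[str]]]], type_name: str
) -> List[str]:
    """Return the edge fields of `type_name` that declare an inverse."""
    return [
        field
        for field, (_target, inverse) in schema[type_name].items()
        if inverse is not None
    ]


print(nested_filterable(schema, "State"))    # ['addresses']
print(nested_filterable(schema, "Contact"))  # ['address']
```

Under this rule, only the fields returned here would be offered as nested filters; an edge like capital would simply not be filterable until an inverse is added.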

J


Hello All!
Joining the requesters: after only a couple of days playing with dgraph, I realized I really need this.

For one-to-one relations, I was wondering if having some kind of virtual fields, like Mongoose Virtuals (Mongoose v6.0.13: Mongoose Tutorials: Mongoose Virtuals), would help.

If we can create virtual fields at the parent level, then we could just filter by them. No idea how to do that behind the curtains, though, but I imagine it like this:

type Post {
  id: ID!
  author: Author! @hasInverse(field: "posts") 
  authorLastName: Herited @virtual(type: Author, field: lastName)
  title: String
}

type Author {
  id: ID!
  posts: [Post!]! @hasInverse(field: "author")
  lastName: String @search ...
}

so now we can just do like this:

query {
  queryPost(
     filter: { authorLastName: { eq: "Tesla" }, or: { id: ......}}
  ) {
    title
  }
}

This would not work for one-to-many relations, though. I hope it is not a completely stupid idea. Also, virtuals being a thing in Mongoose… I guess they have other use cases!

Hmm, I sort of like this idea. Not the “virtual fields” part (we tried that, calling them Z-fields for filtering/sorting with lambdas to keep them in sync, and it didn’t work), because 1:1 relationships are not always 1:1 relationships under the hood. But I do like the idea of only bringing up certain nested fields instead of all of them. So instead of surfacing all of the search fields to every possible filter parent type, you could just bring up some fields to certain parents. The problem is still how to implement it efficiently under the hood.

I would be interested in starting a topic around theorizing how to make hasInverse and reverse better and more efficient. But that should be a whole new topic to stay on topic here, but may be necessary prior to finalizing this feature efficiently.

An MVP of this feature would be nice to have even if it wasn’t efficient for the first round, but it should include a notice of such.

I knew I was probably saying some nonsense, as I’m new to Dgraph and GraphQL :metal:, but I told myself “maybe my idea gives the experts some inspiration for other ideas”. Happy that it worked!

As a user i would not need to get all nested fields so something like:

type Post {
  id: ID!
  author: Author! @hasInverse(field: "posts") 
  title: String
}

type Author {
  id: ID!
  posts: [Post!]! @hasInverse(field: "author")
  lastName: String @nestedSearch(searchFrom: Post, by: [hash])
}

//so we can do this or a similar thing:

query {
  queryPost(
     filter: { author:  { lastName: { eq: "Tesla" } }, or: { id: ......}}
  ) {
    title
  }
}

To specify the fields we want to use for nested filters would make sense and be intuitive for me as a developer using dgraph

I would love to try an MVP of this feature. Also working backwards we could at least agree on a right structure for the query and the schema even before coding the MVP feature?

I wouldn’t do the , by: [hash] part, because that is adjusting the index. So it would be two directives: @search(...) and something else like @nestedSearch | @virtual | ?. I don’t really think an MVP for this needs this second directive; it would simply surface all searchable fields as filters. The second directive would be a way to limit the scope of the GraphQL inputs, but realistically it may broaden the scope of this specific feature.
