Defer Field Selection in Custom DQL Query

minhaj · March 19, 2021, 1:08pm

Motivation

Currently, when we define a custom DQL query in the GraphQL schema, we have to list its fields explicitly. This over-fetches data from Dgraph whenever the selection set of the GraphQL query is a subset of the fields in the DQL query. For example, given this schema:

type Tweets {
	id: ID!
	text: String! @search(by: [fulltext])
	author: User
	timestamp: DateTime! @search
}
type User {
	screen_name: String! @id
	followers: Int @search
	tweets: [Tweets] @hasInverse(field: author)
}

type Query {
  queryTweetsSortedByAuthorFollowers(search: String!): [Tweets] @custom(dql: """
	query q($search: string) {
		var(func: type(Tweets)) @filter(anyoftext(Tweets.text, $search)) {
			Tweets.author {
				followers as User.followers
			}
			authorFollowerCount as sum(val(followers))
		}
		queryTweetsSortedByAuthorFollowers(func: uid(authorFollowerCount), orderdesc: val(authorFollowerCount)) {
			id: uid
			text: Tweets.text
			author: Tweets.author {
			    screen_name: User.screen_name
			    followers: User.followers
			}
			timestamp: Tweets.timestamp
		}
	}
	""")
}

The custom DQL query queryTweetsSortedByAuthorFollowers contains id, text, author and timestamp in its selection set.
Now suppose we want to run the following GraphQL query:

query {
  queryTweetsSortedByAuthorFollowers{
    id
    text
    timestamp
  }
}

Note that this query does not ask for author, yet the DQL query still fetches the author from Dgraph, which adds avoidable work and should degrade performance to some degree.
We propose to change this behavior so that Dgraph executes the query with only the fields selected in the GraphQL query.

User Impact

This will make custom DQL queries more efficient and more usable for the community.

Implementation

Currently, we have come up with two possible implementations, which we discuss here.

1- Parse the custom DQL query and rewrite it on the basis of the GraphQL selection set.

In this approach, we first parse the DQL query at execution time and then rewrite it according to the selection set of the GraphQL query. The rewriting phase performs the following steps:

Remove predicates from the custom query if they are not present in the GraphQL selection set.
Include those fields of the GraphQL query that are not declared in the DQL query.
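
The two rewriting steps can be sketched on a simplified model. This is only an illustration, not Dgraph code: the nested-dict representation of a selection set, the SCHEMA table and the function name rewrite_selection are all assumptions made for the example.

```python
# Hypothetical field -> (DQL predicate, child type) table derived from the schema above.
SCHEMA = {
    "Tweets": {
        "id": ("uid", None),
        "text": ("Tweets.text", None),
        "author": ("Tweets.author", "User"),
        "timestamp": ("Tweets.timestamp", None),
    },
    "User": {
        "screen_name": ("User.screen_name", None),
        "followers": ("User.followers", None),
    },
}


def rewrite_selection(declared, requested, type_name):
    """Return the DQL selection restricted and extended to match `requested`.

    declared:  {alias: {"predicate": str, "children": dict or None}} from the custom DQL query
    requested: {field: nested selection dict, or None for a scalar} from the GraphQL query
    """
    rewritten = {}
    for field, sub_request in requested.items():
        predicate, child_type = SCHEMA[type_name][field]
        if field in declared:
            # Declared and requested: keep the entry as the DQL author wrote it.
            entry = dict(declared[field])
        else:
            # Step 2: requested but not declared, so add it from the schema mapping.
            entry = {"predicate": predicate, "children": None}
        if child_type is not None:
            entry["children"] = rewrite_selection(
                entry.get("children") or {}, sub_request or {}, child_type
            )
        rewritten[field] = entry
    # Step 1 happens implicitly: declared entries that were never requested are not copied.
    return rewritten


declared = {
    "id": {"predicate": "uid", "children": None},
    "text": {"predicate": "Tweets.text", "children": None},
    "author": {"predicate": "Tweets.author", "children": {
        "screen_name": {"predicate": "User.screen_name", "children": None},
        "followers": {"predicate": "User.followers", "children": None},
    }},
    "timestamp": {"predicate": "Tweets.timestamp", "children": None},
}
requested = {"id": None, "text": None, "timestamp": None}
print(list(rewrite_selection(declared, requested, "Tweets")))  # ['id', 'text', 'timestamp']
```

A real implementation would work on the parsed DQL query rather than on dicts, and would also have to deal with variables that become unused once a predicate is removed.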

2- Use a _defer_ keyword and take the selection set from GraphQL.

In this case, we introduce a _defer_ keyword to be used in the selection set of the query in place of explicit predicates. For example, the custom query above would become:

queryTweetsSortedByAuthorFollowers(search: String!): [Tweets] @custom(dql: """
    query q($search: string) {
        var(func: type(Tweets)) @filter(anyoftext(Tweets.text, $search)) {
            Tweets.author {
                followers as User.followers
            }
            authorFollowerCount as sum(val(followers))
        }
        queryTweetsSortedByAuthorFollowers(func: uid(authorFollowerCount), orderdesc: val(authorFollowerCount)) {
            _defer_
        }
    }
    """)

The GraphQL layer then rewrites the query based on the selection set of the GraphQL query.
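
As a rough illustration of that rewriting, the sketch below replaces a _defer_ line with one `alias: predicate` line per requested field. It is purely text-based and handles only flat scalar fields; the FIELD_TO_PREDICATE table and the function name expand_defer are assumptions for the example, not part of Dgraph.

```python
import re

# Hypothetical mapping from GraphQL field names on Tweets to DQL predicates.
FIELD_TO_PREDICATE = {
    "id": "uid",
    "text": "Tweets.text",
    "timestamp": "Tweets.timestamp",
}


def expand_defer(dql, requested_fields):
    """Replace every line consisting only of _defer_ with the requested fields."""

    def replace(match):
        indent = match.group(1)  # reuse the indentation of the _defer_ line
        return "\n".join(
            f"{indent}{name}: {FIELD_TO_PREDICATE[name]}" for name in requested_fields
        )

    return re.sub(r"^([ \t]*)_defer_[ \t]*$", replace, dql, flags=re.MULTILINE)


block = "q(func: uid(authorFollowerCount)) {\n    _defer_\n}"
print(expand_defer(block, ["id", "text", "timestamp"]))
```

An actual implementation would more likely operate on the parsed query than on raw text, and would need the schema to resolve nested fields such as author.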

Challenges

1- Constructing a valid DQL query, since removing predicates from the query may leave some variables unused.
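
To make the challenge concrete, here is a minimal sketch that reports variables which are defined with `as` but never referenced through uid(...) or val(...). It relies on regular expressions rather than a real DQL parser, so it is only an approximation, and the function name unused_variables is an assumption for the example.

```python
import re


def unused_variables(dql):
    """Return the names defined with `as` that no uid(...) or val(...) refers to."""
    defined = set(re.findall(r"\b(\w+)\s+as\b", dql))
    used = set()
    for args in re.findall(r"\b(?:uid|val)\(([^()]*)\)", dql):
        used.update(name.strip() for name in args.split(","))
    return defined - used


# The author block that used `auth` has been removed from the last query block.
query = """
query q($search: string) {
    auth as var(func: type(User)) @filter(ge(User.followers, 10))
    var(func: type(Tweets)) @filter(anyoftext(Tweets.text, $search)) {
        Tweets.author {
            followers as User.followers
        }
        authorFollowerCount as sum(val(followers))
    }
    queryTweetsSortedByAuthorFollowers(func: uid(authorFollowerCount), orderdesc: val(authorFollowerCount)) {
        id: uid
        text: Tweets.text
    }
}
"""
print(unused_variables(query))  # {'auth'}
```

Detecting the unused variable is the easy part. Removing its defining block safely is harder, because that block may in turn be the only user of other variables.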

Any other approaches to the implementation, and any comments, are very welcome.

smkhalsa · March 19, 2021, 1:26pm

I’d go with the _defer_ approach. Rewriting the query to include / remove fields could be confusing for users.

amaster507 · March 19, 2021, 2:16pm

Can _defer_ be used together with other field selections that must always be present for the query to return the proper results?

Use case: if we have @cascade( ... Tweets.author ... ) and then do not select the author field, the cascade would not produce the expected results.

minhaj · March 22, 2021, 12:33pm

Hey @amaster507, thanks for responding.
This could be implemented as follows:
Users can also list predicates alongside _defer_ in the selection set of a custom DQL query.
Using _defer_ allows the GraphQL layer to rewrite the DQL query and add the selection set of the GraphQL query to it. The final DQL query therefore contains the fields from the GraphQL query plus the predicates already defined in the DQL query.
However, this approach assumes that the user has defined, alongside _defer_, only those predicates that are strictly needed for the correctness of the query, since no predicate is dropped during rewriting.
Suppose the custom query is as follows:

query q($search: string) {
        auth as var(func: type(User)) @filter(ge(User.followers, 10))
        var(func: type(Tweets)) @filter(anyoftext(Tweets.text, $search)) {
            Tweets.author {
                followers as User.followers
            }
            authorFollowerCount as sum(val(followers))
        }
        queryTweetsSortedByAuthorFollowers(func: uid(authorFollowerCount), orderdesc: val(authorFollowerCount)) @cascade([Tweets.timestamp]) {
            timestamp: Tweets.timestamp
            author: Tweets.author @filter(uid(auth)) {
                _defer_
            }
            _defer_
        }
    }

If the GraphQL query is:

queryTweetsSortedByAuthorFollowers{
   id
   text
   author {
      screen_name
      followers
   }
}

The DQL query block is then modified to complete the selection set of Tweets.author and then that of queryTweetsSortedByAuthorFollowers. The final DQL query block looks like this:

queryTweetsSortedByAuthorFollowers(func: uid(authorFollowerCount), orderdesc: val(authorFollowerCount)) @cascade([Tweets.timestamp]) {
    timestamp: Tweets.timestamp
    author: Tweets.author @filter(uid(auth)) {
        screen_name: User.screen_name
        followers: User.followers
    }
    id: uid
    text: Tweets.text
}

But if the GraphQL query doesn’t have author in its selection set, Tweets.author is not dropped from the custom DQL query: dropping it would make the query invalid because of the unused variable auth, and cleaning up unused variables would make the rewriting unnecessarily complex. The resulting query block is:

queryTweetsSortedByAuthorFollowers(func: uid(authorFollowerCount), orderdesc: val(authorFollowerCount)) @cascade([Tweets.timestamp]) {
    timestamp: Tweets.timestamp
    author: Tweets.author @filter(uid(auth)) {
    }
    id: uid
    text: Tweets.text
}
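
The merge rule described in this post (predicates written next to _defer_ are always kept, and the requested fields are appended) can be sketched as follows. This is a simplified model under stated assumptions, not Dgraph's implementation: the dict layout, the SCHEMA table and the names merge and render are all hypothetical.

```python
# Hypothetical field -> (DQL predicate, child type) table for the two types.
SCHEMA = {
    "Tweets": {"id": ("uid", None), "text": ("Tweets.text", None),
               "timestamp": ("Tweets.timestamp", None),
               "author": ("Tweets.author", "User")},
    "User": {"screen_name": ("User.screen_name", None),
             "followers": ("User.followers", None)},
}


def merge(pinned, requested, type_name):
    """Keep every pinned predicate, then append requested fields not already pinned."""
    merged = {}
    for field, entry in pinned.items():
        predicate, child_type = SCHEMA[type_name][field]
        children = None
        if child_type is not None:
            # A pinned block is never dropped, even if the GraphQL query skips it.
            children = merge(entry.get("children") or {},
                             requested.get(field) or {}, child_type)
        merged[field] = {"predicate": predicate,
                         "suffix": entry.get("suffix", ""), "children": children}
    for field, sub_request in requested.items():
        if field in merged:
            continue
        predicate, child_type = SCHEMA[type_name][field]
        children = merge({}, sub_request or {}, child_type) if child_type else None
        merged[field] = {"predicate": predicate, "suffix": "", "children": children}
    return merged


def render(selection, indent=""):
    """Turn a merged selection into DQL-like lines."""
    lines = []
    for alias, entry in selection.items():
        head = f"{indent}{alias}: {entry['predicate']}{entry['suffix']}"
        if entry["children"] is None:
            lines.append(head)
        else:
            lines.append(head + " {")
            lines.extend(render(entry["children"], indent + "  "))
            lines.append(indent + "}")
    return lines


# Predicates written alongside _defer_ in the custom query above.
pinned = {"timestamp": {}, "author": {"suffix": " @filter(uid(auth))", "children": {}}}
requested = {"id": None, "text": None,
             "author": {"screen_name": None, "followers": None}}
print("\n".join(render(merge(pinned, requested, "Tweets"))))
```

With requested reduced to id and text, the same call still emits the author block with an empty selection, which keeps the variable auth in use.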

smkhalsa · March 28, 2021, 10:15pm

Any update on this?

abhimanyusinghgaur · March 31, 2021, 1:10pm

Hey @smkhalsa,

After the 21.03 release, we have a bit of holiday season going on right now and most of our engineering team is out. So, no progress has been made on this yet. Responses from our side will be slow till mid-April.

You can expect something by the end of April.

smkhalsa · July 19, 2021, 5:06pm

@abhimanyusinghgaur Can you please provide an update on this? Is it on the roadmap?

This feature is very important to my use case.
