RFC: Persistent queries and cached results

Summary

Extend the GraphQL layer on Dgraph to support persistent queries and cached results. When persistent queries are enabled, the client sends only the hash of a query to the server, which keeps a list of known hashes and runs the query associated with that hash. If cached results are enabled for a query, stored results are served when the query is repeated within the time-to-live of the cached entry. Persistent queries improve the performance and security of an application. Cached results improve performance for read-heavy workloads.

Note: Caching in this RFC refers to caching at the browser/CDN level. Caching at the database layer is out of scope for this RFC.

Motivation

The discussion around supporting persisted queries started at the GraphQL in Space conference 2020. Facebook, Medium, etc. use persisted queries for performance and security reasons. We want to provide this functionality out of the box.

Persisted queries can be paired with GET requests, which enables caching. Cached results can be used to serve read-heavy workloads with complex queries and improve performance. With appropriate headers, the caching can be done at the browser or CDN.

Guide-level explanation

Persistent Queries

We will support persistent queries for Dgraph in the following waterfall logic:

  1. If the extensions key is not provided in the request, we process the request as usual.
  2. If the extensions key is present in the GET request:
    a. If both the query and the SHA are provided, we verify the hash, do a lookup/overwrite, and process the query.
    Example: curl -g 'http://localhost:8080/graphql/?query={sample_query}&extensions={"persistedQuery":{"sha256Hash":"hash-key-xxx"}}'
    b. If only the SHA is provided, we do a lookup and process the query if it is found; otherwise we return an error.
    Example: curl -g 'http://localhost:8080/graphql/?extensions={"persistedQuery":{"sha256Hash":"hash-key-xxx"}}'
    c. If only the query is provided, we generate a SHA, store it in Dgraph, and return the SHA with the response to the query.
    Example: curl -g 'http://localhost:8080/graphql/?query={sample_query}&extensions={"persistedQuery"}'
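
The waterfall above could be sketched roughly as follows. This is only an illustration in Python, not the Dgraph implementation: `resolve_query`, `PersistedQueryError`, the in-memory `store` dict and the error messages are all hypothetical names, and it assumes that a hash mismatch in case a is rejected and that case c sends an empty `persistedQuery` object.

```python
import hashlib


class PersistedQueryError(Exception):
    """Hypothetical error for an unknown hash or a hash/query mismatch."""


def sha256_hex(query: str) -> str:
    # Hex digest of the query text, as used for the sha256Hash value.
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


def resolve_query(store: dict, query=None, extensions=None):
    """Return (query_to_run, sha) following steps 1, 2a, 2b and 2c."""
    # Step 1: no extensions key, so process the request as usual.
    if extensions is None:
        return query, None

    sha = (extensions.get("persistedQuery") or {}).get("sha256Hash")

    if query is not None and sha is not None:
        # 2a: verify the hash, then store (or overwrite) the mapping.
        # Rejecting a mismatch is an assumption of this sketch.
        if sha256_hex(query) != sha:
            raise PersistedQueryError("sha256Hash does not match query")
        store[sha] = query
        return query, sha

    if sha is not None:
        # 2b: hash only, so look the query up or fail.
        if sha not in store:
            raise PersistedQueryError("persisted query not found")
        return store[sha], sha

    if query is not None:
        # 2c: query only, so generate the hash and store the pair.
        sha = sha256_hex(query)
        store[sha] = query
        return query, sha

    raise PersistedQueryError("neither query nor sha256Hash provided")
```

Here `store` stands in for wherever Dgraph would persist the hash-to-query mapping; a plain dict keeps the sketch self-contained.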

Queries and Hash-Keys once stored will persist until they are cleared/overwritten. This is similar to what Apollo Client/Server does here.

Unresolved questions

  • Security aside, it is not well understood how much incremental performance benefit we get from replacing the query payload with a hash string in network calls.
    This will be tested using an nginx server in front of Dgraph.
  • Should the server code sit in Dgraph or Slash? In other words, is this going to be an open-source feature or an enterprise-only feature on Slash?
    It should sit in Dgraph since the feature will be open source.

Cached results

Closely related to persistent query is the ability to cache responses at the browser/CDN layer.

This can be enabled if the response has appropriate headers in it as discussed in the comments below.

This section will be expanded soon after we enable persistent queries

References

Persistent Queries

These are supported by the top two GraphQL clients - Apollo (1, 2) and Relay. We would support this at the native GraphQL layer so users on free plans for the above clients can also leverage this functionality.

Cached results

A few implementations of cached results are:

  1. Hasura supports stored results in its cloud offering.
  2. MariaDB supports query response caching.
  3. ArangoDB supports cached query results accessible via REST APIs.

OneGraph supports both persistent queries and stored results.

I love this idea as long as one thing does not happen:

The memory foot-print is already quite large compared to the other referenced DBs

I was actually thinking how something like this might be possible during church Sunday night.

Context: Combatting the Complaining Character, based on medical research: How Complaining Rewires Your Brain for Negativity.

Your brain loves efficiency and doesn’t like to work any harder than it has to. When you repeat a behavior, such as complaining, your neurons branch out to each other to ease the flow of information. This makes it much easier to repeat that behavior in the future—so easy, in fact, that you might not even realize you’re doing it.

You can’t blame your brain. Who’d want to build a temporary bridge every time you need to cross a river? It makes a lot more sense to construct a permanent bridge. So, your neurons grow closer together, and the connections between them become more permanent. Scientists like to describe this process as, “Neurons that fire together, wire together.”

Repeated complaining rewires your brain to make future complaining more likely. Over time, you find it’s easier to be negative than to be positive, regardless of what’s happening around you. Complaining becomes your default behavior, which changes how people perceive you.

And here’s the kicker: complaining damages other areas of your brain as well. Research from Stanford University has shown that complaining shrinks the hippocampus—an area of the brain that’s critical to problem solving and intelligent thought. Damage to the hippocampus is scary, especially when you consider that it’s one of the primary brain areas destroyed by Alzheimer’s.

My friend preaching used the example of what we are going through with remapping our UI on top of the new DB, and how that can be a long and tedious process, but once all of the bridges are mapped, using those bridges becomes faster and faster.

I thought of how this could be applied at a deeper level as well, put in this perspective. Apollo Client 3.0 does a really amazing job of reusing previously queried data: if the query and the variables are all the same, it will not refetch but return the local data it already fetched. This works very well with pagination and going back to a previous page. It got me thinking about how the lower level inside of DQL might be able to do this as well. I am doing

All of this to bring to question my main thoughts about this RFC:

  1. I am assuming that variables will be included in the hash?
  2. How will this affect auth rules? Can auth rules be cached?
  3. Will cached data be tied to JWTs? This sometimes may be wanted and other times unwanted depending on the use cases and auth rules that may or may not be applied.
  4. How will an update of the cached data be handled? It is quite common in UI dev to update apollo cache when doing update mutations because the cached data will not otherwise know that it was changed.
  5. Will updates be strictly tied to a timeout period? That would make it somewhat useless for caching any kind of data that changes fairly frequently (e.g., with a cached list of users, a new or updated user would not appear in the list until the timeout period ends).

Persisted queries and response caching seem like two different topics. Not sure why they’re being mingled in the same RFC.

Response caching is a lot harder task in a distributed, MVCC, graph system, fwiw.

I agree. I would split this into two RFCs. CC: @pawan @gja @vvbalaji

@mrjn these two things are intertwined, though the caching doesn’t have to be in Dgraph itself. @Anurag so let’s say dgraph instead returns two headers:

GraphQL-CacheAge: 42 
GraphQL-CacheKey: <queryId>-<consistentHashofVariables>

Then it’s trivial to cache things at either the Slash GraphQL level, or at a CDN that you put in front of Slash. Both AWS Lambda@Edge and Cloudflare Workers can easily handle this simple scenario, as long as it’s easy to replicate the consistentHashofVariables in any programming language.
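
One way the cache key could be built so that it is easy to replicate elsewhere is sketched below. This is an assumption, not a settled design: the function names are made up, and it takes "consistent hash" to mean SHA-256 over the variables serialized as JSON with sorted keys and no extra whitespace.

```python
import hashlib
import json


def consistent_hash_of_variables(variables: dict) -> str:
    # Serialize with sorted keys and fixed separators so that the same
    # variables always produce the same bytes, whatever the key order.
    canonical = json.dumps(
        variables, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cache_key(query_id: str, variables: dict) -> str:
    # Mirrors the <queryId>-<consistentHashofVariables> header format.
    return f"{query_id}-{consistent_hash_of_variables(variables)}"
```

Sorted keys cover the key-ordering problem, but number formatting (for example floats) can still differ between JSON libraries, so a real spec would need to pin that down.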

Also, if you wanted to convert this into a GET API, the way ArangoDB does it (they call it a REST API), I’ll be +100. Apollo has some sort of magic that makes the following snippet work:

createPersistedQueryLink({useGETForHashedQueries: true})

Rather than the blog post you’ve linked to, this is the more relevant article: https://www.apollographql.com/docs/apollo-server/performance/apq/

Note: This doesn’t allow for invalidation of results, which is why Hasura only allows for a maximum cache of 5 minutes.

I edited the RFC: caching here refers to response caching at the browser/CDN layer. We will not work on caching at the database layer at the moment.

Yes, in the first cut we will include variables in the hash.

They will not be impacted, since the query will still be run and auth happens at a level below the cache we wish to implement.

We might end up not allowing persistent queries for auth-enabled fields at the start. We will cross this bridge when we come to it.

Response caching is being handled at the browser/CDN level, which doesn’t support updates. So currently it will be tied to a timeout period.

Let’s see how Apollo handles this. My gut feeling is that variable declarations are included in the hash, not the variable values.

  • I’d say make it compatible with Apollo Automatic Persisted Queries. It’s an easy enough spec to implement (at least from a high-level reading).
  • I’d say you can explore this easily by putting an nginx in front of Dgraph and testing it out.
  • I’ll leave the last one to the team

Questions to answer:

  • How would you delete a persisted query?
  • How would you get a list of persisted queries?
  • Also mention how to pass the TTL, or change the TTL.

Cache in Dgraph should be straightforward as well. A simple solution would be to store the result against the query hash, with a TTL in Alpha’s local Ristretto. No need to communicate that across Alphas either. If the query goes to another Alpha, it could run the query and then cache the results. Later we could think about how to invalidate the cache results on mutations.
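
The idea of storing the result against the query hash with a TTL could look roughly like this. It is a toy sketch in Python using a dict, not Ristretto's API; `TTLCache` and `cached_query` are hypothetical names, and there is no invalidation on mutations, only expiry.

```python
import time


class TTLCache:
    """Per-process cache: query hash -> (expiry time, result)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}

    def set(self, query_hash, result, ttl_seconds):
        self._entries[query_hash] = (self._clock() + ttl_seconds, result)

    def get(self, query_hash):
        entry = self._entries.get(query_hash)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            # Entry outlived its TTL, so drop it and report a miss.
            del self._entries[query_hash]
            return None
        return result


def cached_query(cache, query_hash, ttl_seconds, run_query):
    # Serve from the local cache when possible; otherwise run the query
    # on this node and cache the result, with no cross-node coordination.
    result = cache.get(query_hash)
    if result is None:
        result = run_query()
        cache.set(query_hash, result, ttl_seconds)
    return result
```

Because each node keeps its own cache, two nodes can serve results of different ages for the same hash until the TTL runs out.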

  • You ideally shouldn’t ever need to delete the persistent query, but I think we planned some API for CRUD on persisted queries
  • The TTL is a directive on the query itself, so if you change the TTL, the query itself changes, which changes the hash of the query and makes it a new persisted query :-).
  • We could do a simple cache in dgraph, but I feel like we could also do it in nginx / any other cache that’s in between. The advantage of doing it outside dgraph is that it can scale horizontally really trivially.

Persisted queries are stored in Dgraph, so the mutation to delete nodes can be used to delete these. We can provide an API (e.g., extensions={"persistedQuery":DELETE_ALL}) or document the predicate being used to store these. I think the former is better and more secure, because we don’t expose the internal predicate to the user.

I think it’s ideal to have an admin API for CRUD operations on persistedQueries. That way, it’s authenticated as well.

After an offline discussion with @gja, we will not support case c (only the query is provided and the server generates the SHA).

Separately, if we want dgraph to work out-of-the-box with Apollo’s client (with persistent queries enabled) then just completing the above RFC will not suffice. I am not sure if the requests via Apollo’s Client are equivalent to the curl requests described above. One difference I can easily point out is that the former is a POST whereas the latter is a GET.

I am looking into whether supporting the client out of the box requires more work.

The reason I am concerned is because:

How client works:
1. When the client makes a query, it will optimistically send a short (64-byte) cryptographic hash instead of the full query text.
2. If the backend recognizes the hash, it will retrieve the full text of the query and execute it.
3. If the backend doesn't recognize the hash, it will ask the client to send the hash and the query text so it can store them mapped together for future lookups. During this request, the backend will also fulfill the data request.

The client does it under-the-hood because it couples well with Apollo server’s error handling. For our case, it might just throw and not retry.

@gja @pawan @vvbalaji

At the end of the day, apollo server’s error handler is just an HTTP response. I’m pretty sure we can mimic it.

Apollo client’s persisted queries do work exactly as described. https://github.com/apollographql/apollo-link-persisted-queries has the exact protocol, and just says that you should return the following (I’m guessing this should be JSON):

{
  errors: [
    { message: 'PersistedQueryNotFound' }
  ]
}
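
Seen from the client side, the retry around that error could be sketched like this. It is a simplified illustration, not Apollo's code: `run_with_apq` and the `send` callback (any function that posts a payload and returns the decoded response) are hypothetical, while the `extensions.persistedQuery` shape with `version` and `sha256Hash` is what the Apollo protocol describes.

```python
import hashlib


def run_with_apq(send, query, variables=None):
    """Try the hash alone first; resend with the query text if unknown."""
    sha = hashlib.sha256(query.encode("utf-8")).hexdigest()
    extensions = {"persistedQuery": {"version": 1, "sha256Hash": sha}}
    variables = variables or {}

    # First attempt: optimistic, hash only.
    response = send({"variables": variables, "extensions": extensions})

    errors = response.get("errors") or []
    if any(e.get("message") == "PersistedQueryNotFound" for e in errors):
        # Backend did not recognize the hash: send hash and query together
        # so it can store the pair and fulfill the data request.
        response = send(
            {"query": query, "variables": variables, "extensions": extensions}
        )
    return response
```

If the server returns exactly this error shape, a client that follows the protocol should retry on its own rather than surface the error.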

What is the confusion around POST to GET? Both Dgraph and Apollo already support both POST and GET, there should be nothing new here.

Also calling out that the entire protocol is laid out in the link I shared, so they may send over either POST or GET.

As we’d discussed sometime in the past, the best way to implement this would be some kind of middleware that wraps the graphql handler. You should be able to return the correct errors or whatnot from there.

Happy to get on another call.

Related work:

  1. feat(Query): Enable persistent queries in dgraph
  2. feat(Query): Add cacheControl directive that adds headers in the response