Extend the GraphQL layer in Dgraph to support persistent queries and cached results. When persistent queries are enabled, the client sends only a hash of a query to the server, which keeps a list of known hashes and executes the corresponding query. If cached results are enabled for a query, stored results are served as long as the request arrives within the time-to-live of the cached result. Persistent queries improve the performance and security of an application. Cached results improve performance for read-heavy workloads.
Note: Caching in this RFC refers to caching at the browser/CDN level. Caching at the database layer is out of scope for this RFC.
Motivation
The genesis of the discussion around supporting persisted queries was the GraphQL in Space conference 2020. Facebook, Medium, etc. use persisted queries for performance and security reasons. We want to provide this functionality out of the box.
Persisted queries can be paired with GET requests, enabling caching. Cached results can be used to serve read-heavy workloads with complex queries to improve performance. With appropriate headers, caching can be done at the browser or at CDNs.
Guide-level explanation
Persistent Queries
We will support persistent queries in Dgraph with the following waterfall logic:
If the extensions key is not provided in the request, we process the request as usual.
If the extensions key is present in the GET request:
a. If both the query and the SHA are provided, we verify the hash, do a lookup/overwrite, and process the query.
Example: curl -g 'http://localhost:8080/graphql/?query={sample_query}&extensions={"persistedQuery":{"sha256Hash":"hash-key-xxx"}}'
b. If only the SHA is provided, we do a lookup and process the query if found; otherwise we return an error.
Example: curl -g 'http://localhost:8080/graphql/?extensions={"persistedQuery":{"sha256Hash":"hash-key-xxx"}}'
c. If only the query is provided, we generate a SHA, store it in Dgraph, and return the SHA along with the response of the query.
Example: curl -g 'http://localhost:8080/graphql/?query={sample_query}&extensions={"persistedQuery"}'
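The waterfall above can be sketched as follows. The `resolve` helper, the in-memory `store` map, and the error strings are illustrative stand-ins only — in Dgraph the hash-to-query mapping would be stored durably, not in a map:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// store stands in for Dgraph's persisted hash -> query mapping.
var store = map[string]string{}

func hash(query string) string {
	h := sha256.Sum256([]byte(query))
	return hex.EncodeToString(h[:])
}

// resolve implements the waterfall; an empty string means "not provided".
func resolve(query, sha string) (string, error) {
	switch {
	case query != "" && sha != "": // (a) verify the hash, then lookup/overwrite
		if hash(query) != sha {
			return "", errors.New("provided sha256Hash does not match query")
		}
		store[sha] = query
		return query, nil
	case sha != "": // (b) lookup only; error when unknown
		q, ok := store[sha]
		if !ok {
			return "", errors.New("PersistedQueryNotFound")
		}
		return q, nil
	case query != "": // (c) generate and register a SHA; the real server
		// would also return the generated SHA with the response
		store[hash(query)] = query
		return query, nil
	default: // no extensions content: process the request as usual
		return query, nil
	}
}

func main() {
	q := "{ queryUser { name } }"
	if _, err := resolve("", hash(q)); err != nil {
		fmt.Println("before registration:", err)
	}
	if _, err := resolve(q, hash(q)); err == nil {
		fmt.Println("registered query under its hash")
	}
	got, err := resolve("", hash(q))
	fmt.Println(got == q, err == nil)
}
```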
Queries and hash keys, once stored, will persist until they are cleared or overwritten. This is similar to what Apollo Client/Server does here.
Unresolved questions
Security aspect aside, it is not well understood how much of an incremental performance benefit there is from replacing the query payload in network calls with a hashed string.
This will be tested using an nginx server with Dgraph.
Should the server code sit in Dgraph or Slash? In other words, is this going to be an open-source feature or enterprise-only feature on Slash?
It should sit in Dgraph, since the feature will be open source.
Cached results
Closely related to persistent queries is the ability to cache responses at the browser/CDN layer.
This can be enabled if the response carries appropriate headers, as discussed in the comments below.
This section will be expanded soon after we enable persistent queries.
References
Persistent Queries
These are supported by the top two GraphQL clients - Apollo (1, 2) and Relay. We would support this at the native GraphQL layer, so users on free plans of the above clients can also leverage this functionality.
Your brain loves efficiency and doesn’t like to work any harder than it has to. When you repeat a behavior, such as complaining, your neurons branch out to each other to ease the flow of information. This makes it much easier to repeat that behavior in the future—so easy, in fact, that you might not even realize you’re doing it.
You can’t blame your brain. Who’d want to build a temporary bridge every time you need to cross a river? It makes a lot more sense to construct a permanent bridge. So, your neurons grow closer together, and the connections between them become more permanent. Scientists like to describe this process as, “Neurons that fire together, wire together.”
Repeated complaining rewires your brain to make future complaining more likely. Over time, you find it’s easier to be negative than to be positive, regardless of what’s happening around you. Complaining becomes your default behavior, which changes how people perceive you.
And here’s the kicker: complaining damages other areas of your brain as well. Research from Stanford University has shown that complaining shrinks the hippocampus—an area of the brain that’s critical to problem solving and intelligent thought. Damage to the hippocampus is scary, especially when you consider that it’s one of the primary brain areas destroyed by Alzheimer’s.
My friend, while preaching, used the example of what we are going through with remapping our UI on top of the new DB, and how that can be a long and tedious process - but once all of the bridges are mapped, using those bridges becomes faster and faster.
I thought about how this could be applied at a deeper level as well when put in this perspective. Apollo Client 3.0 does a really amazing job of utilizing previously queried data: if the query and the variables are all the same, it will not refetch but will return the local data it already fetched. This works very well with pagination and going back to a previous page. It got me thinking about how a lower level inside of DQL might be able to do this as well.
I am doing all of this to bring up my main questions about this RFC:
I am assuming that variables will be included in the hash?
How will this affect auth rules? Can auth rules be cached?
Will cached data be tied to JWTs? This sometimes may be wanted and other times unwanted depending on the use cases and auth rules that may or may not be applied.
How will an update of the cached data be handled? It is quite common in UI dev to update the Apollo cache when doing update mutations, because the cached data will not otherwise know that it was changed.
Will updates be strictly tied to a timeout period? This would prove somewhat useless for catching any kind of data that changes somewhat frequently (e.g., caching a list of users: a new/updated user would not be added to the list until the timeout period elapses).
@mrjn these two things are intertwined, though the caching doesn’t have to be in Dgraph itself. @Anurag so let’s say dgraph instead returns two headers:
Then it’s trivial to cache things at either Slash GraphQL level, or at a CDN that you put in front of Slash. Both AWS Lambda at the Edge and Cloudflare workers can easily handle this simple scenario, as long as it’s easy to replicate the consistentHashOfVariables in any programming language.
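One way `consistentHashOfVariables` could be made trivially reproducible across languages is canonical JSON plus SHA-256. A sketch under that assumption — the function name comes from the comment above, and Go's `encoding/json` already emits map keys in sorted order, which serves as the canonical form here:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// consistentHashOfVariables hashes the GraphQL variables deterministically:
// marshal with sorted keys, then SHA-256 the bytes. Any language that can
// produce the same canonical JSON reproduces the same hash.
func consistentHashOfVariables(vars map[string]interface{}) (string, error) {
	b, err := json.Marshal(vars) // Go sorts map keys when marshaling
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a, _ := consistentHashOfVariables(map[string]interface{}{"first": 10, "after": "0x1"})
	b, _ := consistentHashOfVariables(map[string]interface{}{"after": "0x1", "first": 10})
	fmt.Println(a == b) // key order must not change the hash
}
```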
Also, if you wanted to convert this into a GET API, the way ArangoDB does it (they call it a REST API), I’ll be +100. Apollo has some sort of magic that makes the following snippet work:
Also mention how to pass the TTL, or change the TTL.
Cache in Dgraph should be straightforward as well. A simple solution would be to store the result against the query hash, with a TTL in Alpha’s local Ristretto. No need to communicate that across Alphas either. If the query goes to another Alpha, it could run the query and then cache the results. Later we could think about how to invalidate the cache results on mutations.
You ideally shouldn’t ever need to delete a persisted query, but I think we planned some API for CRUD on persisted queries.
The TTL is a directive on the query itself, so if you change the TTL, the query itself changes, changing the hash of the query and making it a new persisted query :-).
We could do a simple cache in dgraph, but I feel like we could also do it in nginx / any other cache that’s in between. The advantage of doing it outside dgraph is that it can scale horizontally really trivially.
Persisted queries are stored in Dgraph, so the mutation to delete nodes can be used to delete these. We can provide an API (e.g., extensions={"persistedQuery":DELETE_ALL}) or document the predicate used to store these. I think the former is better - more secure, because we don’t expose an internal predicate to the user.
After an offline discussion with @gja, we will not support (c).
Separately, if we want Dgraph to work out of the box with Apollo’s client (with persisted queries enabled), then just completing the above RFC will not suffice. I am not sure if the requests via Apollo’s client are equivalent to the curl requests described above. One difference I can easily point out is that the former is a POST whereas the latter is a GET.
I am looking into whether supporting the client out of the box requires more work.
How client works:
1. When the client makes a query, it will optimistically send a short (64-byte) cryptographic hash instead of the full query text.
2. If the backend recognizes the hash, it will retrieve the full text of the query and execute it.
3. If the backend doesn't recognize the hash, it will ask the client to send the hash and the query text so it can store them mapped together for future lookups. During this request, the backend will also fulfill the data request.
The client does this under the hood because it couples well with Apollo Server’s error handling. For our case, it might just throw and not retry.
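The three-step handshake can be sketched as follows. `backend` and `client` are hypothetical stand-ins for Apollo's automatic-persisted-queries flow, with an in-memory map as the backend's store:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// known stands in for the backend's stored hash -> query text mapping.
var known = map[string]string{}

// backend recognizes the hash, or reports PersistedQueryNotFound; when the
// hash and text arrive together it stores the mapping and fulfills the request.
func backend(sha, query string) (string, error) {
	if q, ok := known[sha]; ok {
		return "result of " + q, nil
	}
	if query == "" {
		return "", errors.New("PersistedQueryNotFound")
	}
	known[sha] = query
	return "result of " + query, nil
}

// client optimistically sends only the 64-char hex hash (step 1) and retries
// with the full text when the backend doesn't know it (step 3). It returns
// the result and how many round trips were needed.
func client(query string) (string, int) {
	h := sha256.Sum256([]byte(query))
	sha := hex.EncodeToString(h[:])
	if res, err := backend(sha, ""); err == nil {
		return res, 1 // step 2: hash recognized
	}
	res, _ := backend(sha, query)
	return res, 2
}

func main() {
	_, first := client("{ queryUser { name } }")
	_, second := client("{ queryUser { name } }")
	fmt.Println(first, second) // two round trips at first, then one
}
```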
What is the confusion around POST to GET? Both Dgraph and Apollo already support both POST and GET, there should be nothing new here.
Also calling out that the entire protocol is laid out in the link I shared, so they may send it over either POST or GET.
As we’d discussed sometime in the past, the best way to implement this would be some kind of middleware that wraps the GraphQL handler. You should be able to return the correct errors or whatnot from there.