With the advent of GraphQL, we now have two schemas in Dgraph:
- Dgraph schema: consists of types and predicates, and is stored in Badger using type and schema keys.
- GraphQL schema: a string, stored in Badger using data keys.
Both the Dgraph and GraphQL schemas are also kept in memory on every alpha node.
At present, we represent the GraphQL schema in Dgraph using the following Dgraph schema:
```
type dgraph.graphql {
    dgraph.graphql.xid
    dgraph.graphql.schema
}

dgraph.graphql.xid: string @index(exact) @upsert .
dgraph.graphql.schema: string .
```
When any alpha node boots up, it first upserts an empty GraphQL schema, gets the UID of the `dgraph.graphql` node from the upsert, keeps that UID in memory, and uses it to mutate the GraphQL schema node in Dgraph whenever someone performs an `updateGQLSchema` mutation. Doing the upsert with the same xid value every time ensures that there is only ever one node of type `dgraph.graphql` in Dgraph. Also, doing it during boot-up means that `updateGQLSchema` only has to handle the update case, not the add case.
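To make the boot-time upsert concrete, here is a minimal Go sketch against a toy in-memory store, not Dgraph's real API. The names `store`, `upsertByXid`, `setSchema`, and the xid value `gqlSchemaXid` are all hypothetical. It shows why repeating the upsert with the same xid yields a single node, and how the UID remembered at boot-up is then used for updates.

```go
package main

import (
	"fmt"
	"sync"
)

// gqlSchemaXid is a hypothetical fixed xid for the single dgraph.graphql node.
const gqlSchemaXid = "dgraph.graphql.schema"

// gqlNode is a toy stand-in for the dgraph.graphql node.
type gqlNode struct {
	xid    string
	schema string
}

// store is a toy in-memory stand-in for Dgraph; it is NOT Dgraph's API.
type store struct {
	mu      sync.Mutex
	nextUID uint64
	nodes   map[uint64]*gqlNode
}

func newStore() *store {
	return &store{nextUID: 1, nodes: map[uint64]*gqlNode{}}
}

// upsertByXid returns the UID of the node with the given xid, creating it
// with an empty schema only if no such node exists yet. The lookup and the
// insert happen under one lock, so repeated calls never add a second node.
func (s *store) upsertByXid(xid string) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	for uid, n := range s.nodes {
		if n.xid == xid {
			return uid
		}
	}
	uid := s.nextUID
	s.nextUID++
	s.nodes[uid] = &gqlNode{xid: xid}
	return uid
}

// setSchema is what updateGQLSchema would do with the UID remembered at
// boot-up: overwrite the schema on an existing node, or fail if it is gone.
func (s *store) setSchema(uid uint64, schema string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	n, ok := s.nodes[uid]
	if !ok {
		return fmt.Errorf("GraphQL schema node %#x not found", uid)
	}
	n.schema = schema
	return nil
}
```

Deleting the node from `nodes` and then calling `setSchema` with the old UID returns an error, which is the same failure mode that DROP_ALL causes.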
In addition, every alpha node runs a separate goroutine that subscribes to changes on the `dgraph.graphql.schema` predicate, so that whenever someone performs the `updateGQLSchema` mutation, all alphas can update their in-memory copy of the GraphQL schema from the update in Dgraph.
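The per-alpha subscription loop can be sketched as follows, assuming updates arrive on a Go channel. The `schemaUpdate` type, `gqlSchemaCache`, and `watchSchema` are hypothetical stand-ins; Dgraph's actual subscription mechanism is not shown.

```go
package main

import "sync"

// schemaUpdate is a hypothetical notification carrying the new value of
// dgraph.graphql.schema.
type schemaUpdate struct {
	schema string
}

// gqlSchemaCache is the per-alpha in-memory copy of the GraphQL schema.
type gqlSchemaCache struct {
	mu     sync.RWMutex
	schema string
}

// Get returns the current in-memory GraphQL schema.
func (c *gqlSchemaCache) Get() string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.schema
}

func (c *gqlSchemaCache) set(s string) {
	c.mu.Lock()
	c.schema = s
	c.mu.Unlock()
}

// watchSchema starts a goroutine that applies every update received on
// updates to the cache. The returned channel is closed once updates has been
// closed and all pending updates have been applied.
func watchSchema(c *gqlSchemaCache, updates <-chan schemaUpdate) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for u := range updates {
			c.set(u.schema)
		}
	}()
	return done
}
```

Readers use `Get` under a read lock, so serving GraphQL requests does not block on each other, only briefly on an incoming schema change.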
However, Dgraph also has the DROP_ALL and DROP_DATA operations. Let's first consider DROP_ALL.
DROP_ALL
If someone does a DROP_ALL, then everything is gone: data, predicates, types, everything! The initial internal types and predicates are re-created after DROP_ALL, but the initial upsert of the GraphQL schema node, which the GraphQL layer performs on every alpha at boot-up, is not repeated. The alphas still hold the old UID of the GraphQL schema node in memory, so the `updateGQLSchema` mutation stops working, because the `dgraph.graphql` node no longer exists in Dgraph!
This problem could be solved by any of the following approaches:
- Having a publisher-subscriber mechanism for DROP_ALL, so that whenever a DROP_ALL happens, a notification is sent to all subscribers. The GraphQL layer subscribes to DROP_ALL notifications and upserts the empty GraphQL schema node in Dgraph again. This updates the `dgraph.graphql.schema` predicate, and the in-memory schema gets updated through the existing subscription on that predicate, so everything starts working again.
- Doing the upsert of the GraphQL schema node only on the Group-1 leader instead of on every alpha, both during the initial boot-up and after DROP_ALL.
- Completely scrapping the initial upsert of the GraphQL schema node during alpha boot-up, and handling the upsert during the `updateGQLSchema` mutation instead.
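The third approach can be sketched like this: the mutation itself creates the node when it is missing, so no UID from boot-up has to survive DROP_ALL. The toy `graph` type and its methods are hypothetical, and its mutex only serializes callers within one process; across several alphas the same get-or-create step could still race, which is what the single-leader design in the update at the end of this post addresses.

```go
package main

import "sync"

// graph is a toy stand-in for Dgraph holding at most one dgraph.graphql node;
// it is NOT Dgraph's API.
type graph struct {
	mu      sync.Mutex
	nextUID uint64
	gqlUID  uint64 // 0 means no dgraph.graphql node exists
	schema  string
}

// dropAll models DROP_ALL: every node, including dgraph.graphql, is removed.
func (g *graph) dropAll() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.gqlUID, g.schema = 0, ""
}

// updateGQLSchema performs the get-or-create inside the mutation itself: it
// creates the dgraph.graphql node when it is missing and otherwise updates it,
// so no UID remembered from boot-up can go stale after DROP_ALL.
func (g *graph) updateGQLSchema(schema string) uint64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.gqlUID == 0 {
		g.nextUID++
		g.gqlUID = g.nextUID
	}
	g.schema = schema
	return g.gqlUID
}
```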
So, the DROP_ALL case can be solved. Now let's consider DROP_DATA.
DROP_DATA
DROP_DATA is similar to DROP_ALL, except that it doesn't remove types and predicates; i.e., it is supposed to remove only the data and not the schema. However, even though the GraphQL schema is a schema, it is stored in a data key, so we lose it after DROP_DATA. We therefore need a way to retain the GraphQL schema string even after DROP_DATA. This is where we don't have a perfect solution, as we want to retain the GraphQL schema in all possible cases. So far, we have considered the following options:
- Just make an in-memory copy of the GraphQL schema on every alpha before DROP_DATA happens, and after it finishes, upsert the GraphQL schema from memory using all alphas. This fails in the rare event that all alphas crash together, as we would lose the in-memory copy of the schema.
- On all nodes in Group-1, make a copy of the schema in a file on disk before DROP_DATA happens, and after it finishes, upsert the GraphQL schema from that file. If the leader crashes, the other alphas still have the file and can use it to do the upsert when one of them becomes leader. (In this approach, only the leader inserts the GraphQL schema node, as in one of the DROP_ALL approaches.) The challenge is ensuring that the file gets replicated to Group-1 alpha nodes that join after it was written.
- Store the GraphQL schema in Zero instead of Alpha, so it never gets deleted on DROP_DATA. However, we would then need to figure out how to delete it on DROP_ALL, update it when the user wants to, notify all alphas when it changes, and fetch it initially during alpha boot-up.
- Store the GraphQL schema in the `SchemaUpdate.ObjectTypeName` attribute of the schema key for `dgraph.graphql.schema` as a workaround. This attribute is only used for the `Object` `Posting_ValType`, and since `dgraph.graphql.schema` is a string, it will be empty for this predicate. (Also, we didn't find any use of the `Object` `Posting_ValType`, so maybe this attribute and the corresponding `Posting_ValType` are stale?)
- Have a way of excluding a particular key while doing `DropPrefix()` in Badger, i.e., a way to not delete `dgraph.graphql.schema` from Badger during DROP_DATA.
- Have a new type of key, like schema or type keys, to store data such as the GraphQL schema in Badger, so that it doesn't get deleted when dropping all keys with the default prefix.
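The key-exclusion option could look roughly like the following. Badger's `DropPrefix()` has no such exclusion, so `dropPrefixExcept` and `hasAnyPrefix` are hypothetical, and the readable `data|predicate|uid` strings are a stand-in for Dgraph's binary key encoding.

```go
package main

import (
	"bytes"
	"sort"
)

// kv is a toy key-value map standing in for Badger.
type kv struct {
	data map[string][]byte
}

// dropPrefixExcept is HYPOTHETICAL (Badger has no such method): it deletes
// every key starting with prefix unless the key also starts with one of the
// keep prefixes, and returns the deleted keys in sorted order.
func (s *kv) dropPrefixExcept(prefix []byte, keep ...[]byte) []string {
	var deleted []string
	for k := range s.data {
		key := []byte(k)
		if !bytes.HasPrefix(key, prefix) || hasAnyPrefix(key, keep) {
			continue
		}
		deleted = append(deleted, k)
	}
	for _, k := range deleted {
		delete(s.data, k)
	}
	sort.Strings(deleted)
	return deleted
}

// hasAnyPrefix reports whether key starts with any of the given prefixes.
func hasAnyPrefix(key []byte, prefixes [][]byte) bool {
	for _, p := range prefixes {
		if bytes.HasPrefix(key, p) {
			return true
		}
	}
	return false
}
```

Excluding by predicate prefix rather than by exact key means the GraphQL schema survives regardless of which UID its node has.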
Another thing to consider is whether we should focus so much on retaining the GraphQL schema in case of DROP_DATA. Unlike the Dgraph schema, which may be built up over time (each time new types or predicates are added, they are merged into the existing schema), the GraphQL schema is mostly supplied as a whole. So, users will usually have a copy of the schema file anyway, which they can reapply if DROP_DATA happens.
We need input here: should the DROP_DATA case be handled at all? If yes, how can we handle it without losing the GraphQL schema string? If possible, the solution should also be consistent with the DROP_ALL case.
cc: @pawan @michaelcompton @mrjn @ibrahim @ashishgoswami
Update [5 June 2020]
To solve both the DROP_ALL and DROP_DATA cases, we have decided to route the whole `updateGQLSchema` mutation through the Group-1 leader. The schema update is no longer an upsert using xid; it becomes a simple check of whether the node already exists (update it if it does, create it otherwise). Since only the Group-1 leader performs this mutation, there are no race conditions. So, no special handling is needed after DROP_ALL, because the initial upsert of the GraphQL schema node is no longer required. This also fixes another race condition in the current `updateGQLSchema` mutation: if two `updateGQLSchema` mutations ran concurrently, the GraphQL schema could end up out of sync with the Dgraph schema. The only place that needs special handling with this approach is DROP_DATA: we need to back up the schema in memory before the drop, and then re-insert the GraphQL schema after the drop completes, reusing the Group-1 method for updating the GraphQL schema.
This approach has the following two points of failure:
- Group-1 leader crashes during `updateGQLSchema`: in this case, the GraphQL schema node may get updated while the Dgraph schema does not, which can make the two differ. However, this is easily fixed by calling the `updateGQLSchema` mutation again. Moreover, the alpha that received the update request will report either a timeout or an error to the user, since it won't hear back from the Group-1 leader, so users will know that the schema was not updated successfully.
- Group-1 leader crashes just after doing DROP_DATA: in that case, we lose the GraphQL schema. This is acceptable, given that drop operations are very rare, crashes are also rare, and even if it happens, users will usually have the GraphQL schema file with them and can simply reapply it.