Storing GraphQL schema in Dgraph

With the advent of GraphQL, we now have two schemas in Dgraph:

  1. Dgraph schema: consists of types and predicates, and is stored in badger using type and schema keys.
  2. GraphQL schema: a string, stored in badger using data keys.

Both Dgraph and GraphQL schemas are also kept in-memory on every alpha node.

At present, we represent the GraphQL schema in Dgraph using the following Dgraph schema:

type dgraph.graphql {
    dgraph.graphql.xid
    dgraph.graphql.schema
}

dgraph.graphql.xid: string @index(exact) @upsert .
dgraph.graphql.schema: string .

When an alpha node boots up, it first upserts an empty GraphQL schema, obtains the uid of the dgraph.graphql node from that upsert, keeps the uid in memory, and uses it to mutate the GraphQL schema node in Dgraph whenever someone performs an updateGQLSchema mutation. Upserting with the same xid value every time ensures that there is only ever one node of type dgraph.graphql in Dgraph. Doing it during boot-up also means that updateGQLSchema only has to handle the update case, not the add case.
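As an illustration, the boot-up upsert might look roughly like the following conditional upsert in DQL. This is a sketch, not the actual code path; in particular, the xid value "dgraph.graphql.schema" and the blank-node name are assumptions:

```
upsert {
  query {
    gql as var(func: eq(dgraph.graphql.xid, "dgraph.graphql.schema"))
  }
  mutation @if(eq(len(gql), 0)) {
    set {
      _:node <dgraph.graphql.xid> "dgraph.graphql.schema" .
      _:node <dgraph.graphql.schema> "" .
      _:node <dgraph.type> "dgraph.graphql" .
    }
  }
}
```

Because the xid predicate is declared with @index(exact) @upsert, running a conditional upsert like this on every alpha at boot cannot create a second dgraph.graphql node.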

Also, every alpha node has a separate goroutine that subscribes to changes on the dgraph.graphql.schema predicate, so that whenever someone performs the updateGQLSchema mutation, all the alphas can update their in-memory copy of the GraphQL schema from the update in Dgraph.
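For reference, the admin mutation that drives this flow looks roughly like the following (a sketch against the /admin GraphQL endpoint; the exact response fields may differ by version):

```graphql
mutation {
  updateGQLSchema(input: { set: { schema: "type Person { name: String! }" } }) {
    gqlSchema {
      id
      schema
    }
  }
}
```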

But then we have the DROP_ALL and DROP_DATA operations in Dgraph. Let's first consider DROP_ALL.

If someone does a DROP_ALL, then everything is gone. Data, predicates, types, everything! The initial internal types and predicates are re-created after DROP_ALL, but the initial upsert for the GraphQL schema node, which the GraphQL layer does on every alpha at boot-up, is not repeated. The alphas still hold the old uid for the GraphQL schema node in memory, so the updateGQLSchema mutation stops working, because the dgraph.graphql node no longer exists in Dgraph!

This problem could be solved by any of the following approaches:

  • Having a publisher-subscriber mechanism for DROP_ALL. Whenever a DROP_ALL happens, a notification is sent to all subscribers. The GraphQL layer sets up a subscription on DROP_ALL notifications and upserts the empty GQLSchema node again in Dgraph. This updates the dgraph.graphql.schema predicate, and the in-memory schema gets updated via the other existing subscription on this predicate. So everything starts working again.
  • Doing the upsert for GraphQL schema node only on Group-1 leader instead of on every alpha, for both the initial boot-up process and after DROP_ALL.
  • Completely scrapping the initial upsert during alpha boot up for GraphQL schema node in Dgraph, and handling the upsert during updateGQLSchema mutation.

So, the DROP_ALL case can be solved. But now let's consider DROP_DATA.

DROP_DATA is similar to DROP_ALL, except that it doesn't remove types and predicates, i.e., it is supposed to remove only the data and not the schema. But since the GraphQL schema, despite being a schema, is stored in a data key, we lose it after DROP_DATA. So we need a way to retain the GraphQL schema string even after DROP_DATA. This is where we don't have a perfect solution yet, as we want to retain the GraphQL schema in all possible cases. What we have considered so far is the following:

  1. Make a copy of the GraphQL schema in memory on every alpha before DROP_DATA happens; after it finishes, upsert the GraphQL schema from memory on all alphas. This fails in the rare event of all the alphas crashing together, as we would lose the in-memory copy of the schema.
  2. On all nodes in Group-1, copy the schema to a file on disk before DROP_DATA happens; after it finishes, upsert the GraphQL schema from the file on disk. (Only the leader inserts the GraphQL schema node in this approach, as in one of the DROP_ALL approaches.) If the leader crashes, the other alphas still have the file and can do the upsert once one of them becomes leader. The challenge is ensuring that the file gets replicated to those Group-1 alphas which join after the file was written.
  3. Store the GraphQL schema in Zero, not in Alpha, so it never gets deleted on DROP_DATA. But then we need to figure out how to delete it on DROP_ALL, update it when the user wants to, notify all alphas when it changes, and fetch it initially during alpha boot-up.
  4. As a workaround, store the GraphQL schema in the SchemaUpdate.ObjectTypeName attribute of the schema key for dgraph.graphql.schema. This attribute is only used for the Object Posting_ValType, and since dgraph.graphql.schema is a string, it would otherwise be empty. (Also, we didn't find any use of the Object Posting_ValType, so maybe this attribute and the corresponding Posting_ValType are stale?)
  5. Have a way of excluding a particular key while doing DropPrefix() in badger. Basically, a way to not delete dgraph.graphql.schema from badger while doing DROP_DATA.
  6. Have a new type of key like schema or type keys, to store data like GraphQL schema in badger, so it doesn’t get deleted while dropping all keys with default prefix.

But another thing to consider is whether we should put so much focus on retaining the GraphQL schema in the case of DROP_DATA. Unlike the Dgraph schema, which may be constructed over time (each addition of new types/predicates extends the existing schema), the GraphQL schema is mostly supplied as a whole. So users will have a copy of the schema file with them anyway, which can be reapplied if DROP_DATA happens.

We need inputs here: should the DROP_DATA case be handled at all? If yes, how do we handle it without losing the GraphQL schema string? The solution should also be coherent with the DROP_ALL case, if possible.

cc: @pawan @michaelcompton @mrjn @ibrahim @ashishgoswami

Update [5 June 2020]
To solve both the DROP_ALL and DROP_DATA cases, we have decided to route the whole updateGQLSchema mutation through the Group-1 leader. The schema update is no longer an upsert using an xid; it is a simple insert-or-update based on whether the node already exists. Since only the Group-1 leader performs this mutation, there are no race conditions. We won't need any special handling after DROP_ALL, because no initial upsert for the GraphQL schema node is required now. This also fixes another race condition that currently exists with updateGQLSchema: if two concurrent updateGQLSchema mutations were performed, the GraphQL schema could end up different from the Dgraph schema. The only place we need special handling is after DROP_DATA, where we take a backup of the schema in memory before doing the drop, and then re-insert the GraphQL schema after the drop operation completes by reusing the method for updating the GraphQL schema on Group-1.
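A rough sketch of the decided flow, in pseudocode (function names are illustrative, not actual identifiers from the codebase):

```
# On any alpha receiving updateGQLSchema:
#   forward the request to the Group-1 leader and wait for its reply.

# On the Group-1 leader:
if gqlSchemaNodeExists():
    updateSchemaNode(newSchema)   # update dgraph.graphql.schema
else:
    insertSchemaNode(newSchema)   # first-ever schema
# no race: only the leader ever runs this

# DROP_DATA, on the Group-1 leader:
backup = readGQLSchemaFromMemory()
dropData()
applySchemaViaLeader(backup)      # reuse the update path above
```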

There are two points of failure with this approach:

  1. The Group-1 leader crashes during updateGQLSchema: there is a chance that the GraphQL schema node gets updated but the Dgraph schema does not, leaving the two out of sync. But this is easily solved by just calling the updateGQLSchema mutation again. And anyway, the alpha which received the update request will report a timeout or an error to the user, as it won't hear back from the Group-1 leader. So users will know that the schema was not updated successfully.
  2. The Group-1 leader crashes just after doing DROP_DATA: in that case we lose the GraphQL schema. But drop operations are very rare, a crash is also rare, and even if it happened, users will mostly have the GraphQL schema file with them, which they can just re-apply.


  • Handle on Group-1 leader.
  • Treat dgraph.graphql.schema specially.
  • Don't remove it on a schema DROP_ALL; just set it to "".
  • That should fire the subscription in all the alphas, and they handle setting it to an empty schema.

…your option 5? It would be great if DROP_DATA kept the same GraphQL schema; it'd be annoying for users to have to add it again.

The other thing to throw in here is the /admin endpoint. If I'm on Slash GraphQL, I only have access to /admin, so how do I do a DROP_ALL, say, for my test DB? We'll need some sort of endpoint for this, which I think should have the same behaviour as the two above.

Why store the graphql schema in data keys? Why not store them as a schema key, so it won’t be deleted during DROP_DATA?

Irrespective of how it gets deleted, you could just have a simple goroutine on the Group-1 leader (the best way to do this is to run it on all Group-1 alphas in a loop, check whether they're the leader, then check the key and take action), which checks whether the GraphQL schema is there or not. If it is not there, it can do whatever you like, such as reconstructing it.

Also ensure that the subscription mechanism works just fine with the key being present, then being removed, and then being regenerated.

The schema key for dgraph.graphql.schema will contain the schema for this predicate, which is:

dgraph.graphql.schema: string .

I was thinking we can only store SchemaUpdate values in a schema key, not a custom string value.
Is it possible to store a custom value for schema keys?

Actually, technically, don't all these requests go through /admin, which is a GraphQL endpoint? If so, you could do some pre- and post-processing to avoid this situation to begin with.

When you get a DROP_ALL, just reset the schema at the end. When you get DROP_DATA, you can read the schema first, do the data drop, and then write the schema back.

At the moment, GraphQL doesn't have these operations. And even if we introduce them in GraphQL, there would still be a way to do DROP_ALL or DROP_DATA through Dgraph's HTTP API.

This is where the problem was: what if the alpha crashed before it could write back the in-memory schema?

Ok, in that case, whatever is running that code can do those things.

Sounds like an extreme case. I won’t worry too much. Drops are not supposed to be transactional or anything.

If you really want to solve it, you could write the schema to a file before doing the drop. But is it really that big of a problem that we need to solve it? I'd do the simple thing first, and then add logic later if needed.

We had considered this option too.

We were just trying to figure out a perfect solution, if possible. But I think it is not that big of a problem. First, it is rare to occur. And even if we lose the schema during DROP_DATA, most users should have the schema file with them to reapply it.


Then I guess this option seems better at the moment. All alphas crashing together is a very rare event.

The fewer the alphas, the greater the risk of losing the schema. So a cluster with only one alpha has the highest risk of losing it, and the risk is that the alpha crashes. The probability of such an event should be very low.
