RFC: Branch concept for Schema

Draft (please ignore typos) - Request for comments

Summary

The idea is to create a logical branching mechanism in the schema. This would let us bring new functionality, such as a pseudo-drop-all feature, to the Shared Cluster backend, and would allow easy schema migration without losing the ability to track schema changes and without downtime. It would also introduce aliases.

This would directly affect the query system (runtime). Since queries interact with the schema, the syntax should probably be modified to introduce a branch context, and new operations should be added to Alter.

Motivation

A branching system in the schema would make schema changes easier for the end user. The main idea is to have a way to migrate the schema and continue tracking it; this could be considered a type of schema versioning. When the team modifies the schema, instead of changing the entire DB by executing tasks that could leave the cluster unavailable, DB admins could simply create new branches where they can rename a predicate and add new predicates without touching the old schema. The new predicates would not be “merged” into or accessible from Main, because they would belong to the created branch.

More details.

Pseudo-drop-all: This functionality would not delete the data, but ignore it completely. The predicates would be detached from the main schema and all of their indexes removed (respecting the context, e.g. do not delete the index on “name” that belongs to another tenant). These predicates and their data would be left dangling (they can still be exported as RDF) and would keep taking up space. But it would give the user a temporary drop-all simulation.

The behavior would be simple. The user performs a drop all in that context, and the DB then appears to be empty. However, all predicates are thrown into “limbo”, where they remain accessible via a query that specifies limbo as its context. In the case of a shared instance, everything goes to limbo.
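The behavior described above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the `SchemaStore` class, its method names, and the `"main"`/`"limbo"` context keys are hypothetical and not part of any existing Dgraph API, and the sketch models a single tenant, so the per-tenant index rule is not covered.

```python
class SchemaStore:
    """Toy model (hypothetical): predicates live in 'main' until a
    pseudo-drop-all moves them to 'limbo'. Nothing is deleted."""

    def __init__(self):
        # context name -> {predicate name -> list of stored values}
        self.contexts = {"main": {}, "limbo": {}}
        # predicates that currently have an index in the main context
        self.indexed = set()

    def set(self, predicate, value, index=False):
        self.contexts["main"].setdefault(predicate, []).append(value)
        if index:
            self.indexed.add(predicate)

    def pseudo_drop_all(self):
        # Detach everything from main: the data is kept, the indexes are dropped.
        for predicate, values in self.contexts["main"].items():
            self.contexts["limbo"].setdefault(predicate, []).extend(values)
        self.contexts["main"] = {}
        self.indexed.clear()

    def query(self, predicate, context="main"):
        return list(self.contexts[context].get(predicate, []))


store = SchemaStore()
store.set("name", "Alice", index=True)
store.set("age", 30)
store.pseudo_drop_all()

print(store.query("name"))                   # main appears empty
print(store.query("name", context="limbo"))  # data is still reachable via limbo
```

In this sketch the drop is a move between two maps, which is why the data keeps taking up space until it is exported or really deleted.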

Branching: A new branch would create a reference copy (which is not a real copy, just like in git), similar to a “pointer” in memory. The “name” predicate could be renamed to “user.name” without losing the initial reference. The stored predicate itself would not actually be changed to “user.name”; it would only appear to have been renamed, and only when queried in the respective branch. This makes it possible to rename predicates, as in the following examples.

Example 1: The user would not need to modify the dataset manually, e.g. data could be migrated to Dgraph’s GraphQL easily, just with an alias.

Example 2: It would allow the use of alias predicates. In fact, every name would be an alias, except the name initially given to the predicate.

Example 3: This would allow schema rollback without needing a bulk upsert or similar operations.

Example 4: This would also allow you to run two different schemas (on different branches) pointing to the same database. The user could build things like custom multitenancy, but without ACL, and with no commercial advantage, just for local use. This could also act like “multiple graphs” or “multiple DBs” in the same DB.
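One way to picture the branching and alias idea from the examples above is a per-branch map from visible names to the originally stored predicate. This is a minimal sketch, not a proposed implementation: `BranchedSchema`, its methods, the branch name `"v2"`, and the predicate `"user.email"` are all hypothetical names chosen for illustration.

```python
class BranchedSchema:
    """Toy model (hypothetical): every branch is a map of aliases that
    point at shared storage, so creating a branch copies no data."""

    def __init__(self):
        # canonical predicate -> {uid: value}; shared by all branches
        self.data = {}
        # branch -> {visible name: canonical predicate}
        self.branches = {"main": {}}

    def add_predicate(self, branch, name):
        self.data.setdefault(name, {})
        self.branches[branch][name] = name

    def create_branch(self, name, source="main"):
        # Copies only the name map (the references), never the data.
        self.branches[name] = dict(self.branches[source])

    def rename(self, branch, old, new):
        # The stored predicate is untouched; only this branch's name changes.
        self.branches[branch][new] = self.branches[branch].pop(old)

    def set(self, branch, name, uid, value):
        self.data[self.branches[branch][name]][uid] = value

    def get(self, branch, name, uid):
        return self.data[self.branches[branch][name]].get(uid)


schema = BranchedSchema()
schema.add_predicate("main", "name")
schema.set("main", "name", "0x1", "Alice")

schema.create_branch("v2")
schema.rename("v2", "name", "user.name")
schema.add_predicate("v2", "user.email")

print(schema.get("v2", "user.name", "0x1"))     # Alice (same stored value as main's "name")
print("user.email" in schema.branches["main"])  # False (branch-only predicate)
```

Under this model a rollback amounts to discarding the branch's name map, and two branches act as two schemas over the same stored data.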

Execution example: There would be three ways to run queries: a global one, where all predicates of all branches would be accessible (i.e., everything would be aliases); a branch (contextual) one; and one for querying dangling data, that is, the predicates that were hidden via pseudo-drop-all.
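The three query modes could be modeled as a scope that decides which predicate names a query may reference. The sketch below is an assumption-laden illustration: the `Scope` enum, the `visible_predicates` function, and the sample branch and limbo contents are hypothetical and do not reflect any existing query syntax.

```python
from enum import Enum


class Scope(Enum):
    GLOBAL = "global"  # every alias from every branch
    BRANCH = "branch"  # only the aliases of one branch
    LIMBO = "limbo"    # only predicates hidden by pseudo-drop-all


# Hypothetical state: alias maps per branch, plus the hidden predicates.
BRANCHES = {
    "main": {"name": "name"},
    "v2": {"user.name": "name", "user.email": "user.email"},
}
LIMBO = {"old.title"}


def visible_predicates(scope, branch=None):
    """Return the predicate names a query may reference in the given scope."""
    if scope is Scope.GLOBAL:
        names = set()
        for aliases in BRANCHES.values():
            names.update(aliases)
        return names
    if scope is Scope.BRANCH:
        return set(BRANCHES[branch])
    if scope is Scope.LIMBO:
        return set(LIMBO)
    raise ValueError(f"unknown scope: {scope}")


print(sorted(visible_predicates(Scope.GLOBAL)))
print(sorted(visible_predicates(Scope.BRANCH, "main")))
print(sorted(visible_predicates(Scope.LIMBO)))
```

Here the global scope is assumed not to include limbo, since the text treats dangling data as a separate query mode.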