RFC: Branch concept for Schema

MichelDiz · September 30, 2022, 10:11pm

Draft, please ignore typos - Request for comments

Summary

The idea is to create a logical branching mechanism in the schema. That way we can bring new functionality, such as a pseudo-drop-all feature, to the Shared Cluster backend and also allow easy schema migration, without losing track of the schema and without downtime. It would also introduce aliases.

This would directly affect the query system (runtime). Since queries interact with the schema, the query syntax would probably need to change to introduce a branch context, and Alter would need new operations.

Motivation

A branching system in the schema would make things easier for the end user. The main idea is to have a way to migrate the schema and keep tracking it. This could be considered a kind of schema versioning. While the team is modifying the schema, instead of changing the entire DB by executing tasks that could leave the cluster unavailable, DB admins could simply create new branches where they can rename a predicate and add new predicates without touching the old schema. The new predicates would not be “merged” into Main or accessible from it, because they would belong to the branch where they were created.

More details.

Pseudo-drop-all: This functionality would not delete the data, but ignore it completely. The predicates would be detached from the main schema and all of their indexes removed (respecting the context, e.g. do not delete the index on “name” that belongs to another tenant). These predicates and their data would be left dangling, still taking up space (and could be exported as RDF). But it would let the user simulate a drop all for the time being.

The behavior would be simple. The user performs a drop all in that context and the DB then appears to be empty. However, all predicates are thrown into “limbo”. They remain accessible via a query that specifies limbo as its context. In the case of a shared instance, everything goes to Limbo.
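To make the pseudo-drop-all behavior concrete, here is a minimal sketch in Python. It is only a toy model of the idea, not Dgraph code: the `Store` class, its methods and the tenant names are all hypothetical, and it assumes limbo entries stay keyed by the context they came from so that another tenant's “name” is left alone.

```python
# Toy model of pseudo-drop-all. Every name here is hypothetical;
# nothing below is an existing Dgraph API.
class Store:
    def __init__(self):
        self.visible = {}  # (context, predicate) -> list of values
        self.limbo = {}    # (context, predicate) -> list of values

    def set(self, context, predicate, value):
        self.visible.setdefault((context, predicate), []).append(value)

    def pseudo_drop_all(self, context):
        # Move, never delete: the data is detached from the context
        # but kept in limbo, so it still takes up space.
        for key in [k for k in self.visible if k[0] == context]:
            self.limbo.setdefault(key, []).extend(self.visible.pop(key))

    def query(self, context, predicate, limbo=False):
        # Limbo is only reachable when the query asks for it explicitly.
        source = self.limbo if limbo else self.visible
        return list(source.get((context, predicate), []))


store = Store()
store.set("tenant-a", "name", "Alice")
store.set("tenant-b", "name", "Bob")
store.pseudo_drop_all("tenant-a")

print(store.query("tenant-a", "name"))              # looks empty
print(store.query("tenant-a", "name", limbo=True))  # still reachable
print(store.query("tenant-b", "name"))              # other tenant untouched
```

In this model the drop is just a move between two maps, which is why it could be cheap and reversible. How indexes would be removed per context is not modeled here.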

Branching: A new branch would create a reference copy (which is not a real copy, just like in git), like a “pointer” in memory. The “name” predicate could be renamed to “user.name” without losing the initial reference. The stored predicate itself would not actually be changed to “user.name”. It would only appear renamed, and only when queried in the respective branch. So we can rename predicates like in this example:

github.com/dgraph-io/dgraph

Add aliases at the schema level (In type)

opened 03:37AM - 08 Mar 20 UTC

MichelDiz

kind/feature area/schema dgraph

## Experience Report

reference: https://discuss.dgraph.io/t/support-json-ld-o…n-dgraph/7162

### What you wanted to do

Add aliases at the schema level. To preserve my dataset.

### Why that wasn't great, with examples

It is not possible. e.g Take this RDF for example.

```
_:b0 <http://www.w3.org/2002/12/cal/ical#location> "New Orleans Arena, New Orleans, Louisiana, USA" .
```

It is a very common type of RDF dataset source. But it is hard to daily work or to use in APIs (like GraphQL). And every time you need to make a query, you need to manually write this whole line with an alias. So, my suggestion would have an alias at the schema level like this:

```
type Example {
  location : <http://www.w3.org/2002/12/cal/ical#location>
}
```

So, instead of querying for `<http://www.w3.org/2002/12/cal/ical#location>` they would use `location` instead. And the predicate would be preserved in storage level. It is good to preserve this information, in case the user needs to use the RDF in an application that uses web semantics from W3C standards and other information that this type of file (RDF) usually has. It also prevents the user from having to sanitize the whole RDF, thereby completely changing its semantic information structure.

> Even though we don't natively support the W3C's RDF standards. Still, if the user exports as JSON (Simulating JSON-LD). He can convert to W3C RDF easily using third-party applications.

Example 1: With that, the user would not need to modify the dataset manually, e.g. they could migrate data to Dgraph’s GraphQL in an easy way, just with aliases.

Example 2: It would allow the use of alias predicates. In fact, every name would be an alias, except the name initially given to the predicate.

Example 3: This would allow schema rollback without needing a bulk upsert and the like.

Example 4: This would also allow you to run two different schemas (on different branches) pointing to the same database. The user could build things like custom multitenancy, but without ACL. It would offer no commercial advantage either, only local use. This could also act like “multiple graphs” or “multiple DBs” in the same DB.
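The branch-as-pointer idea behind these examples can be sketched as a per-branch alias table over predicate names that never change in storage. This is a rough Python illustration under that assumption. `BranchedSchema` and all of its methods are made up for this post and are not part of Dgraph.

```python
# Toy model of schema branches as alias tables. All names are
# hypothetical; this is not how Dgraph stores its schema today.
class BranchedSchema:
    def __init__(self, predicates):
        # Stored predicate names never change; "main" exposes them as-is.
        self.stored = set(predicates)
        self.branches = {"main": {p: p for p in predicates}}

    def create_branch(self, name, source="main"):
        # Copy only the alias table (the references), never the data.
        self.branches[name] = dict(self.branches[source])

    def rename(self, branch, old, new):
        # The rename is visible only inside `branch`; storage is untouched.
        aliases = self.branches[branch]
        aliases[new] = aliases.pop(old)

    def resolve(self, branch, name):
        # Map a branch-local name to the stored predicate.
        # Raises KeyError if the name is not visible in that branch.
        return self.branches[branch][name]

    def drop_branch(self, name):
        # Rollback: discard the branch, leaving every other branch as it was.
        del self.branches[name]


schema = BranchedSchema(["name", "age"])
schema.create_branch("v2")
schema.rename("v2", "name", "user.name")

print(schema.resolve("v2", "user.name"))  # stored predicate behind the alias
print(schema.resolve("main", "name"))     # main still sees the original name
```

Here a rollback (Example 3) would be `schema.drop_branch("v2")`, and two schemas over the same data (Example 4) are just two entries in `branches`. Merging branches and conflict handling are left out on purpose.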

Execution example: There would be 3 ways to run queries: a global one, where all predicates of all branches would be accessible (i.e., everything would be aliases); one scoped to a branch (contextual); and one for querying dangling data, that is, the predicates that were hidden via Pseudo-drop-all.
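As a rough illustration of the three query modes, the sketch below computes which names a query could see in each scope. The function `visible_names`, the scope strings and the sample data are assumptions made for this example. The RFC does not define any syntax for them yet.

```python
# Toy model of the three query scopes. Names and scope strings are
# hypothetical; the actual query syntax is still undecided.
def visible_names(branches, limbo, scope, branch=None):
    """Return {queryable name: stored predicate} for the given scope."""
    if scope == "global":
        # Every alias from every branch is queryable.
        merged = {}
        for aliases in branches.values():
            merged.update(aliases)
        return merged
    if scope == "branch":
        # Only the names defined in one branch.
        return dict(branches[branch])
    if scope == "limbo":
        # Only predicates hidden by a pseudo-drop-all.
        return {p: p for p in limbo}
    raise ValueError(f"unknown scope: {scope}")


branches = {
    "main": {"name": "name"},
    "v2": {"user.name": "name", "email": "email"},
}
limbo = {"old_score"}

print(visible_names(branches, limbo, "global"))
print(visible_names(branches, limbo, "branch", "v2"))
print(visible_names(branches, limbo, "limbo"))
```

One thing this sketch glosses over: if two branches use the same alias for different stored predicates, the global scope needs a rule for which one wins. Here the branch merged last simply overwrites the earlier one.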
