RFC: Proposal for change in Type System

Proposal.

Change the Type System from index-based to node-tree-based, as an “ontology”. This would allow a linked Type System, for example: User → Product → Purchase.

Motivation

The Type System today uses the configuration below:

		&pb.SchemaUpdate{
			Predicate: "dgraph.type",
			ValueType: pb.Posting_STRING,
			Directive: pb.SchemaUpdate_INDEX,
			Tokenizer: []string{"exact"},
			List:      true,
		}

That is, dgraph.type is an indexed list of strings.

And as time goes on, it gets worse, because every node of any Type carries dgraph.type with its respective value. That is, if you have billions of objects, you will end up with a gigantic dgraph.type index. And it is not possible to break dgraph.type into smaller shards. In other words, a recipe for issues.

For example, the schema below would be possible, and the types would in fact be linked (similar to GraphQL).

type <User> {
	Product: <Product> .
	name .
}

type <Product> {
	name .
	price .
	Purchase: <Purchase> .
}

type <Purchase> {
	name .
	Products: <Product> .
}

The node-based concept (ontology-like) would be much better than what we have today. Bulk indexing creates heavier demand because it is centralized in a single predicate.

Nodes are just pointers to be traversed. Dgraph is much faster at traversing nodes than doing long searches in the index table.

Some ideas

All types would be nodes that have 0x0 as a parent, so we can “recurse” the schema from the top, starting at a known address.

Examples

<0x0> <dgraph.root> <_:UserType> .
<0x0> <dgraph.root> <_:ProductType> .
<0x0> <dgraph.root> <_:PurchaseType> .


<_:UserType> <dgraph.child.Product> <_:ProductType> .
<_:UserType> <dgraph.predicates> "name" .

<_:ProductType> <dgraph.child.Purchase> <_:PurchaseType> .
<_:ProductType> <dgraph.predicates> "name" .
<_:ProductType> <dgraph.predicates> "price" .

<_:PurchaseType> <dgraph.child.Products> <_:ProductType> .
<_:PurchaseType> <dgraph.predicates> "name" .
<_:PurchaseType> <dgraph.predicates> "price" .
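
To make the “recurse from a known address” idea concrete, here is a minimal sketch in Go. It does not use any Dgraph API: `TypeNode`, `BuildExample`, and `Walk` are hypothetical names, and the in-memory structs only stand in for the triples above. Since the linked schema is cyclic (Product → Purchase → Product), the walk has to track visited nodes.

```go
package main

import (
	"fmt"
	"sort"
)

// TypeNode is a hypothetical in-memory stand-in for a type node.
type TypeNode struct {
	Name       string
	Predicates []string             // values of <dgraph.predicates>
	Children   map[string]*TypeNode // <dgraph.child.X> edges, keyed by X
}

// BuildExample mirrors the example triples and returns the nodes
// that hang off <0x0> via <dgraph.root>.
func BuildExample() []*TypeNode {
	user := &TypeNode{Name: "User", Predicates: []string{"name"}, Children: map[string]*TypeNode{}}
	product := &TypeNode{Name: "Product", Predicates: []string{"name", "price"}, Children: map[string]*TypeNode{}}
	purchase := &TypeNode{Name: "Purchase", Predicates: []string{"name", "price"}, Children: map[string]*TypeNode{}}
	user.Children["Product"] = product
	product.Children["Purchase"] = purchase
	purchase.Children["Products"] = product
	return []*TypeNode{user, product, purchase}
}

// Walk visits every type reachable from the roots exactly once and
// returns the edges it followed, formatted as "From -edge-> To".
func Walk(roots []*TypeNode) []string {
	seen := map[*TypeNode]bool{}
	var edges []string
	var visit func(n *TypeNode)
	visit = func(n *TypeNode) {
		if seen[n] {
			return // already expanded; this is what breaks the cycle
		}
		seen[n] = true
		names := make([]string, 0, len(n.Children))
		for name := range n.Children {
			names = append(names, name)
		}
		sort.Strings(names) // deterministic order for the output
		for _, name := range names {
			child := n.Children[name]
			edges = append(edges, fmt.Sprintf("%s -%s-> %s", n.Name, name, child.Name))
			visit(child)
		}
	}
	for _, r := range roots {
		visit(r)
	}
	return edges
}

func main() {
	for _, e := range Walk(BuildExample()) {
		fmt.Println(e)
	}
}
```

A real implementation would read these edges from the store rather than from structs; the point is only that the whole schema becomes reachable by traversal from one root.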

Problems

To import data from previous versions (via bulk load or live load), we would need an upsert function that converts the <dgraph.type> values users already have on their nodes. The upsert would ensure there are no duplicates. Alternatively, users could make the change manually via a bulk upsert.

<0x2001> <dgraph.type> "FamilyMember" .

e.g. pseudo-code: if <dgraph.type> is in (type list), then upsert it

BTW, we could also use the “upgrade” tool to do this.
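
As a rough illustration of that migration step, here is a sketch in Go that works on plain in-memory triples rather than on Dgraph itself. Everything named here is an assumption made for the example: the `Triple` struct, the `MigrateTypes` function, the `dgraph.type.node` predicate, and the blank-node naming scheme. It only shows the dedup logic an upsert would have to provide: one type node per type name, one edge per (node, type) pair.

```go
package main

import "fmt"

// Triple is a simplified statement used only for this sketch.
type Triple struct{ Subject, Predicate, Object string }

// MigrateTypes replaces <uid> <dgraph.type> "Name" values with edges to a
// single type node per name. The "dgraph.type.node" predicate and the
// "_:<Name>Type" naming are hypothetical, not part of the proposal.
func MigrateTypes(in []Triple) []Triple {
	typeNodes := map[string]string{} // type name -> blank node
	seen := map[Triple]bool{}        // edges already emitted
	var out []Triple
	for _, t := range in {
		if t.Predicate != "dgraph.type" {
			out = append(out, t) // unrelated data passes through untouched
			continue
		}
		node, ok := typeNodes[t.Object]
		if !ok {
			// First time we see this type name: create its node under the root.
			node = "_:" + t.Object + "Type"
			typeNodes[t.Object] = node
			out = append(out, Triple{"0x0", "dgraph.root", node})
		}
		edge := Triple{t.Subject, "dgraph.type.node", node}
		if !seen[edge] {
			seen[edge] = true
			out = append(out, edge)
		}
	}
	return out
}

func main() {
	// Illustrative input; the name value and the second uid are made up.
	in := []Triple{
		{"0x2001", "dgraph.type", "FamilyMember"},
		{"0x2001", "name", "Alice"},
		{"0x2002", "dgraph.type", "FamilyMember"},
		{"0x2001", "dgraph.type", "FamilyMember"}, // duplicate, must be dropped
	}
	for _, t := range MigrateTypes(in) {
		fmt.Printf("<%s> <%s> %q\n", t.Subject, t.Predicate, t.Object)
	}
}
```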

Collateral Benefit

  1. When querying the schema in Ratel, you will be able to see how your schema is linked. This makes it easier to analyze a schema and to plan schema modeling.
  2. This would facilitate the creation of the “Schema aliases” feature. See: Add aliases at the schema level (In type)

cc. @Raphael


We have finally arrived at the same conclusion, lol

See:

Here is the main gist of my thoughts:


I was already thinking about this before these problems appeared. I even argued about it with Martin (a former core dev who worked on the Type System). He said it wasn’t necessary and was too complex. I trusted him because he was, and still is, a great engineer. But nobody is infallible.

It is a viable change, and one that will certainly improve things a lot.


This would be an excellent change that could help us in many scenarios! As well as the performance benefits, it brings the graph closer to a traditional knowledge graph with a proper hierarchy of concepts. Please keep in mind that the hierarchy may often refer to the same node, e.g. CORP A (Organisation) may have a subType (Tech Company) for the same uid, as well as a “relational type” of Employee for a different uid of type Individual. Ontologies are basically graphs in themselves, which is why I fundamentally like this change (everything is a graph!). We have built up quite a large set of ontologies for different industries (healthcare, finance, news media, etc.), so I’m happy to provide you with different schemas we have, to sense check them against this design or to test an alpha. It is also always good to think about interoperability with the GraphQL side.

A standard like OWL would probably be overkill for Dgraph, but I think it would be fantastic to sense check this proposal against OWL examples to make sure it covers the main bases. See: Examples for OWL
