Proposal
Change the Type System from index-based to node-tree-based, modeled as an “ontology”. This would allow a linked Type System, for example User → Product → Purchase.
Motivation
The Type System today uses the configuration below:
&pb.SchemaUpdate{
Predicate: "dgraph.type",
ValueType: pb.Posting_STRING,
Directive: pb.SchemaUpdate_INDEX,
Tokenizer: []string{"exact"},
List: true,
}
That is, dgraph.type is a list of strings with an exact index.
This gets worse over time, because every node of any type carries dgraph.type with its respective value. If you have billions of objects, you end up with a gigantic dgraph.type index. And it is not possible to break dgraph.type into smaller shards. In other words, a recipe for issues.
For example, the schema below would be possible, and the types would in fact be linked (similar to GraphQL).
type <User> {
Product: <Product> .
name .
}
type <Product> {
name .
price .
Purchase: <Purchase> .
}
type <Purchase> {
name .
Products: <Product> .
}
The node-based concept (ontology-like) would be much better than what we have today. Today, indexing the type of every node puts a heavy load on a single, centralized predicate.
Nodes are just pointers to be traversed, and Dgraph is much faster at traversing nodes than at doing long searches in the index table.
Some ideas
All types would be nodes that have 0x0 as a parent, so we can “recurse” the schema from the top, starting at a known address.
Examples
<0x0> <dgraph.root> <_:UserType> .
<0x0> <dgraph.root> <_:ProductType> .
<0x0> <dgraph.root> <_:PurchaseType> .
<_:UserType> <dgraph.child.Product> <_:ProductType> .
<_:UserType> <dgraph.predicates> "name" .
<_:ProductType> <dgraph.child.Purchase> <_:PurchaseType> .
<_:ProductType> <dgraph.predicates> "name" .
<_:ProductType> <dgraph.predicates> "price" .
<_:PurchaseType> <dgraph.child.Products> <_:ProductType> .
<_:PurchaseType> <dgraph.predicates> "name" .
<_:PurchaseType> <dgraph.predicates> "price" .
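To make the “recurse from a known address” idea concrete, here is a minimal in-memory sketch in Go. It is not Dgraph code: the TypeNode struct, the Walk function, and the sampleSchema helper are hypothetical names used only for illustration, and the map of children stands in for the dgraph.child.* edges from the triples above. The point it shows is that a walk from the root must track visited nodes, because Product and Purchase point at each other.

```go
package main

import (
	"fmt"
	"sort"
)

// TypeNode is a hypothetical in-memory stand-in for one type node.
type TypeNode struct {
	Name       string
	Predicates []string             // values of <dgraph.predicates>
	Children   map[string]*TypeNode // suffix of dgraph.child.* -> target type
}

// Walk visits every type reachable from roots exactly once, depth-first,
// and returns the type names in visit order. The seen set keeps the walk
// from looping on cycles such as Product -> Purchase -> Product.
func Walk(roots []*TypeNode) []string {
	seen := map[*TypeNode]bool{}
	var order []string
	var visit func(n *TypeNode)
	visit = func(n *TypeNode) {
		if seen[n] {
			return
		}
		seen[n] = true
		order = append(order, n.Name)
		// Sort edge names so the visit order is deterministic.
		edges := make([]string, 0, len(n.Children))
		for e := range n.Children {
			edges = append(edges, e)
		}
		sort.Strings(edges)
		for _, e := range edges {
			visit(n.Children[e])
		}
	}
	for _, r := range roots {
		visit(r)
	}
	return order
}

// sampleSchema builds the User / Product / Purchase example from the triples.
// The returned slice plays the role of the <dgraph.root> edges of 0x0.
func sampleSchema() []*TypeNode {
	user := &TypeNode{Name: "User", Predicates: []string{"name"}, Children: map[string]*TypeNode{}}
	product := &TypeNode{Name: "Product", Predicates: []string{"name", "price"}, Children: map[string]*TypeNode{}}
	purchase := &TypeNode{Name: "Purchase", Predicates: []string{"name", "price"}, Children: map[string]*TypeNode{}}
	user.Children["Product"] = product
	product.Children["Purchase"] = purchase
	purchase.Children["Products"] = product
	return []*TypeNode{user, product, purchase}
}

func main() {
	fmt.Println(Walk(sampleSchema())) // prints [User Product Purchase]
}
```

In a real implementation the walk would presumably be a recursive query starting at 0x0 rather than a loop over Go structs; the sketch only illustrates the shape of the traversal.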
Problems
To import data from previous versions (via bulk load or live load), you would need an upsert function that modifies the <dgraph.type>
values users have on their nodes. The upsert would ensure there are no duplicates. Alternatively, users could make the change manually via a bulk upsert.
<0x2001> <dgraph.type> "FamilyMember" .
e.g., in pseudo-code: if <dgraph.type> is in (type list), then upsert it
BTW, we could also use the “upgrade” tool to do this.
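As a rough sketch of the rewrite rule such a migration would apply, the Go snippet below turns legacy <dgraph.type> values into edges that point at type nodes. Several things here are assumptions and not part of the proposal: the OldType struct, the MigrateTypes function, and above all the <dgraph.typeof> predicate, which is a made-up name for the edge from a data node to its type node. The snippet only formats triples and drops duplicates in memory; a real migration would have to look up existing type nodes, for example with an upsert, instead of relying on blank nodes.

```go
package main

import (
	"fmt"
	"sort"
)

// OldType is one legacy triple: <UID> <dgraph.type> "TypeName" .
type OldType struct {
	UID      string
	TypeName string
}

// MigrateTypes returns one root triple per distinct type name (sorted),
// followed by one edge per distinct (UID, type) pair in input order.
// <dgraph.typeof> is a hypothetical predicate name, not an existing one.
func MigrateTypes(old []OldType) []string {
	typeSeen := map[string]bool{}
	edgeSeen := map[OldType]bool{}
	var typeNames, edges []string
	for _, t := range old {
		if !typeSeen[t.TypeName] {
			typeSeen[t.TypeName] = true
			typeNames = append(typeNames, t.TypeName)
		}
		if edgeSeen[t] {
			continue // same node and type already handled: no duplicate edge
		}
		edgeSeen[t] = true
		edges = append(edges,
			fmt.Sprintf("<%s> <dgraph.typeof> <_:%sType> .", t.UID, t.TypeName))
	}
	sort.Strings(typeNames)
	out := make([]string, 0, len(typeNames)+len(edges))
	for _, n := range typeNames {
		out = append(out, fmt.Sprintf("<0x0> <dgraph.root> <_:%sType> .", n))
	}
	return append(out, edges...)
}

func main() {
	old := []OldType{
		{UID: "0x2001", TypeName: "FamilyMember"},
		{UID: "0x2002", TypeName: "FamilyMember"},
		{UID: "0x2001", TypeName: "FamilyMember"}, // duplicate, skipped
	}
	for _, line := range MigrateTypes(old) {
		fmt.Println(line)
	}
}
```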
Collateral Benefit
- When querying the schema in Ratel, you would be able to see how your schema is linked. This can make it easier to analyze a schema and to plan schema modeling.
- This would facilitate the creation of the “Schema aliases” feature. See “Add aliases at the schema level (In type)”.
cc. @Raphael