Add aliases at the schema level (in types)

Moved from GitHub dgraph/4898

Posted by MichelDiz:

Experience Report

Reference: Support JSON-LD in Dgraph

What you wanted to do

Add aliases at the schema level, so that I can preserve my dataset as it is.

Why that wasn’t great, with examples

It is currently not possible.

For example, take this RDF:

_:b0 <http://www.w3.org/2002/12/cal/ical#location> "New Orleans Arena, New Orleans, Louisiana, USA" .

This is a very common form of RDF dataset.

But predicates written as full IRIs are hard to work with day to day, and hard to use in APIs such as GraphQL. Every time you write a query, you have to type out the whole IRI by hand and give it an alias.

So my suggestion is to allow an alias at the schema level, like this:

type Example {
   location : <http://www.w3.org/2002/12/cal/ical#location>
}

Users would then query for location instead of <http://www.w3.org/2002/12/cal/ical#location>, while the full predicate would be preserved at the storage level.
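To make the idea concrete, here is a minimal sketch of how a query layer might resolve such an alias before touching storage. It assumes the proposed `type Example { location : <...> }` syntax has already been parsed into a map; the map layout and the `resolvePredicate` function are hypothetical and not part of Dgraph.

```go
package main

import "fmt"

// schemaAliases is a hypothetical in-memory view of the proposed syntax:
// type name -> alias -> predicate as it is actually stored.
var schemaAliases = map[string]map[string]string{
	"Example": {
		"location": "http://www.w3.org/2002/12/cal/ical#location",
	},
}

// resolvePredicate maps a field name used in a query to the stored predicate.
// If the type defines no alias for the field, the name is returned unchanged,
// so ordinary predicates keep working.
func resolvePredicate(typeName, field string) string {
	if aliases, ok := schemaAliases[typeName]; ok {
		if pred, ok := aliases[field]; ok {
			return pred
		}
	}
	return field
}

func main() {
	fmt.Println(resolvePredicate("Example", "location"))
	fmt.Println(resolvePredicate("Example", "name"))
}
```

Because the lookup only rewrites names at query time, the stored triples keep their original IRIs, which is the property the proposal is after.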

Preserving this information matters in case the user needs to use the RDF in an application that relies on the W3C Semantic Web standards, along with the other information this kind of file usually carries.

It also spares the user from having to sanitize the whole RDF file, which would completely change its semantic structure.

Even though we don’t natively support the W3C RDF standards, a user who exports the data as JSON (simulating JSON-LD) can still convert it to W3C RDF easily with third-party tools.

Schema-level aliases could also help users who want to design their own sharding scheme, even more so if we supported a kind of “wildcard alias”:

type User {
   name : <name.dgraph.*?number[0-300]> # This would create 301 aliases
   friend : <friend.dgraph.*?number[0-300]>
}

The wildcard would avoid having to add something like this manually:

type User {
   name : <name.dgraph.000>; <name.dgraph.001>; <name.dgraph.002>; <name.dgraph.003>
   (...)
}

The above is basically a “sharding” strategy built from what we already have; no deep core code needs to change.
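As a rough illustration of what the wildcard would expand to, the sketch below parses the proposed `*?number[lo-hi]` suffix and generates the concrete predicate names. The syntax is only the one suggested above, and the `expandWildcard` helper is hypothetical; zero-padding to the width of the upper bound is an assumption chosen to match names like `name.dgraph.000`.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// wildcardRe matches the proposed (hypothetical) syntax, e.g.
// "name.dgraph.*?number[0-300]": a literal prefix followed by a numeric range.
var wildcardRe = regexp.MustCompile(`^(.*)\*\?number\[(\d+)-(\d+)\]$`)

// expandWildcard turns one wildcard alias into the predicate names it covers.
// Numbers are zero-padded to the width of the upper bound (an assumption).
func expandWildcard(pattern string) ([]string, error) {
	m := wildcardRe.FindStringSubmatch(pattern)
	if m == nil {
		return nil, fmt.Errorf("not a wildcard alias: %q", pattern)
	}
	prefix := m[1]
	lo, err := strconv.Atoi(m[2])
	if err != nil {
		return nil, err
	}
	hi, err := strconv.Atoi(m[3])
	if err != nil {
		return nil, err
	}
	if lo > hi {
		return nil, fmt.Errorf("empty range %d-%d", lo, hi)
	}
	width := len(m[3])
	preds := make([]string, 0, hi-lo+1)
	for i := lo; i <= hi; i++ {
		// %0*d takes the padding width from the preceding int argument.
		preds = append(preds, fmt.Sprintf("%s%0*d", prefix, width, i))
	}
	return preds, nil
}

func main() {
	preds, err := expandWildcard("name.dgraph.*?number[0-300]")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(preds), preds[0], preds[len(preds)-1])
}
```

With the range `[0-300]` this yields 301 names, from `name.dgraph.000` to `name.dgraph.300`, which is exactly the list that would otherwise be typed out by hand.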

Once we have this, query execution needs a way to loop over those generated predicates concurrently. At the root level this is easy: you create several query blocks on the fly and send them over the network in parallel. Nested levels are trickier.
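The root-level case could look roughly like the fan-out below: one goroutine per shard predicate, with results merged afterwards. `queryShard` is a hypothetical stand-in for sending one query block to the cluster; it just fakes a result here. This sketch does not attempt the harder nested case.

```go
package main

import (
	"fmt"
	"sync"
)

// queryShard is a hypothetical stand-in for running one root query block
// against a single shard predicate. Here it only fabricates a placeholder result.
func queryShard(pred string) []string {
	return []string{"result from " + pred}
}

// fanOut runs one root block per shard predicate concurrently and merges
// the results in predicate order. Each goroutine writes only to its own
// slot of results, so no mutex is needed.
func fanOut(preds []string) []string {
	results := make([][]string, len(preds))
	var wg sync.WaitGroup
	for i, p := range preds {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			results[i] = queryShard(p)
		}(i, p)
	}
	wg.Wait()

	var merged []string
	for _, r := range results {
		merged = append(merged, r...)
	}
	return merged
}

func main() {
	preds := []string{"name.dgraph.000", "name.dgraph.001", "name.dgraph.002"}
	fmt.Println(fanOut(preds))
}
```

Writing into pre-sized, per-index slots keeps the merge deterministic without locking; a real implementation would also need error handling and cancellation, which are left out here.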

We could also add a rule for tablets: if wildcard aliases are in use, no two predicate shards should be placed in the same group, or, if there are not enough groups to accommodate them, they should be spread evenly across groups.
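One simple way to satisfy that rule is round-robin placement, sketched below. The `assignGroups` function and its 0-based group indices are illustrative assumptions, not how Dgraph Zero actually assigns tablets. With at least as many groups as shards, every shard lands in its own group; otherwise the per-group counts differ by at most one.

```go
package main

import "fmt"

// assignGroups is a hypothetical placement rule for wildcard shard predicates.
// Round-robin gives each predicate its own group when there are enough groups,
// and otherwise keeps per-group counts within one of each other.
// Group indices here are 0-based and purely illustrative.
func assignGroups(preds []string, numGroups int) map[string]int {
	placement := make(map[string]int, len(preds))
	if numGroups <= 0 {
		return placement
	}
	for i, p := range preds {
		placement[p] = i % numGroups
	}
	return placement
}

func main() {
	preds := []string{"name.dgraph.000", "name.dgraph.001", "name.dgraph.002",
		"name.dgraph.003", "name.dgraph.004"}

	// Count how many shard predicates each group receives.
	counts := map[int]int{}
	for _, g := range assignGroups(preds, 2) {
		counts[g]++
	}
	fmt.Println(counts)
}
```

Round-robin ignores tablet size; a size-aware variant would place each shard in the currently least-loaded group instead, at the cost of needing size information up front.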
