Increase RDF compliance support (N-Triples)

MichelDiz · July 28, 2020, 6:57pm

Reason for this RFC

Providing RDF support that is closer to the native standard makes it easier for users to bring their existing datasets into Dgraph, and also to move them out again. This approach is user-friendly.

It also lets users with RDF datasets take advantage of Dgraph's GraphQL out of the box, with zero code and effort. They just need to know their dataset model well.

Note: the W3C is also interested in using GraphQL with RDF; see GitHub - w3c/graphql-rdf: Bridging GraphQL and RDF Community Group.

It is worth mentioning that Dgraph is not able to support certain normative parts of the RDF standard. Things like namespacing, facts, XML serialization syntax, Turtle syntax, and other structures are beyond what Dgraph can do today. Some of these structures may become useful in the future.

Quick topics related to this RFC

Dgraph now supports storing XIDs on the node. Based on the stored XIDs, we should also support exporting RDF that preserves the original blank node.
Deal better with Unrecognized RDF types
Add aliases at the schema level (In type)
This one is related to JSON-LD, but is potentially an issue for RDF too: Support JSON-LD in Dgraph
And: Support reserved character "@" to be used in predicate naming
This validator needs to be RDF compliant: Add a validator for bulk and live loader

Source

Issues with Blank Nodes.

In the N-Quads tests, a subject written as an IRI in angle brackets, such as <http://example/s>, won't work. It has to be changed manually (adding a blank node label) to:
_:http:example:s
or
<_:http://example:s>

In short, the RDF standard accepts IRIs within angle brackets as node identifiers. Dgraph, however, accepts only a uint64 UID (for example <0x2717>) or a blank node with the _: label. This can be a problem with the --store_xids flag, because Dgraph will then store the XID with the _: prefix, which the RDF standard does not treat the same way.

https://github.com/w3c/rdf-tests/blob/gh-pages/nquads/comment_following_triple.nq
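The XID-preserving export mentioned in the quick topics could, in principle, map a stored XID back to an RDF term. The following is a minimal Go sketch under stated assumptions: exportSubject is a hypothetical name, the XID is assumed to be stored with the _: prefix as described above, and treating any label that starts with a URI scheme (a letter, then letters/digits/+/./-, then ':') as an IRI is a heuristic, not Dgraph's exporter.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// schemeRE matches a URI scheme followed by ':' (e.g. "http:", "mailto:").
var schemeRE = regexp.MustCompile(`^[A-Za-z][A-Za-z0-9+.-]*:`)

// exportSubject is a hypothetical export helper. Given a node's UID and the
// XID stored for it by --store_xids (empty if none), it returns the term to
// write in an N-Triples export.
func exportSubject(uid uint64, xid string) string {
	if xid == "" {
		// No stored XID: fall back to Dgraph's own UID.
		return fmt.Sprintf("<0x%x>", uid)
	}
	// Assumption: the stored XID may carry the "_:" blank node prefix.
	label := strings.TrimPrefix(xid, "_:")
	if schemeRE.MatchString(label) {
		// Looks like an absolute IRI (e.g. http://example/s):
		// restore the IRI form used by the source dataset.
		return "<" + label + ">"
	}
	// Otherwise treat it as a plain blank node label.
	return "_:" + label
}

func main() {
	fmt.Println(exportSubject(0x2717, ""))
	fmt.Println(exportSubject(0x2718, "_:node0"))
	fmt.Println(exportSubject(0x2719, "_:http://example/s"))
}
```

The heuristic restores <http://example/s> from the <_:http://example/s> workaround, but the _:http:example:s workaround cannot be reliably reversed this way, because the original slashes are no longer recoverable from the label.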

One error in the parser related to URIs

{
  set {
    <_:http://a.example/s> <http://example/test> <scheme:!$%25&'()*+,-./0123456789:/@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~?#> .

  }
}

"message": "strconv.ParseUint: parsing "(…).(…)(…)": invalid syntax"

The parser treats the subject as a literal UID instead of a blank node.

https://github.com/w3c/rdf-tests/blob/gh-pages/nquads/nt-syntax-uri-04.nq
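The error shows the subject term reaching strconv.ParseUint. One way to avoid that is to check for the _: prefix inside the angle brackets before trying to read the term as a number. The sketch below is a hypothetical Go helper (classifySubject and SubjectKind are illustrative names, not Dgraph's parser) that classifies a subject as a blank node, a UID, or an IRI, in that order.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// SubjectKind says how a subject term should be interpreted.
type SubjectKind int

const (
	KindUID   SubjectKind = iota // Dgraph UID, e.g. <0x2717>
	KindBlank                    // _:label, or Dgraph's <_:label> form
	KindIRI                      // any other term in angle brackets
)

// classifySubject is a hypothetical helper. The key point is the order of
// the checks: the "_:" prefix is tested before any attempt to read the term
// as a number, so <_:http://a.example/s> never reaches strconv.ParseUint.
func classifySubject(term string) (SubjectKind, string, error) {
	if strings.HasPrefix(term, "_:") {
		return KindBlank, term[2:], nil
	}
	if len(term) < 2 || term[0] != '<' || term[len(term)-1] != '>' {
		return 0, "", fmt.Errorf("invalid subject %q", term)
	}
	inner := term[1 : len(term)-1]
	if strings.HasPrefix(inner, "_:") {
		return KindBlank, inner[2:], nil
	}
	// Base 0 accepts both the hex form (0x2717) and decimal UIDs.
	if _, err := strconv.ParseUint(inner, 0, 64); err == nil {
		return KindUID, inner, nil
	}
	return KindIRI, inner, nil
}

func main() {
	for _, t := range []string{"<0x2717>", "<_:http://a.example/s>", "<http://example/s>", "_:node0"} {
		kind, value, err := classifySubject(t)
		fmt.Println(t, kind, value, err)
	}
}
```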

Issues with Language support

No issues were found with RDF itself. But if the user doesn't set the schema before inserting the data, language support can be an issue:

"Language tags can only be used with predicates of string type having @lang directive in schema. Got: [http://example.org/ex#b]"

So Dgraph should infer the @lang directive and add it to the schema, as it does in some other cases, such as lists.

{
  q(func: has(<http://example.org/ex#b>@en-UK)) {
    uid
    <http://example.org/ex#b>@en-UK
  }
}

{
  "data": {
    "q": [
      {
        "uid": "0x2717",
        "http://example.org/ex#b@en-UK": "Cheers"
      }
    ]
  }
}

NOTE: I'm also not sure whether language subtags (such as en-UK) are mentioned in the docs, or how to query language tags on predicates written in angle brackets.

Parser issues

The test below doesn't return the correct value for the string literal.
https://github.com/w3c/rdf-tests/blob/gh-pages/nquads/literal_all_controls.nq

Ascii_boundaries

The test below returns Unicode characters instead of the actual value.
https://github.com/w3c/rdf-tests/blob/gh-pages/nquads/literal_ascii_boundaries.nq
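Both tests deal with control and boundary characters inside string literals. In N-Triples such characters may also be written as escapes: ECHAR (\t, \b, \n, \r, \f, \", \', \\) and UCHAR (\uXXXX and \UXXXXXXXX). The exact cause of the failures above is not established here; for reference, this is a minimal Go sketch of decoding those escapes. unescapeNTriples is a hypothetical name, and it takes the literal's content without the surrounding quotes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// unescapeNTriples decodes the escapes allowed in an N-Triples string
// literal: ECHAR (\t \b \n \r \f \" \' \\) and UCHAR (\uXXXX, \UXXXXXXXX).
// All other bytes, including raw UTF-8, are copied through unchanged.
func unescapeNTriples(s string) (string, error) {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '\\' {
			b.WriteByte(s[i])
			continue
		}
		i++ // move to the character after the backslash
		if i >= len(s) {
			return "", fmt.Errorf("dangling backslash in %q", s)
		}
		switch c := s[i]; c {
		case 't':
			b.WriteByte('\t')
		case 'b':
			b.WriteByte('\b')
		case 'n':
			b.WriteByte('\n')
		case 'r':
			b.WriteByte('\r')
		case 'f':
			b.WriteByte('\f')
		case '"', '\'', '\\':
			b.WriteByte(c)
		case 'u', 'U':
			n := 4 // \u takes 4 hex digits, \U takes 8
			if c == 'U' {
				n = 8
			}
			if i+n >= len(s) {
				return "", fmt.Errorf("truncated \\%c escape in %q", c, s)
			}
			code, err := strconv.ParseUint(s[i+1:i+1+n], 16, 32)
			if err != nil {
				return "", err
			}
			r := rune(code)
			if !utf8.ValidRune(r) { // rejects surrogates and values above U+10FFFF
				return "", fmt.Errorf("invalid code point U+%X", code)
			}
			b.WriteRune(r)
			i += n // skip the hex digits
		default:
			return "", fmt.Errorf("unknown escape \\%c in %q", c, s)
		}
	}
	return b.String(), nil
}

func main() {
	out, err := unescapeNTriples(`\u0041\tB`)
	fmt.Printf("%q %v\n", out, err)
}
```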

Things that need to be documented

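Dgraph's parser converts numeric strings such as "1", as well as the letters "T" and "F", to booleans. Accepting "1" and "0" is RDF compliant, since the xsd:boolean lexical space is "true", "false", "1" and "0"; "T" and "F" go beyond that lexical space, so this leniency should be documented.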

See RDF 1.1 Concepts and Abstract Syntax at section “3.4 Literals”.

{
  set {
    _:node0 <LiteralBool> "1"^^<xs:boolean> .
    _:node1 <LiteralBool> "F"^^<xs:boolean> .
    _:node2 <LiteralBool> "T"^^<xs:boolean> .
  }
}

Support more Storage Types

Source: RDF 1.1 Concepts and Abstract Syntax

Some types missing in Dgraph (these are XML Schema types, but they can be used in RDF datasets, and Dgraph already supports XML Schema types):

xsd:decimal Arbitrary-precision decimal numbers
xsd:time Times (hh:mm:ss.sss…) with or without timezone
xsd:dateTimeStamp Date and time with required timezone
xsd:duration Duration of time
xsd:yearMonthDuration Duration of time (months and years only)
xsd:dayTimeDuration Duration of time (days, hours, minutes, seconds only)
xsd:byte -128…+127 (8 bit)
xsd:short -32768…+32767 (16 bit)
xsd:long -9223372036854775808…+9223372036854775807 (64 bit)
xsd:hexBinary Hex-encoded binary data
xsd:base64Binary Base64-encoded binary data
xsd:anyURI Absolute or relative URIs and IRIs
MailTo type: convert <mailto:e.miller123(at)example> to the string after mailto:.
22-rdf-syntax-ns#type: convert the predicate URI to dgraph.type. For example,
_:dave <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
becomes
_:dave <dgraph.type> "http://xmlns.com/foaf/0.1/Person" (xid_type="http://www.w3.org/1999/02/22-rdf-syntax-ns#type") .
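The conversions proposed in this list can be sketched in Go. This is an illustration under stated assumptions, not a design: validateXSD and rewriteTriple are hypothetical names, rewriteTriple works on a triple that has already been split into terms, and only a few of the listed types are validated (the integer ranges with strconv.ParseInt, the binary encodings with encoding/hex and encoding/base64). Go's %q verb stands in for N-Triples string escaping; the two agree on printable ASCII.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

const (
	xsdNS   = "http://www.w3.org/2001/XMLSchema#"
	rdfType = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
)

// validateXSD checks a literal's lexical form for some of the XSD types
// listed above. Whitespace normalization is not handled.
func validateXSD(datatypeIRI, lexical string) error {
	var err error
	switch strings.TrimPrefix(datatypeIRI, xsdNS) {
	case "byte":
		_, err = strconv.ParseInt(lexical, 10, 8) // -128..+127
	case "short":
		_, err = strconv.ParseInt(lexical, 10, 16) // -32768..+32767
	case "long":
		_, err = strconv.ParseInt(lexical, 10, 64) // 64-bit signed
	case "hexBinary":
		_, err = hex.DecodeString(lexical)
	case "base64Binary":
		_, err = base64.StdEncoding.DecodeString(lexical)
	default:
		err = fmt.Errorf("no validator for %s", datatypeIRI)
	}
	return err
}

// rewriteTriple applies the two proposed conversions: rdf:type becomes
// <dgraph.type> (keeping the original predicate in an xid_type facet), and
// a <mailto:...> object becomes the plain string after "mailto:".
func rewriteTriple(s, p, o string) string {
	isIRI := strings.HasPrefix(o, "<") && strings.HasSuffix(o, ">")
	switch {
	case p == rdfType && isIRI:
		return fmt.Sprintf("%s <dgraph.type> %q (xid_type=%q) .",
			s, strings.Trim(o, "<>"), strings.Trim(p, "<>"))
	case isIRI && strings.HasPrefix(o, "<mailto:"):
		addr := strings.TrimSuffix(strings.TrimPrefix(o, "<mailto:"), ">")
		return fmt.Sprintf("%s %s %q .", s, p, addr)
	default:
		return fmt.Sprintf("%s %s %s .", s, p, o)
	}
}

func main() {
	fmt.Println(rewriteTriple("_:dave", rdfType, "<http://xmlns.com/foaf/0.1/Person>"))
	fmt.Println(rewriteTriple("_:dave", "<http://example.org/mbox>", "<mailto:e.miller123(at)example>"))
	fmt.Println(validateXSD(xsdNS+"byte", "127"), validateXSD(xsdNS+"byte", "128"))
}
```

For the rdf:type example, rewriteTriple produces exactly the <dgraph.type> line shown in the list. Note that base64.StdEncoding does not accept the embedded spaces that the xsd:base64Binary lexical form allows, so a real validator would need to normalize the lexical form first.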

N-Quads

The term N-Quads, as Dgraph uses it, differs from its meaning in the W3C documents.

N-Quads - Named Graphs in N-Triples

This document defines N-Quads, an easy to parse, line-based, concrete syntax for RDF Datasets [RDF11-CONCEPTS].

N-quads statements are a sequence of RDF terms representing the subject, predicate, object and graph label of an RDF Triple and the graph it is part of in a dataset. These may be separated by white space (spaces #x20 or tabs #x9). This sequence is terminated by a '.' and a new line (optional at the end of a document). (Source: RDF 1.1 N-Quads)

This is an N-Quad:

<subject> <predicate> <object> <context> .
# or
_:subject1 <http://an.example/predicate1> "object1" <http://example.org/graph1> .

But this isn't supported by Dgraph's parser, so the format should be called N-Triples, not N-Quads. Dgraph does have facets, but facets aren't a graph context. The last term of an N-Quad represents the graph that the triple belongs to, whereas facets are just additional information attached to an edge; they say nothing about multiple graphs in Dgraph. The parser doesn't support that kind of reference for now, although it would be entirely possible.

So what Dgraph actually uses is N-Triples.
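The difference between a triple, a quad and a Dgraph facet block can be made concrete with a small tokenizer. This is a simplified Go sketch under stated assumptions (one statement per line, no comments, facet values that contain no ')'), not Dgraph's parser; splitStatement and literalEnd are hypothetical names. A fourth term is the graph label defined by N-Quads, while the parenthesized block is Dgraph's facet syntax and is returned separately rather than counted as a term.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// literalEnd returns the index just past a string literal at the start of
// s, including an optional @lang tag or ^^<datatype>.
func literalEnd(s string) (int, error) {
	i := 1 // skip the opening quote
	for i < len(s) && s[i] != '"' {
		if s[i] == '\\' {
			i++ // skip the escaped character
		}
		i++
	}
	if i >= len(s) {
		return 0, errors.New("unterminated literal")
	}
	i++ // skip the closing quote
	switch {
	case strings.HasPrefix(s[i:], "@"):
		for i < len(s) && s[i] != ' ' && s[i] != '\t' {
			i++
		}
	case strings.HasPrefix(s[i:], "^^<"):
		j := strings.IndexByte(s[i:], '>')
		if j < 0 {
			return 0, errors.New("unterminated datatype IRI")
		}
		i += j + 1
	}
	return i, nil
}

// splitStatement splits one statement into its RDF terms and returns any
// Dgraph facet block "( ... )" separately, since facets are not a term.
func splitStatement(line string) ([]string, string, error) {
	s := strings.TrimSpace(line)
	if !strings.HasSuffix(s, ".") {
		return nil, "", errors.New("statement must end with '.'")
	}
	s = strings.TrimSpace(strings.TrimSuffix(s, "."))
	var terms []string
	facets := ""
	for s != "" {
		end := 0
		switch {
		case s[0] == '<': // IRI; IRIs cannot contain '>'
			end = strings.IndexByte(s, '>') + 1
		case strings.HasPrefix(s, "_:"): // blank node label
			if end = strings.IndexAny(s, " \t"); end < 0 {
				end = len(s)
			}
		case s[0] == '"': // literal
			var err error
			if end, err = literalEnd(s); err != nil {
				return nil, "", err
			}
		case s[0] == '(': // Dgraph facets, not a graph label
			closing := strings.IndexByte(s, ')')
			if closing < 0 {
				return nil, "", errors.New("unterminated facets")
			}
			facets = s[1:closing]
			s = strings.TrimSpace(s[closing+1:])
			continue
		}
		if end <= 0 {
			return nil, "", fmt.Errorf("cannot parse term at %q", s)
		}
		terms = append(terms, s[:end])
		s = strings.TrimSpace(s[end:])
	}
	return terms, facets, nil
}

func main() {
	for _, line := range []string{
		`_:subject1 <http://an.example/predicate1> "object1" <http://example.org/graph1> .`,
		`_:subject1 <http://an.example/predicate1> "object1" (weight=0.8) .`,
	} {
		terms, facets, err := splitStatement(line)
		fmt.Printf("%d terms, facets=%q, err=%v\n", len(terms), facets, err)
	}
}
```

For the N-Quad example in this section the sketch returns four terms; for a Dgraph statement with a facet block it returns three terms plus the facets, which is the distinction drawn above.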
