Increase RDF compliance support (N-Triples)

Reason for this RFC

Providing support closer to native RDF makes it possible for other users to bring their datasets to Dgraph, and makes it easier for them to move out as well. This is a user-friendly approach.

It also lets users with RDF datasets take advantage of Dgraph’s GraphQL out of the box, with zero code and little effort. They just need to know their dataset model well.

Note: a W3C Community Group is also working on using GraphQL with RDF. See the GitHub repository w3c/graphql-rdf (Bridging GraphQL and RDF Community Group).

It is worth mentioning that Dgraph cannot support some normative parts of the RDF standard. Things like namespacing, facts, the XML serialization syntax, Turtle syntax, and other structures are beyond what Dgraph can do today. Some of these structures may prove useful in the future.

Quick topics related to this RFC

Source

Issues with Blank Nodes.

In the N-Quads tests, the blank node format <http://example/s> (this is a URI blank node) won’t work. It has to be changed manually, by adding a blank node label, to:
_:http:example:s
or
<_:http://example:s>

In short, the RDF standard accepts blank nodes within angle brackets. Dgraph, however, accepts only a uid64 or an identifier that carries the _: label. This can be a problem when the --store_xids flag is used, because Dgraph will store the value with that prefix, which the RDF standard does not handle in the same way.

https://github.com/w3c/rdf-tests/blob/gh-pages/nquads/comment_following_triple.nq
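
The manual change described above could be scripted before loading a dataset. Here is a minimal sketch in Python, assuming the mapping implied by the first example (drop the angle brackets, collapse every run of ":" and "/" into a single ":", then prepend "_:"). The function name to_blank_label is hypothetical and not part of Dgraph.

```python
import re


def to_blank_label(term: str) -> str:
    """Rewrite an angle-bracketed term such as <http://example/s>
    into the _:label form shown above (_:http:example:s).

    Assumption: every run of ':' and '/' becomes a single ':'.
    """
    if not (term.startswith("<") and term.endswith(">")):
        raise ValueError(f"expected an angle-bracketed term, got {term!r}")
    body = term[1:-1]
    # "http://example/s" -> "http:example:s"
    label = re.sub(r"[:/]+", ":", body)
    return "_:" + label


print(to_blank_label("<http://example/s>"))
```

Different URIs can collapse to the same label under this mapping, so a real migration would probably need a lookup table rather than a pure string rewrite.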

One error in the parser related to URIs

{
  set {
    <_:http://a.example/s> <http://example/test> <scheme:!$%25&'()*+,-./0123456789:/@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~?#> .

  }
}

"message": "strconv.ParseUint: parsing \"(…).(…)(…)\": invalid syntax"

The parser treats the term as a literal UID, not as a blank node.

https://github.com/w3c/rdf-tests/blob/gh-pages/nquads/nt-syntax-uri-04.nq
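
To make the expected behaviour concrete, here is a small Python sketch of how a term could be classified before any attempt to parse it as a UID. This is not Dgraph's parser. The function name classify_term and the rule that only hexadecimal (0x...) or decimal strings count as UIDs are assumptions made for illustration.

```python
import re

# Assumption: a UID literal is either 0x-prefixed hex or plain decimal.
UID_RE = re.compile(r"^(0x[0-9a-fA-F]+|[0-9]+)$")


def classify_term(term: str) -> str:
    """Return 'blank', 'uid' or 'iri' for a subject/object term."""
    # Look inside the angle brackets if there are any.
    if term.startswith("<") and term.endswith(">"):
        body = term[1:-1]
    else:
        body = term
    # Check the blank node label first, so <_:...> never reaches UID parsing.
    if body.startswith("_:"):
        return "blank"
    if UID_RE.match(body):
        return "uid"
    return "iri"


for t in ["<_:http://a.example/s>", "<0x2717>", "<http://example/test>", "_:node0"]:
    print(t, "->", classify_term(t))
```

Checking for the _: label before trying to parse a number is the ordering that would avoid the strconv.ParseUint error shown above.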


Issues with Language support

No issues found with RDF itself. But if the user doesn’t set the schema before inserting the data, language support can be an issue. The mutation fails with:

"Language tags can only be used with predicates of string type having @lang directive in schema. Got: [http://example.org/ex#b]"

So Dgraph should infer the lang tag and add it to the schema, as it already does in some other cases such as lists.
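
Until such inference exists, the schema lines could be generated from the data before loading it. Here is a minimal Python sketch, assuming one triple per line and a sample triple reconstructed from the query result shown in this section. The function name infer_lang_schema is hypothetical, and the regular expression covers only simple language-tagged literals.

```python
import re

# subject, <predicate>, "literal"@lang, terminating dot
LANG_TRIPLE = re.compile(
    r'^\s*(\S+)\s+<([^>]+)>\s+"(?:[^"\\]|\\.)*"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)\s*\.\s*$'
)


def infer_lang_schema(lines):
    """Return one 'string @lang' schema line per predicate that is
    used with a language-tagged literal, in order of first appearance."""
    predicates = []
    for line in lines:
        match = LANG_TRIPLE.match(line)
        if match and match.group(2) not in predicates:
            predicates.append(match.group(2))
    return [f"<{p}>: string @lang ." for p in predicates]


# Assumed sample data; the original mutation is not shown in the post.
sample = [
    '_:a <http://example.org/ex#b> "Cheers"@en-UK .',
    '_:a <http://example.org/ex#c> "plain string" .',
]
for schema_line in infer_lang_schema(sample):
    print(schema_line)
```

Applying the generated lines as a schema update before the mutation should avoid the error above, since the predicate would then already carry the @lang directive.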

{
      q(func: has(<http://example.org/ex#b>@en-UK)) {
        uid
       <http://example.org/ex#b>@en-UK
      }
}
{
"data": {
    "q": [
      {
        "uid": "0x2717",
        "http://example.org/ex#b@en-UK": "Cheers"
      }
    ]
  }
}

NOTE: I’m also not sure whether sub-language tags are mentioned in the docs, or whether the docs explain how to query language tags on predicates written with angle brackets.


Parser issues

The test below doesn’t return the correct value for the string.
https://github.com/w3c/rdf-tests/blob/gh-pages/nquads/literal_all_controls.nq

Ascii_boundaries

The test below returns Unicode characters instead of the actual value.
https://github.com/w3c/rdf-tests/blob/gh-pages/nquads/literal_ascii_boundaries.nq

Things that need to be documented

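Dgraph’s parser converts numbers and the letters “T” and “F” to boolean. The numeric forms are RDF compliant: the xsd:boolean lexical space is true, false, 1, and 0. “T” and “F” are extra forms that Dgraph accepts on top of that.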

See RDF 1.1 Concepts and Abstract Syntax at section “3.4 Literals”.

{
  set {
    _:node0 <LiteralBool> "1"^^<xs:boolean> .
    _:node1 <LiteralBool> "F"^^<xs:boolean> .
    _:node2 <LiteralBool> "T"^^<xs:boolean> .
  }
}

Support more Storage Types

Source: RDF 1.1 Concepts and Abstract Syntax

Some types are missing in Dgraph. These are XML Schema types, but they can be used in RDF datasets, and Dgraph already supports other XMLSchema types.

  • xsd:decimal Arbitrary-precision decimal numbers
  • xsd:time Times (hh:mm:ss.sss…) with or without timezone
  • xsd:dateTimeStamp Date and time with required timezone
  • xsd:duration Duration of time
  • xsd:yearMonthDuration Duration of time (months and years only)
  • xsd:dayTimeDuration Duration of time (days, hours, minutes, seconds only)
  • xsd:byte -128…+127 (8 bit)
  • xsd:short -32768…+32767 (16 bit)
  • xsd:long -9223372036854775808…+9223372036854775807 (64 bit)
  • xsd:hexBinary Hex-encoded binary data
  • xsd:base64Binary Base64-encoded binary data
  • xsd:anyURI Absolute or relative URIs and IRIs
  • MailTo type: convert <mailto:e.miller123(at)example> to a string holding the part after mailto:.
  • rdf:type (22-rdf-syntax-ns#type): convert the predicate URI to dgraph.type.
    e.g.
    _:dave <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    To _:dave <dgraph.type> "http://xmlns.com/foaf/0.1/Person" (xid_type="http://www.w3.org/1999/02/22-rdf-syntax-ns#type") .
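
The last two conversions in the list can be sketched as a preprocessing step. This is a minimal Python illustration, not an existing Dgraph feature. The function names are hypothetical, and the xid_type facet key is taken from the example above.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# subject, <predicate>, <object IRI>, terminating dot
IRI_TRIPLE = re.compile(r'^\s*(\S+)\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$')


def rewrite_rdf_type(line: str) -> str:
    """Turn an rdf:type triple into a dgraph.type triple with a facet
    that records the original predicate. Other lines pass through."""
    match = IRI_TRIPLE.match(line)
    if not match or match.group(2) != RDF_TYPE:
        return line
    subject, _, obj = match.groups()
    return f'{subject} <dgraph.type> "{obj}" (xid_type="{RDF_TYPE}") .'


def mailto_to_string(term: str) -> str:
    """Return the part after 'mailto:' of a <mailto:...> term as a
    quoted string literal. Other terms pass through unchanged."""
    prefix = "<mailto:"
    if term.startswith(prefix) and term.endswith(">"):
        return '"' + term[len(prefix):-1] + '"'
    return term


print(rewrite_rdf_type(
    "_:dave <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://xmlns.com/foaf/0.1/Person> ."
))
print(mailto_to_string("<mailto:e.miller123(at)example>"))
```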

N-Quads

Dgraph uses the term N-Quads differently from what it means in the W3C documents.

N-Quads - Named Graphs in N-Triples

This document defines N-Quads, an easy to parse, line-based, concrete syntax for RDF Datasets [RDF11-CONCEPTS].

N-quads statements are a sequence of RDF terms representing the subject, predicate, object and graph label of an RDF Triple and the graph it is part of in a dataset. These may be separated by white space (spaces #x20 or tabs #x9). This sequence is terminated by a '.' and a new line (optional at the end of a document). (Source: RDF 1.1 N-Quads)

This is an N-quad:

<subject> <predicate> <object> <context> .
# or
_:subject1 <http://an.example/predicate1> "object1" <http://example.org/graph1> .

But this isn’t supported in Dgraph’s parser, so the format should be called N-Triples, not N-Quads. I know that Dgraph has Facets, but Facets aren’t graph context. The last value in an N-Quad represents the graph that the triple belongs to. Facets are just additional information attached to an edge. They say nothing about multiple graphs in Dgraph. The parser doesn’t support that kind of reference for now, although it would be entirely possible.

So Dgraph actually uses N-Triples.
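
To illustrate the difference, here is a small Python sketch that splits a statement into its terms and drops the graph label when there is one, turning an N-Quad into an N-Triple. It assumes, as described in this section, that the fourth term is not handled on load. The function names and the simplified term grammar (IRIs, blank nodes, and quoted literals with an optional datatype or language tag) are assumptions for illustration.

```python
import re

# IRI | blank node | quoted literal with optional ^^<datatype> or @lang
TERM = re.compile(
    r'<[^>]*>'
    r'|_:\S+'
    r'|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?'
)


def split_terms(line: str):
    """Return the RDF terms of one statement, without the final dot."""
    body = line.strip()
    if not body.endswith("."):
        raise ValueError("statement must end with '.'")
    return TERM.findall(body[:-1])


def drop_graph_label(line: str) -> str:
    """Keep subject, predicate and object; discard a fourth term."""
    terms = split_terms(line)
    if len(terms) not in (3, 4):
        raise ValueError(f"expected 3 or 4 terms, got {len(terms)}")
    return " ".join(terms[:3]) + " ."


quad = '_:subject1 <http://an.example/predicate1> "object1" <http://example.org/graph1> .'
print(len(split_terms(quad)))   # number of terms in the N-Quad
print(drop_graph_label(quad))   # the same statement as an N-Triple
```

Dropping the label loses the information about which graph the triple came from, which is exactly what Facets do not replace.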
