Reason for this RFC
Providing support closer to native RDF makes it possible for other users to bring their datasets to Dgraph, and it makes moving their data out easier as well. This is a user-friendly approach.
It also lets users with RDF datasets take advantage of Dgraph’s GraphQL out of the box, with zero code and effort. They just need to know their dataset model well.
Note: a W3C community group also wishes to use GraphQL with RDF. See GitHub - w3c/graphql-rdf: Bridging GraphQL and RDF Community Group
It is worth mentioning that Dgraph is not able to support certain normative parts of the RDF standard. Things like namespacing, facts, XML serialization syntax, Turtle syntax, and other structures are beyond what Dgraph can do today. Perhaps some of these structures will be useful in the future.
Quick topics related to this RFC
- Dgraph now supports storing XIDs on the node. We should also support exporting RDF that preserves the original blank node, based on the stored XIDs.
- Deal better with Unrecognized RDF types
- Add aliases at the schema level (In type)
- This one is related to JSON-LD, but is potentially an issue for RDF too: Support JSON-LD in Dgraph
- And Support reserved character "@" to be used in predicate naming
- This validator needs to be RDF compliant: Add a validator for bulk and live loader
Source
Issues with Blank Nodes.
In the N-Quads tests, the node format <http://example/s>
(a node identified by a URI) won’t work. It has to be changed manually, by adding a blank node label, to:
_:http:example:s
or
<_:http://example:s>
In short, the RDF standard accepts node identifiers within angle brackets. However, Dgraph accepts only a uint64 UID or a node carrying the blank node label _:
- This can be a problem if we use the flag --store_xids, because Dgraph will store the XID with a prefix that the RDF standard does not support in the same way.
https://github.com/w3c/rdf-tests/blob/gh-pages/nquads/comment_following_triple.nq
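The rule described above (a subject is either a uint64 UID or a label starting with _:) can be illustrated with a small sketch. This is not Dgraph's parser: `classify_subject` is a hypothetical helper written only to show which of the subjects from this section would pass under that rule.

```python
# Hypothetical helper, NOT Dgraph's actual parser. It only mirrors the rule
# described in the text: a subject is a uint64 UID or a "_:" blank node label.
MAX_UINT64 = 2**64 - 1


def classify_subject(term: str) -> str:
    """Return 'blank', 'uid', or 'unsupported' for an RDF subject term."""
    # Strip optional angle brackets: both <0x1> and <_:label> appear in the text.
    inner = term[1:-1] if term.startswith("<") and term.endswith(">") else term
    if inner.startswith("_:"):
        return "blank"
    try:
        value = int(inner, 0)  # base 0 accepts decimal and 0x-prefixed hex
    except ValueError:
        return "unsupported"  # e.g. a plain IRI such as http://example/s
    return "uid" if 0 <= value <= MAX_UINT64 else "unsupported"


for subject in ["<http://example/s>", "_:http:example:s", "<_:http://example:s>", "<0x2717>"]:
    print(subject, "->", classify_subject(subject))
```

Under these assumptions the plain IRI subject is the only one rejected, which matches the manual rewrite the tests required.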
One error in the parser related to URIs
{
  set {
    <_:http://a.example/s> <http://example/test> <scheme:!$%25&'()*+,-./0123456789:/@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~?#> .
  }
}
"message": "strconv.ParseUint: parsing \"(…).(…)(…)\": invalid syntax"
The parser treats the value as a literal UID, not as a blank node.
https://github.com/w3c/rdf-tests/blob/gh-pages/nquads/nt-syntax-uri-04.nq
Issues with Language support
No issues were found with RDF itself. But if the user doesn’t set the schema before inserting the data, language support can be an issue:
"Language tags can only be used with predicates of string type having @lang directive in schema. Got: [http://example.org/ex#b]"
So, Dgraph should infer the lang tag and add it to the schema as it does for some other cases like lists.
{
  q(func: has(<http://example.org/ex#b>@en-UK)) {
    uid
    <http://example.org/ex#b>@en-UK
  }
}
{
  "data": {
    "q": [
      {
        "uid": "0x2717",
        "http://example.org/ex#b@en-UK": "Cheers"
      }
    ]
  }
}
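The inference suggested above could be sketched as a pre-processing pass over the RDF input: collect every predicate that appears with a language-tagged literal and emit a schema line with the @lang directive for it. The function name `infer_lang_schema` and the simplified regular expression are assumptions made for illustration. This is not how Dgraph implements schema inference, and the pattern does not cover the full N-Quads grammar.

```python
import re

# Matches a triple whose object is a language-tagged literal, for example:
#   _:a <http://example.org/ex#b> "Cheers"@en-UK .
# Simplified for illustration; not a complete N-Quads grammar.
LANG_TRIPLE = re.compile(
    r'^\s*\S+\s+<([^>]+)>\s+"(?:[^"\\]|\\.)*"@[A-Za-z]+(?:-[A-Za-z0-9]+)*\s*\.\s*$'
)


def infer_lang_schema(rdf_lines):
    """Return schema lines declaring @lang for predicates that carry language tags."""
    predicates = []
    for line in rdf_lines:
        match = LANG_TRIPLE.match(line)
        if match and match.group(1) not in predicates:
            predicates.append(match.group(1))  # keep first-seen order, no duplicates
    return [f"<{p}>: string @lang ." for p in predicates]


sample = [
    '_:a <http://example.org/ex#b> "Cheers"@en-UK .',
    '_:a <http://example.org/ex#c> "no tag here" .',
]
print("\n".join(infer_lang_schema(sample)))
```

Only the predicate carrying a language tag gets a schema line. Predicates with untagged literals are left to the existing inference.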
NOTE: I’m also not sure whether language subtags are mentioned in the docs, or whether the docs explain how to query language tags on predicates written with angle brackets.
Parser issues
The test below doesn’t return the correct value for the string under test.
https://github.com/w3c/rdf-tests/blob/gh-pages/nquads/literal_all_controls.nq
Ascii_boundaries
The test below returns Unicode characters instead of the actual value.
https://github.com/w3c/rdf-tests/blob/gh-pages/nquads/literal_ascii_boundaries.nq
Things that need to be documented
Dgraph’s parser converts numbers and the letters “T” and “F” to booleans. That’s RDF compliant.
See RDF 1.1 Concepts and Abstract Syntax at section “3.4 Literals”.
{
  set {
    _:node0 <LiteralBool> "1"^^<xs:boolean> .
    _:node1 <LiteralBool> "F"^^<xs:boolean> .
    _:node2 <LiteralBool> "T"^^<xs:boolean> .
  }
}
Support more Storage Types
Source: RDF 1.1 Concepts and Abstract Syntax
Some types are missing in Dgraph. These are XML Schema types, but they can be used in RDF datasets, and Dgraph already supports XML Schema types:
- xsd:decimal: Arbitrary-precision decimal numbers
- xsd:time: Times (hh:mm:ss.sss…) with or without timezone
- xsd:dateTimeStamp: Date and time with required timezone
- xsd:duration: Duration of time
- xsd:yearMonthDuration: Duration of time (months and years only)
- xsd:dayTimeDuration: Duration of time (days, hours, minutes, seconds only)
- xsd:byte: -128…+127 (8 bit)
- xsd:short: -32768…+32767 (16 bit)
- xsd:long: -9223372036854775808…+9223372036854775807 (64 bit)
- xsd:hexBinary: Hex-encoded binary data
- xsd:base64Binary: Base64-encoded binary data
- xsd:anyURI: Absolute or relative URIs and IRIs
- MailTo type: convert <mailto:e.miller123(at)example> to the string after mailto:.
- 22-rdf-syntax-ns#type: converts the predicate URI to dgraph.type, e.g.
_:dave <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
to
_:dave <dgraph.type> "http://xmlns.com/foaf/0.1/Person" (xid_type="http://www.w3.org/1999/02/22-rdf-syntax-ns#type") .
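The rdf:type conversion in the last item can be sketched as a line-by-line rewrite. The helper name `rewrite_rdf_type` and the simplified pattern are assumptions for illustration only. The output format, including the xid_type facet, is copied from the example above and is a proposal in this RFC, not existing Dgraph behavior.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Simplified pattern: subject, the rdf:type predicate, an IRI object, final dot.
TYPE_TRIPLE = re.compile(
    r"^\s*(\S+)\s+<" + re.escape(RDF_TYPE) + r">\s+<([^>]+)>\s*\.\s*$"
)


def rewrite_rdf_type(line: str) -> str:
    """Rewrite an rdf:type triple into the proposed dgraph.type form."""
    match = TYPE_TRIPLE.match(line)
    if match is None:
        return line  # not an rdf:type triple, so leave it untouched
    subject, type_iri = match.groups()
    return f'{subject} <dgraph.type> "{type_iri}" (xid_type="{RDF_TYPE}") .'


print(rewrite_rdf_type(
    "_:dave <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://xmlns.com/foaf/0.1/Person> ."
))
```

Keeping the original predicate IRI in a facet is what would let an export step restore the rdf:type triple later.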
N-Quads
The term N-Quads, as Dgraph uses it, differs from its meaning in the W3C reference documents.
This document defines N-Quads, an easy to parse, line-based, concrete syntax for RDF Datasets [RDF11-CONCEPTS].
N-quads statements are a sequence of RDF terms representing the subject, predicate, object and graph label of an RDF Triple and the graph it is part of in a dataset. These may be separated by white space (spaces #x20 or tabs #x9). This sequence is terminated by a '.' and a new line (optional at the end of a document). (RDF 1.1 N-Quads)
This is an N-quad:
<subject> <predicate> <object> <context> .
#or
_:subject1 <http://an.example/predicate1> "object1" <http://example.org/graph1> .
But Dgraph’s parser doesn’t support this, so the format should be called N-Triples, not N-Quads. I know that Dgraph has facets, but facets aren’t graph context. The last value in an N-Quad represents the graph that the triple belongs to. Facets are just additional information about an edge and say nothing about multiple graphs in Dgraph. The parser doesn’t support that kind of reference for now, although it is entirely possible.
So Dgraph actually uses N-Triples.
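The difference between the two formats comes down to the number of terms before the terminating dot: three for an N-Triples statement, four for an N-Quads statement with a graph label. A minimal sketch of that distinction follows. The `statement_kind` helper and its token pattern are assumptions covering only a simplified subset of the grammar, so Dgraph facets in parentheses are reported as invalid here.

```python
import re

# Token pattern for a simplified subset of RDF terms (no capture groups, so
# findall returns whole terms). Not a complete N-Quads grammar.
TERM = re.compile(
    r'<[^>]*>'                                              # IRI
    r'|_:\S+'                                               # blank node label
    r'|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?'    # literal
)


def statement_kind(line: str) -> str:
    """Return 'triple', 'quad', or 'invalid' for a single statement line."""
    body = line.strip()
    if not body.endswith("."):
        return "invalid"  # every statement is terminated by a dot
    body = body[:-1]
    terms = TERM.findall(body)
    # Anything left besides white space is outside this simplified grammar.
    if TERM.sub("", body).strip():
        return "invalid"
    return {3: "triple", 4: "quad"}.get(len(terms), "invalid")


print(statement_kind('_:subject1 <http://an.example/predicate1> "object1" <http://example.org/graph1> .'))
print(statement_kind('_:subject1 <http://an.example/predicate1> "object1" .'))
```

A loader could use a check like this to reject or strip the graph label explicitly, instead of failing on it.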