Hope to support SPARQL query in 2020

Moved from GitHub dgraph/4487

Posted by wangdsh:

I think SPARQL query support is important. The reasons can be seen here:


Hope to support SPARQL query in 2020.

marvin-hansen commented :

1000% upvote, SPARQL is super important.

Reasons can be seen here:


suesunss commented :

@marvin-hansen @wangdsh

Agree. Cypher mainly benefits the application layer and is easy to use, while Gremlin has a richer community than any other graph DB can compete with, as far as I am concerned. Gremlin also has richer expressivity for arbitrarily complex graph traversals, as you mentioned in a previous post. The problem with Gremlin is that as a query grows more complex, it becomes harder to perform ad-hoc/automatic optimizations, which I think is an essential point for any queryable database; optimization is even harder for users without a solid background in databases and graph traversal virtual machines.

But the real origin comes from SPARQL, which has a clean, simple syntax and yet powerful expressivity. I really hope to see SPARQL support; it would benefit both the graph DB and semantic web communities and probably have a deeper influence on future web technology.

MichelDiz commented :

The biggest challenge with SPARQL is that the language was made for triple store databases that follow W3C standards. Dgraph isn't an RDF triple store, despite using RDF (in its simplest, most raw form).

IMHO, it's kind of chaotic having to maintain support for different languages with different standards and different requirements/needs.

Imagine the chaos of maintaining GraphQL, GraphQL+-, Gremlin, Cypher, and SPARQL. Keeping them all in sync with each new Dgraph feature would be herculean work. Even if we supported just one, we would have to abandon GraphQL+- in favor of a new language (and redesign Dgraph) that we don't control: if we add features to Dgraph, we would have to ask the language's maintainers to add them to that language's spec.

I believe it would be easier for you to pick the features you like most in SPARQL (or any other language) and request them in Dgraph than to add another language. Or even to suggest changes to the GraphQL+- syntax.

That's my two cents as a user.

BTW, my opinion doesn't reflect what Dgraph thinks in general.

Extra example of the difference between Dgraph and SPARQL

SPARQL uses "PREFIX", which is linked to the identifier stored in the RDF store format. In Dgraph, that identifier is converted to a UID. So, to make this work, it is necessary to sanitize the dataset, which makes it incompatible with any other RDF store (when you export the RDF, you have to revert the sanitization and rebuild the identifiers yourself).

Also, the sanitization would need a "hacky" way to make the keyword "PREFIX" work, and one approach would have to be chosen, e.g. using Dgraph's type system or edges to represent the PREFIX inside Dgraph.
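To make the mismatch concrete, here is a minimal sketch. The prefix, predicate, and schema names are made up for illustration: a SPARQL engine resolves prefixed names to the full IRIs stored in the triple store, while Dgraph addresses nodes by internal UIDs.

```sparql
# SPARQL: "foaf:name" expands to a full IRI, which the store uses as the identifier
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  ?person foaf:name ?name .
}
```

```
# Roughly equivalent GraphQL+- query: no prefixes or IRIs, nodes carry UIDs
{
  people(func: has(name)) {
    uid
    name
  }
}
```

Any SPARQL support would therefore have to map every prefixed IRI to a UID on import and back again on export, which is the sanitizing step described above.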

marvin-hansen commented :


If Dgraph is neither an RDF triple store nor a quad store, what then is the underlying storage system?

Is everything reduced to key-value in badger?

And what was the discerning criterion to omit the foundation for the semantic web?

I understand that, lacking the foundation of a triple or quad store, there is actually very little that can be done to fully support RDF & SPARQL.

What’s the long-term vision of Dgraph?

I had a look at the roadmap, and I appreciate all the effort and dedication put into making things better, but I am just curious to know where this is going in the long run.

MichelDiz commented :

  1. It is basically Posting Lists recorded as KV pairs in BadgerDB. You can read more about it in the newly released paper: https://github.com/dgraph-io/dgraph/blob/master/paper/dgraph.pdf

An RDF triple is basically a "KV" pair with an identifier.

Dgraph is a triple system, but not exactly a triple store per se.

  2. Yes.

  3. I am not sure, because I was not present when Dgraph started. But I have a slight idea of why.

Basically Dgraph was "mirroring" itself in GraphQL, and GraphQL had no specific mutation pattern other than JSON objects inside a mutation block. Perhaps the engineers who started the project with Manish had some familiarity with RDF (they certainly took web semantics classes at university), and RDF seemed an obvious choice at the time. But not the whole package.

I have this slight idea after reading old commits. But I can ask Manish about it.

Anyway, GraphQL does not use web semantics, so neither do we. And there was no demand for this feature, so the DB matured without web semantics. Also, Dgraph is a DB aimed at common web services (like NoSQL is), not at ontologies or similar. You can still do it, since any graph DB is customizable, but it would not follow any specific standard, and you would have to "fit" it into GraphQL+-.
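As a rough illustration of the "triple as KV" idea from the first answer (the key layout below is simplified, not Badger's actual on-disk encoding): Dgraph groups the objects for one (predicate, subject) pair into a posting list and stores that list as a single value in BadgerDB.

```
RDF triple:           <0x01> <name> "Alice" .
Badger key (sketch):  <predicate="name", uid=0x01>
Badger value:         posting list ["Alice"]
```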

I understand that, lacking the foundation of a triple or quad store, there is actually very little that can be done to fully support RDF & SPARQL.

We can try to support JSON-LD, which several RDF DBs can export their data to. That's why I have opened some issues in this context: https://github.com/dgraph-io/dgraph/issues/4897

And also https://github.com/dgraph-io/dgraph/issues/4898 https://github.com/dgraph-io/dgraph/issues/4915

All these are small steps to let users easily input data coming from RDF triple stores (I am studying the problems related to this). We could import JSON-LD and export a JSON file 99.9% similar to JSON-LD, which is compatible with several tools out there.
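For context, a minimal JSON-LD document (the IRIs below are placeholders) differs from plain JSON mainly by the "@context", "@id", and "@type" keywords, which is why a Dgraph JSON export can come close to JSON-LD without being identical:

```json
{
  "@context": { "name": "http://schema.org/name" },
  "@id": "http://example.org/alice",
  "@type": "http://schema.org/Person",
  "name": "Alice"
}
```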

What’s the long-term vision of Dgraph?

We are discussing it here: Dgraph's New Versioning Scheme

Do you mean SPARQL? I'm not sure. We have to finish the GraphQL spec support first. There is a lot to be done before starting a new adventure.

marvin-hansen commented :

Thank you.

Basically Dgraph was “mirroring” itself in GraphQL.

I suspected that a few times, but now it’s official. There is no point in supporting SPARQL or JSON-LD given the available foundation and the actual goal. No need to add unnecessary complexity.

Dgraph is a DB aimed at common web services

Please stress that a bit more on the landing page and in the intro of the documentation, to make clear what it is and what it's not.

Thank you for the clarification.

MichelDiz commented :

We can still fully support JSON-LD, because users can bring their data in. And if they wish, they can "sanitize" it (I don't know a better word for this) using a Bulk Upsert mutation. There is nothing in JSON-LD that we can't deal with.
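As a hedged sketch of what that sanitizing step could look like (the `xid` and `sanitized` predicate names are hypothetical, not part of any schema mentioned here), an upsert block can match every node carrying an imported external identifier and rewrite them in bulk:

```
upsert {
  query {
    # find every node that still carries an imported external identifier
    q(func: has(xid)) {
      u as uid
    }
  }
  mutation {
    set {
      # mark each matched node, in one bulk operation
      uid(u) <sanitized> "true" .
    }
  }
}
```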