Why Dgraph is not a Triple Store

A triple store, also known as an RDF store or RDF graph database, is a specialized database designed for storing and querying data in the Resource Description Framework (RDF) format. RDF represents data as a set of subject-predicate-object triples, also known as RDF triples.

Users commonly mistake Dgraph for a triple store, but that is a misconception. The confusion arises because Dgraph supports RDF. It is worth noting, though, that Dgraph does not support every type of RDF out there. We have a custom RDF format, which I usually refer to as “RDF-D”. It consists of NQuads and now includes an additional value denoting namespaces.

Dgraph is a distributed graph database that uses an inverted index, whose entries are known as posting lists, to index and search data efficiently. The inverted index is a powerful indexing technique used in search and information retrieval systems that allows for the quick retrieval of documents that contain specific search terms in large datasets.

In Dgraph, the inverted index is used to index the data stored in the graph database. Each edge in the graph is indexed based on the values of its properties, and the inverted index is used to efficiently locate nodes and edges that match a given query. The use of the inverted index in Dgraph allows for fast and efficient search of graph data.
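To make the inverted index idea concrete, here is a minimal sketch in Python. It is not Dgraph's implementation: the whitespace tokenizer, the function names, and the sample data are all assumptions made up for illustration. It only shows the general technique of mapping each term to a sorted list of node ids.

```python
from collections import defaultdict

def build_inverted_index(values_by_uid):
    """Map each lowercase token to the sorted list of node ids whose value contains it."""
    index = defaultdict(set)
    for uid, text in values_by_uid.items():
        # Assumption: a naive whitespace tokenizer, purely for illustration.
        for token in text.lower().split():
            index[token].add(uid)
    # Sorted lists mirror the "posting list" idea from search engines.
    return {token: sorted(uids) for token, uids in index.items()}

def lookup(index, *terms):
    """Return the ids whose value contains every term (intersection of posting lists)."""
    lists = [set(index.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*lists)) if lists else []

# Hypothetical sample data: node id -> value of a "name" predicate.
names = {1: "Alice Smith", 2: "Bob Smith", 3: "Alice Jones"}
name_index = build_inverted_index(names)
```

With this sample data, `lookup(name_index, "alice", "smith")` intersects the two posting lists and returns only the node that matches both terms.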

The bottom line is that Dgraph stores everything as key-value pairs in Badger.

Some key characteristics of a triple store:

  1. RDF Storage: A triple store is designed specifically for storing and managing raw RDF data. It can store large amounts of RDF triples, and provides efficient ways to query and retrieve the data.
  2. SPARQL Query Language: A triple store typically supports the SPARQL query language, which is used to query RDF data. SPARQL allows users to search for specific patterns of RDF triples and retrieve the results in a structured format.
  3. RDF Schema and OWL Support: A triple store may also support RDF Schema (RDFS) and Web Ontology Language (OWL), which are used for defining ontologies and class hierarchies in RDF data.
  4. Standards-Based: A triple store adheres to standard specifications for RDF, RDFS, and OWL, ensuring compatibility with other RDF-based systems and tools.

Some of these can be achieved with Dgraph given proper support. OWL, for example, would be complex but plausible. SPARQL support could also be attempted, provided it could run without the RDF header and other RDF-specific parts.

A triple store is an important tool in the semantic web and knowledge management domains. There may be other important characteristics of a triple store, but I think storing RDF as it is counts as one of the most important ones, and that isn’t Dgraph’s case.

Cheers.

Does the KV store use a double key index or single key?

Single key index. You can see more about it in the paper.

I guess that is exactly where the confusion comes from, then: the paper.

2.1 Data Format
Dgraph can input data in a JSON format or (slightly modified)
RDF NQuad format. Dgraph would break down a JSON map
into smaller chunks, with each JSON key-value forming one
record equivalent of a single RDF triple record…

A triple is typically expressed as a subject-predicate-object
or a subject-predicate-value. Subject is a node, predicate is a
relationship, and object can be another node or a primitive data
type. One points from a node to another node, the other points
from a node to a value. In the above example, the triple with
name is a type of subject-predicate-value (typically referred
to as an attribute), while the triple with follower is a type of
subject-predicate-object. Dgraph makes no difference in how
it handles these two types of records (to avoid confusion over
these two types, we’ll refer to them as object-values). Dgraph
considers this as the unit of record and a typical JSON map
would be broken into multiple such records.

2.2 Data Storage
Dgraph data is stored in an embeddable key-value database
called Badger [?] for data input-output on disk…

As mentioned above, all records with the same predicate
form one shard. Within a shard, records sharing the same
subject-predicate are grouped and condensed into one single
key-value pair in Badger. This value is referred to as a posting
list, a terminology commonly used in search engines to refer
to a sorted list of doc ids containing a search term. A posting
list is stored as a value in Badger, with the key being derived
from subject and predicate.

I guess I always called it a triple store because of the paper, and because of how Manish, the founder, interpreted it in this very forum.

So while it is not a triple store, because it stores the subject and predicate concatenated as the key and the posting list as the value in the KV store, it still makes sense to think of it as a triple store in many respects to understand why it does what it does and how it works under the hood. I think I have a good understanding of Dgraph, and much of that understanding is based on the idea that Dgraph is a triple store.
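Roughly, the layout quoted from the paper can be sketched like this. This is only an illustration in Python, not Dgraph or Badger code: the tuple key stands in for whatever key encoding Dgraph really derives from subject and predicate, and the records are made up.

```python
from collections import defaultdict

# Hypothetical records: (subject uid, predicate, object uid).
triples = [
    (1, "follower", 7),
    (1, "follower", 3),
    (2, "follower", 1),
    (1, "friend", 2),
]

def to_posting_lists(records):
    """Group records by (subject, predicate); each value is a sorted list of objects."""
    grouped = defaultdict(list)
    for subject, predicate, obj in records:
        # Assumption: a plain tuple stands in for the real key derived
        # from subject and predicate.
        grouped[(subject, predicate)].append(obj)
    # One key-value pair per (subject, predicate), with a sorted posting list as value.
    return {key: sorted(objs) for key, objs in grouped.items()}

kv = to_posting_lists(triples)
```

Four triple-like records collapse into three key-value pairs here, because both `follower` edges of node 1 end up in the same posting list.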

But you are correct that on disk there is not a subject-predicate-object triple, but rather a “subject-predicate” key pointing to an ordered posting list containing all objects. So at this point, it's just semantics, but I bow to KV.

The big difference is in the way we keep the data. A triple store keeps the RDF format on disk. We break the data into pieces and spread them out.

In an abstract way we can say that Dgraph deals with triples, obviously. But it doesn’t store them the way a triple store does. And there are several aspects that differentiate us from an old-fashioned triple store.
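As a rough illustration of "breaking into pieces and spreading", the sketch below follows the paper's description: a JSON map is split into one record per key-value, and records with the same predicate form one shard. It is a toy model in Python, not Dgraph code; `flatten`, `shard_by_predicate`, and the sample map are hypothetical names and data.

```python
from collections import defaultdict

def flatten(uid, node):
    """Turn one JSON-like map into (subject, predicate, object-value) records."""
    records = []
    for predicate, value in node.items():
        # A list value yields one record per element.
        values = value if isinstance(value, list) else [value]
        for v in values:
            records.append((uid, predicate, v))
    return records

def shard_by_predicate(records):
    """Group records so that all records with the same predicate share a shard."""
    shards = defaultdict(list)
    for record in records:
        shards[record[1]].append(record)
    return dict(shards)

# Hypothetical input: node 1 has a name and two followers.
person = {"name": "Alice", "follower": [3, 7]}
records = flatten(1, person)
shards = shard_by_predicate(records)
```

The original map no longer exists as a unit after this step: its `name` record and its `follower` records live in different shards, which is the opposite of keeping the RDF document as-is.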