XID in other graph DBs

michaelcompton · June 14, 2017, 11:58pm

I did a quick survey of how other graph DBs handle IDs, whether they allow external IDs, and whether they handle RDF-style URIs, e.g. <http://dbpedia.org/resource/Category:German_money_launderers>.

The short summary is that RDF stores allow URI-style IDs, but these largely aren’t distributed stores (though they must somehow handle the mapping to an internal ID if they allow concurrent writes), while non-RDF stores, like Neo4j, generally don’t allow external IDs for nodes.

Neo, for example, has some recent material on importing RDF (http://connected-data.london/2016/06/09/importing-rdf-data-neo4j/), in which they add a string edge to a node to store its URI, which I think is the same approach we have in mind. This is quick, but it doesn’t enforce uniqueness of nodes for a URI: concurrent writes could mint two nodes with the same URI. I’m not sure how much of a problem this is, though, because I’d expect bulk uploads of existing RDF to matter most, so if we handle that client side, that’s ok.
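
To make the concurrent-write concern concrete, here is a minimal Python sketch of handling uniqueness client side during a bulk upload. `XidMap` and `node_for` are hypothetical names, not part of Neo4j or Dgraph, and the sketch assumes a single loader process owns the URI-to-node mapping.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class XidMap:
    """Hypothetical client-side map from external URI to internal node ID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = {}
        self._next_id = 1

    def node_for(self, uri):
        # The lookup and the creation happen under one lock, so two writers
        # asking for the same URI can never mint two different nodes.
        with self._lock:
            node_id = self._ids.get(uri)
            if node_id is None:
                node_id = self._next_id
                self._next_id += 1
                self._ids[uri] = node_id
            return node_id


uri = "http://dbpedia.org/resource/Category:German_money_launderers"
xids = XidMap()

# 100 concurrent requests for the same URI all resolve to one node ID.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(xids.node_for, [uri] * 100))
```

Without the lock, two threads could both see the URI as missing and each allocate a node, which is the duplicate-node case described above.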

Some noSQL+graph stores allow keys in the data and thus have some mechanisms for dealing with concurrent writes: e.g. JanusGraph allows turning off consistency checking during bulk uploads so they don’t have to check the keys.

Graph DBs I looked at:

Neo4j https://neo4j.com/

distributed graph DB
Own data model and query language
No external allocation of ID
Have done RDF by assigning the URI as a property http://connected-data.london/2016/06/09/importing-rdf-data-neo4j/ - but there’s no checking that the URIs are uniquely attached to nodes.

AllegroGraph https://franz.com/agraph/allegrograph/

RDF quad store
single machine - but also Federation (“When a user creates an AllegroGraph federated repository, a virtual index of the constituent stores is created and maintained in the client session…”)
SPARQL queries

IBM Graph https://www.ibm.com/bb-en/marketplace/graph

built over TinkerPop

GraphBase http://graphbase.net/

distributed graph databases
own structure (Graph Simple Form)
each vertex 128 bit ID
builtin query API (?)
can layer RDF on top using Jena - not clear what the encoding is. Assume no XIDs
office in same building as Dgraph?

BlazeGraph https://www.blazegraph.com/

RDF quad store
single machine to 50B quads
SPARQL

Cray Graph Engine http://www.cray.com/products/analytics/cray-graph-engine

RDF store
Single machine on Cray hardware
SPARQL

OntoText GraphDB https://ontotext.com/products/graphdb/

RDF store
Enterprise version is distributed
Master-worker distribution with single DB copied on all instances.

ArangoDB https://www.arangodb.com

Distributed graph/noSQL DB
data sharded across DB nodes in the cluster
SQL-like query language
graphs built on their noSQL infrastructure (?) all on RocksDB
each document indexed by a key - can be user specified (?)

JanusGraph http://janusgraph.org/

distributed graph database
has an option to turn off ID checking in batch loading
seems like there is an ID manager that allocates blocks of 64bit IDs out to instances
vertex based storage with random sharding across storage backends
no external id allocation, but seems to have other keys (?).
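
As a rough illustration of the block-allocation idea noted for JanusGraph, here is a minimal Python sketch. It is not JanusGraph's implementation: `IdAuthority`, `Instance`, `lease_block` and `new_id` are hypothetical names, and the central authority is just an in-process object here.

```python
import threading


class IdAuthority:
    """Hypothetical central allocator that hands out disjoint ID blocks."""

    def __init__(self, block_size=1000):
        self._lock = threading.Lock()
        self._next_start = 1
        self._block_size = block_size

    def lease_block(self):
        # Each lease is a half-open range [start, end) that no other
        # instance will ever receive.
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
            return start, start + self._block_size


class Instance:
    """Hypothetical DB instance that assigns IDs from its leased block."""

    def __init__(self, authority):
        self._authority = authority
        self._next, self._end = authority.lease_block()

    def new_id(self):
        # Only contact the authority when the local block runs out.
        if self._next >= self._end:
            self._next, self._end = self._authority.lease_block()
        value = self._next
        self._next += 1
        return value


authority = IdAuthority(block_size=3)
a = Instance(authority)
b = Instance(authority)
ids_a = [a.new_id() for _ in range(4)]  # exhausts a's first block
ids_b = [b.new_id() for _ in range(2)]
```

The point of the design is that instances coordinate only once per block rather than once per ID, at the cost of gaps in the ID space when a block is not fully used.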

StarDog http://www.stardog.com/

RDF store
SPARQL query
clustering with master server distribution with Apache ZooKeeper
replicated store with 2PC coordination for distributed writes

OrientDB http://orientdb.com/orientdb/

noSQL + graph DB
SQL like queries
distribution: multimaster and sharding
not sure how ID and keys work

DataStax Enterprise Graph https://www.datastax.com/products/datastax-enterprise-graph

Graph DB over Apache Cassandra
TinkerPop/Gremlin
not clear how IDs and keys work in the graph

cayley https://cayley.io/

? don’t know, docs not there?

michaelcompton · June 15, 2017, 12:21am

another note on URIs in RDF and their use in things like linked data and schema.org etc

URIs are often slow-moving things, created and managed by humans. The kinds of URI discussed in “create edges to store xid while importing data” (Issue #1047, dgraph-io/dgraph on GitHub), or found in the linked data web or on schema.org, aren’t created on the fly by a machine. They are minted by humans, agreed on by consensus and bulk uploaded into a machine. Even when we do mint a URI on the fly, the process to ensure that it’s unique generally happens outside the DB or is based on some data property of the node that’s meant to be unique.

For example, one use of RDF might be to bulk upload an existing schema with URIs, say about people, and then add or modify data about individuals on the fly. The people themselves don’t need URIs - they can be blank nodes. Even if a particular application wants them to be proper URIs, the process that guarantees their uniqueness has to be external to the triple store’s node ID handling anyway - e.g. a process that mints different URIs for two people with the same name.
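
A minimal Python sketch of such an external minting process, purely as an illustration: `UriMinter`, `mint` and the `http://example.org/person/` base are hypothetical, and a real application would more likely key on some unique data property than on a counter.

```python
import re
from collections import defaultdict


class UriMinter:
    """Hypothetical minter that runs outside the store."""

    def __init__(self, base="http://example.org/person/"):
        self._base = base
        self._counts = defaultdict(int)

    def mint(self, name):
        # Turn the name into a URL-safe slug, then always append a
        # per-slug counter so two people with one name get distinct URIs.
        slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
        self._counts[slug] += 1
        return "%s%s-%d" % (self._base, slug, self._counts[slug])


minter = UriMinter()
first = minter.mint("Jane Doe")
second = minter.mint("Jane Doe")
```

Here the store never has to decide whether two "Jane Doe" records are the same person. That decision is made by whoever calls `mint`.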

So we don’t really miss out on anything in the sense of using RDF if we don’t have XIDs

mrjn · June 15, 2017, 12:37am

Thanks for the analysis, @michaelcompton. That’s pretty informed.

Yeah, I think XIDs don’t fit into our architecture – we’re not really a triple store, we just chose to use the RDF format for data input. Over time, that might evolve into a more JSON-y way to import data, if that’s easier for developers.

We can bake XID support into our client; so RDFs can still be imported into Dgraph – sounds like that should be sufficient for now.
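
A minimal Python sketch of what client-side XID handling could look like, under stated assumptions: triples arrive as already-parsed tuples, the predicate name `xid` is taken from the issue title rather than any confirmed API, `rewrite_triples` is a hypothetical helper, and literal values contain no quote characters that would need escaping.

```python
def rewrite_triples(triples):
    """Replace URI nodes with blank-node labels and record each URI once
    as a string-valued 'xid' edge (the predicate name is an assumption).

    Each input triple is (subject_uri, predicate, object, object_is_node).
    """
    labels = {}
    out = []

    def label_for(uri):
        # First sighting of a URI: allocate a blank node and emit its xid edge.
        if uri not in labels:
            labels[uri] = "_:n%d" % len(labels)
            out.append((labels[uri], "xid", '"%s"' % uri))
        return labels[uri]

    for subj, pred, obj, obj_is_node in triples:
        s = label_for(subj)
        o = label_for(obj) if obj_is_node else '"%s"' % obj
        out.append((s, pred, o))
    return out


triples = [
    ("http://ex.org/alice", "knows", "http://ex.org/bob", True),
    ("http://ex.org/alice", "name", "Alice", False),
]
rewritten = rewrite_triples(triples)
```

Because the mapping lives in the client for the duration of one load, each URI ends up on exactly one node without the server needing to know about XIDs at all.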

On Neo4j being a “distributed graph DB”: I know that’s what the Neo4j website says, but are they really? I think it’s more like replicas holding entire DB copies. Vertical scaling, not horizontal.

michaelcompton · June 15, 2017, 12:41am

Yes, sorry it’s master-slave replication of a single DB.

