I did a quick survey of what other graph DBs do about IDs: whether they allow external IDs, and whether they handle RDF-style URIs, e.g. <http://dbpedia.org/resource/Category:German_money_launderers>.
The short summary is that RDF stores allow URI-style IDs, but largely these aren’t distributed stores (though they must somehow be handling the mapping to an internal ID if they allow concurrent writes), while non-RDF stores, like Neo4j, in general don’t allow external IDs for nodes.
Neo, for example, has some recent material on importing RDF (http://connected-data.london/2016/06/09/importing-rdf-data-neo4j/), in which they add a string edge to a node to store its URI, which I think is the same approach we are considering. This is quick, but it doesn’t handle uniqueness of nodes for a URI: concurrent writes could mint two nodes with the same URI. I’m not sure how much of a problem this is, though, because I’d expect bulk uploads of existing RDF to matter most, so if we handle uniqueness client side for those, that’s OK.
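To make "handle that client side" concrete, here is a minimal sketch of what a bulk loader could do, assuming a single loader process that owns the URI-to-node mapping. The `UriRegistry` class, its `get_or_create` method and the second example URI are made up for illustration and are not any store's API. The point is only that lookup-then-insert has to be atomic for a URI to map to one node.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class UriRegistry:
    """Hypothetical client-side map from external URI to internal node ID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = {}
        self._next_id = 1

    def get_or_create(self, uri):
        # Holding the lock makes lookup-then-insert atomic, so two threads
        # that see the same URI at the same time cannot mint two nodes.
        with self._lock:
            node_id = self._ids.get(uri)
            if node_id is None:
                node_id = self._next_id
                self._next_id += 1
                self._ids[uri] = node_id
            return node_id


registry = UriRegistry()
uris = [
    "http://dbpedia.org/resource/Category:German_money_launderers",
    "http://example.org/resource/B",  # placeholder URI for the example
] * 50

# Simulate concurrent writers all resolving URIs through one registry.
with ThreadPoolExecutor(max_workers=8) as pool:
    assigned = list(pool.map(registry.get_or_create, uris))
```

This only covers one loader; several independent clients writing the same URIs would still need the server to enforce uniqueness.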
Some noSQL+graph stores allow keys in the data and thus have some mechanisms for dealing with concurrent writes: e.g. JanusGraph allows turning off consistency checking during bulk uploads so they don’t have to check the keys.
Graph DBs I looked at:
Neo4j https://neo4j.com/
- distributed graph DB
- Own data model and query language
- No external allocation of ID
- Have done RDF import by storing the URI as a property http://connected-data.london/2016/06/09/importing-rdf-data-neo4j/ - but there’s no checking that each URI is attached to only one node.
AllegroGraph https://franz.com/agraph/allegrograph/
- RDF quad store
- single machine - but also Federation (“When a user creates an AllegroGraph federated repository, a virtual index of the constituent stores is created and maintained in the client session…”)
- SPARQL queries
IBM Graph https://www.ibm.com/bb-en/marketplace/graph
- built over TinkerPop
GraphBase http://graphbase.net/
- distributed graph databases
- own structure (Graph Simple Form)
- each vertex 128 bit ID
- builtin query API (?)
- can layer RDF on top using Jena - not clear what the encoding is. Assume no XIDs
- office in same building as Dgraph?
BlazeGraph https://www.blazegraph.com/
- RDF quad store
- single machine to 50B quads
- SPARQL
Cray Graph Engine http://www.cray.com/products/analytics/cray-graph-engine
- RDF store
- Single machine on Cray hardware
- SPARQL
OntoText GraphDB https://ontotext.com/products/graphdb/
- RDF store
- Enterprise version is distributed
- Master-worker distribution with single DB copied on all instances.
ArangoDB https://www.arangodb.com
- Distributed graph/noSQL DB
- data sharded across DB nodes in the cluster
- SQL-like query language
- graphs built on their noSQL infrastructure (?) all on RocksDB
- each document indexed by a key - can be user specified (?)
JanusGraph http://janusgraph.org/
- distributed graph database
- has an option to turn off ID checking in batch loading
- seems like there is an ID manager that allocates blocks of 64bit IDs out to instances
- vertex based storage with random sharding across storage backends
- no external id allocation, but seems to have other keys (?).
StarDog http://www.stardog.com/
- RDF store
- SPARQL query
- clustering with a master server, coordinated through Apache ZooKeeper
- replicated store with 2PC coordination for distributed writes
OrientDB http://orientdb.com/orientdb/
- noSQL + graph DB
- SQL like queries
- distribution: multimaster and sharding
- not sure how ID and keys work
DataStax Enterprise Graph https://www.datastax.com/products/datastax-enterprise-graph
- Graph DB over Apache Cassandra
- TinkerPop/Gremlin
- not clear how IDs and keys work in the graph
cayley https://cayley.io/
- ? don’t know, docs not there?
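
The JanusGraph note above, about an ID manager that allocates blocks of 64-bit IDs out to instances, can be sketched roughly as follows. This is my guess at the general technique, not JanusGraph's actual implementation. `BlockIdAllocator`, `Instance` and the block size are all hypothetical. The idea is that instances only contact the central allocator once per block, and IDs stay unique because the blocks are disjoint.

```python
import threading


class BlockIdAllocator:
    """Hypothetical central allocator handing out disjoint ID ranges."""

    def __init__(self, block_size=1000):
        self._lock = threading.Lock()
        self._next_start = 1
        self._block_size = block_size

    def lease_block(self):
        # Return a half-open range [start, end) that no one else will get.
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
            return start, start + self._block_size


class Instance:
    """Hypothetical DB instance that mints IDs locally from its leased block."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._current, self._end = allocator.lease_block()

    def new_id(self):
        # Only go back to the allocator when the local block is used up.
        if self._current == self._end:
            self._current, self._end = self._allocator.lease_block()
        node_id = self._current
        self._current += 1
        return node_id


allocator = BlockIdAllocator(block_size=3)
a, b = Instance(allocator), Instance(allocator)
ids_a = [a.new_id() for _ in range(5)]  # [1, 2, 3, 7, 8]
ids_b = [b.new_id() for _ in range(5)]  # [4, 5, 6, 10, 11]
```

Note this gives unique internal IDs but says nothing about external IDs: mapping a URI to one of these IDs still needs a uniqueness check somewhere.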