We are about to add some fairly traditional social-networking features to our app: think timelines, fan-out-on-read newsfeeds, likes, shared posts, etc. As such, we are currently evaluating backend technology for these features.
In the past we have always used several layers to implement these sorts of features: Elasticsearch/Redis/Hazelcast managing relationships on top of distributed Bigtable-style stores such as Cassandra or DynamoDB. The idea of doing away with that complexity and having a single graph database do it all is exactly what we would like to pursue going forward.
The current, more mature graph databases all compromise in one area or another, chiefly in that they are not distributed, or, in the case of Titan, are not always performant enough to stand alone and serve user-facing queries without some caching-layer magic added in.
Dgraph is ‘exactly’ what we are looking for, at least in vision and concept.
But given the early stage of the project, we have a few questions before we start playing with it.
Given that Dgraph is not yet feature-complete, would it be possible to model a simple time-ordered, fan-out-on-read newsfeed of friends’ activities/posts?
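To make this concrete, the kind of fan-out-on-read query we have in mind is sketched below. The schema and predicate names (`handle`, `friend`, `post`, `timestamp`) are entirely hypothetical, and the syntax is only our guess at how such a query might be expressed in Dgraph:

```
# Hypothetical fan-out-on-read feed: start at one user, walk to each
# friend, and pull their 20 most recent posts, newest first.
{
  feed(func: eq(handle, "alice")) {
    friend {
      posts: post (orderdesc: timestamp, first: 20) {
        content
        timestamp
      }
    }
  }
}
```

If something along these lines is already expressible, that would answer most of this question.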
Would you expect Dgraph to perform at scale while serving such feeds directly to users, without any kind of caching layer between the app and the database?
Does Dgraph suffer from the ‘supernode’ problem that most other graph databases do? That is, how does it deal with a single entity that has 1 million+ edges connected to it? Does having so many relationships on a single entity slow down all traversals across that entity? What about different edge/relationship types? If we have 1 million [SHARED] edges on a single node/entity, does it affect the performance of traversing the [LIKED] edges? Titan uses vertex-centric indexes to work around this issue, but from what little I’ve read on Dgraph, your data structure prevents this from becoming an issue in the first place. Is my assumption correct?
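For example, if LIKED and SHARED end up stored as separate predicates, we would hope a traversal like the following only ever touches the liked edges, no matter how many shared edges the same node carries (predicate names again hypothetical, syntax a sketch):

```
# Hypothetical: traverse only the LIKED edges from a heavily
# connected node. Ideally the 1M+ SHARED edges on the same entity
# are never read at all.
{
  user(func: eq(handle, "celebrity")) {
    liked (first: 10) {
      title
    }
  }
}
```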
How does the initial starting entity lookup work? Most graph databases allow you to index or add unique constraints to starting entities so that the initial lookup, before traversal, is pretty much instantaneous at any size. How does this work in Dgraph?
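In other databases we would declare a unique index on something like a username and resolve the starting vertex in near-constant time before traversing. A sketch of what we would hope for in Dgraph terms (the schema directive syntax here is assumed, not taken from the docs):

```
# Hypothetical schema: index the handle predicate so that the
# initial entity lookup is an index hit rather than a scan.
handle: string @index(exact) .

# Lookup, then traverse:
{
  start(func: eq(handle, "alice")) {
    friend { handle }
  }
}
```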
I can see that support for limiting and paging results is in place, but what about sorting results by timestamp, or by UTC date? If not, can that be expected soon?
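Concretely, what we are after is a result set that can be paged and ordered by a datetime predicate at the same time, something like (syntax and predicate names are our guess):

```
# Hypothetical: the second page of a user's posts, newest first.
{
  page(func: eq(handle, "bob")) {
    post (orderdesc: timestamp, first: 20, offset: 20) {
      content
      timestamp
    }
  }
}
```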
Development on Dgraph appears to be progressing quite quickly. How would the upgrade procedure work? Would our graph data remain intact as we update releases?
Dgraph is designed from the ground up to be distributed, hence our interest. But what does the horizontal scale-out procedure look like? I can see that auto-discovery is on your roadmap, which is great, but what about re-sharding existing data? How would that work as we add new nodes to the cluster?
Sorry for the long post and the huge number of questions, but we need to balance our enthusiasm with pragmatism and really understand what we can ‘currently’ do with Dgraph before investing hours getting to grips with it.
Regardless, we find this project incredibly exciting and will be watching closely.