Yes. We’ve been impressed with dgraph’s performance at scale, but things get stressful once the cluster grows large enough. Please don’t take my tone the wrong way; dgraph is young and engineering tradeoffs have to be made. But here’s what we run up against:
No sharding of predicates
This is a big concern for us, as it means that once a predicate becomes large enough, the only way to deal with it is vertical scaling. (We have predicates like xid and name that each have over 100 million entries and are over 100 GB in size.)
The shard balancer wreaks so much havoc on these large predicates that we had to disable it completely. It moves things too frequently, and its only criterion is size (rather than “hotness”). Any time one of these large predicates is moved (or indexed), the cluster grinds to a halt because incoming data gets stuck behind the operation, so much so that if we have to move a predicate, we have to pause all of our ingestion pipelines and wait for dgraph to catch up.
No query planner
Dgraph makes naive decisions about how a query gets executed. For instance, when filtering by two predicates A and B in a query,
- A with 1 billion unique UIDs, with an index
- B with 10 unique UIDs, without an index,
dgraph will choose A over B if it sees that A is indexed, even though B would have bounded the query far sooner. It’s frustrating when end users of our system have to do things like write their query in reverse to get better performance (a sketch of what I mean is below).
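To make the reordering concrete, here is a minimal DQL sketch. The predicate names are made up, and for the sake of the example assume both predicates are indexed so either one can sit at the root:

```
# Slow in our experience: the root function fans out over the huge
# predicate A, and B only trims the result set afterwards.
{
  slow(func: eq(A, "common value")) @filter(eq(B, "rare value")) {
    uid
  }
}

# What users end up writing by hand: start from the tiny set
# matching B, then filter by A.
{
  fast(func: eq(B, "rare value")) @filter(eq(A, "common value")) {
    uid
  }
}
```

A query planner that knew the cardinalities would pick the second plan on its own.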
Relatedly, cascade is one of dgraph’s most powerful features, but without a smarter query planner, deep/wide cascades can be unusably slow. This is frustrating since the whole reason you choose a graph db is that you want cheap inner joins.
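For anyone unfamiliar, @cascade is what turns a traversal into an inner join: parents missing any of the queried edges are dropped from the result. A toy sketch with a made-up schema:

```
# Only people who have BOTH a matching employer edge and a matching
# city edge come back; without @cascade, people missing either edge
# would still be returned with the edges they do have.
{
  q(func: type(Person)) @cascade {
    name
    worksAt @filter(eq(name, "Acme"))   { name }
    livesIn @filter(eq(name, "Berlin")) { name }
  }
}
```

Once queries like this get several levels deep or fan out widely, the cascade is applied after a lot of wasted work has already been done.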
I’m excited for the roaring bitmaps work, as I could see that improving some of these issues.
Upsert by xid painful for ingest-heavy workloads
All of our inserts happen with deterministic ids, but since dgraph creates its own uid for everything, we are forced to make every insert an upsert (query by xid; if it doesn’t exist, add it, otherwise update it). This puts pressure on the cluster that I wish we had a way to avoid. We want to be able to do blind writes.
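Every write in our pipelines ends up looking roughly like this upsert block (predicate names are just examples, and it assumes xid has an exact/hash index so eq can be used at the root):

```
upsert {
  query {
    # Look up the node by our deterministic external id.
    q(func: eq(xid, "customer-42")) {
      v as uid
    }
  }

  mutation {
    set {
      # If the query matched, this updates the existing node;
      # if not, uid(v) allocates a new one.
      uid(v) <xid>  "customer-42" .
      uid(v) <name> "Some Customer" .
    }
  }
}
```

So every ingest write pays for a read first, even though we already know exactly which entity we are writing to.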
We’d be happy to hash our ids in a way that turns them into dgraph uint64 uids (that we could insert directly), but it feels like dgraph is not intended to work this way (the way the Zeros lease blocks of uids, the fact that they’re contiguous and sequential, etc.).
Feature Wishlist
All of the data we store in dgraph is temporal, so we’ve had to do some gnarly gymnastics to allow users to filter nodes/edges by spans of time (we have dangling “timerange” nodes that hang off of every node in the graph, which we use as a filtering mechanism via cascade).
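Concretely, today’s workaround looks something like this (the schema names are ours, and it assumes datetime values on the start/end predicates of the dangling nodes):

```
# Every node carries one or more dangling timerange children, and
# @cascade drops any parent whose timerange children were all
# filtered out: a poor man's interval-intersection test.
{
  q(func: type(Event)) @cascade {
    uid
    timerange @filter(le(start, "2021-06-30T00:00:00Z") AND ge(end, "2021-06-01T00:00:00Z")) {
      start
      end
    }
  }
}
```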
I would be over the moon if dgraph had a native timerange type that was just [starttime, endtime]. This would allow us to put a list of timeranges directly on a node, and then a query like intersects(100, 200) would return us a node whose timerange predicate was [0, 110], [180, 300]. This would reduce the stress we put on the cluster across the board.
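None of this exists today, but the shape I’m imagining is roughly the following (an intersects() over ranges is purely hypothetical; dgraph’s existing intersects is a geo function):

```
# Wishlist syntax, not real DQL: validDuring would be a list of
# [starttime, endtime] ranges stored directly on the node.
{
  q(func: type(Event)) @filter(intersects(validDuring, 100, 200)) {
    uid
    validDuring  # e.g. [0, 110], [180, 300] matches, since at least one range overlaps [100, 200]
  }
}
```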
Dgraph has a strong foundation, and I know the team has ideas about the issues I’ve brought up. To echo some of the other commenters, I am more interested in dgraph as a graph database than as a GraphQL platform, given that dgraph is the only modern, cloud-native graph db in town.