Dgraph High Write Throughput

Hello! I'm just starting to explore Dgraph a bit more. I loaded the “movie” data set on my local setup (A bigger dataset | Moredata | Dgraph Tour) and got around 5,000 updates/sec on my 2015 MacBook Pro. I'm sure this isn't representative of what a “real” cluster could do, but I'm trying to read through the documentation and understand how this system could scale to 100k-1M writes per second. I have a use case where I'm processing lots of updates to different “entities”, where each entity might have 5-10 attributes and 5-10 relationships to other entities. Looking through the design, it appears that each attribute/edge (predicate) is the unit of sharding for an Alpha, so if I had 10 entity types in my system, each with 10 predicates, having more than 100 nodes in my cluster would not increase my write capacity.
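To make my reasoning concrete, here is a minimal sketch of the parallelism ceiling I'm worried about (the entity and predicate counts are assumptions from my own use case, not measured numbers):

```go
package main

import "fmt"

func main() {
	// Assumed shape of my data set: ~10 entity types,
	// each with roughly 10 predicates (attributes + edges).
	entityTypes := 10
	predicatesPerType := 10

	// If each distinct predicate is the unit of sharding, the
	// number of distinct predicates bounds how many Alpha groups
	// can accept writes in parallel.
	maxParallelShards := entityTypes * predicatesPerType
	fmt.Println(maxParallelShards) // 100
}
```

So under those assumptions, growing the cluster past ~100 write-serving groups buys no additional write parallelism.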

As a follow-up question: is it possible to shard/route requests to a subset of nodes? For example, I've used Elasticsearch routing like this to restrict a query to a subset of Elasticsearch nodes, and I'm wondering whether something similar is possible in Dgraph. If it were, that might let me ramp up write throughput by having different Alphas handle subsets of the same predicate. For example, rather than one Alpha handling all requests for the “Name” predicate, we could partition that predicate across 4 Alphas, each handling 1/4 of the “Name” traffic based on some hash/range key we can specify in the query.
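Since Dgraph doesn't expose this kind of routing as far as I can tell, one way to approximate it at the application layer would be to split a hot predicate into N suffixed predicates, so each suffix becomes its own shard unit that can land on a different group. A hypothetical sketch (the suffixing scheme and shard count are my invention, not a Dgraph feature):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardedPredicate maps an entity key to one of n suffixed
// predicate names, e.g. "Name.0" .. "Name.3". Each suffixed
// predicate is then a distinct shard unit from Dgraph's point
// of view, so writes to "Name" can spread across n groups.
func shardedPredicate(base, entityKey string, n uint32) string {
	h := fnv.New32a()
	h.Write([]byte(entityKey))
	return fmt.Sprintf("%s.%d", base, h.Sum32()%n)
}

func main() {
	// The same entity key always routes to the same shard,
	// which queries would also need to compute.
	for _, key := range []string{"movie:1", "movie:2", "movie:3"} {
		fmt.Println(key, "->", shardedPredicate("Name", key, 4))
	}
}
```

The obvious cost is that any query touching “Name” now has to fan out across all four suffixed predicates, so this only helps write-heavy workloads.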

I suppose another alternative would be creating multiple Dgraph clusters and adding routing logic in the application layer. The downside is that I could no longer issue queries across clusters, since my IDs would differ between clusters, but that's not a deal breaker if it increases my total write throughput across the separate clusters.
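The application-layer routing I have in mind would look roughly like this (the endpoint names are hypothetical placeholders):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// clusterFor pins each entity to one independent Dgraph cluster,
// so all mutations for that entity always land on the same
// cluster. This matters because UIDs are not portable across
// clusters, so an entity's data must live entirely in one.
func clusterFor(entityKey string, clusters []string) string {
	h := fnv.New32a()
	h.Write([]byte(entityKey))
	return clusters[h.Sum32()%uint32(len(clusters))]
}

func main() {
	clusters := []string{
		"alpha-a.example.com:9080", // hypothetical endpoints
		"alpha-b.example.com:9080",
	}
	fmt.Println(clusterFor("movie:42", clusters))
}
```

Each cluster then serves 1/N of the entities, so total write throughput scales with the number of clusters, at the cost of losing cross-cluster graph traversals.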
