"really" large datasets in dgraph

larswunderlich · May 24, 2019, 1:16pm

Hi everyone,

I’m looking for experience reports on running Dgraph nodes in a typical cloud environment, to figure out whether Dgraph is the right tool for me. Dgraph does call itself distributed, open source, and production ready. However, I’d like to run it on an AWS cluster, preferably in combination with k8s, with a dataset of at least a few billion nodes potentially distributed across this cluster.
Has anyone already set up something similar (maybe for testing purposes) on EC2 or EKS, or tackled problems like backups or up- and downscaling? If it is true that Dgraph scales better than other open source solutions, I would expect some papers, tutorials, or experience reports that could give me some hints. I’m already lost in choosing the right EC2 instance type/size, an appropriate number of instances, and the best way of setting up a large, resilient cluster. The question is: is it worth the work at all?

What is the largest cluster you’ve run in a distributed environment, and what kind of pitfalls (possibly from running it in a cloud environment) came up? Are any docs available somewhere?

Any thoughts might be helpful. Thanks in advance.

selmeci · May 25, 2019, 7:55am

Hi Lars.

What do you mean by “really” large datasets?
We’re running two Dgraph clusters, one for prod and one for development. Each of them runs on EC2 servers, one with 5 nodes and one with 3 nodes (i3en.6xlarge). It runs smoothly (~2-3 billion edges), but we have not done any up- or downscaling until now. You must consider the density of your data, though. If you have one big predicate with tons of data, it is stored on only one server (sharding within a predicate is not supported yet). In that case you have to run a bigger EC2 instance.
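
To make the point about big predicates concrete, here is a rough sketch of why adding servers does not help once one predicate dominates. It is a toy model only: the greedy placement below is not Dgraph's actual balancing logic, and the function name `place_predicates`, the predicate names, and the sizes in GB are all made up for illustration. The only thing it assumes from the above is that a predicate is always stored whole on one server.

```python
import heapq


def place_predicates(predicate_sizes_gb, num_servers):
    """Toy model: assign each predicate, whole, to the least-loaded server.

    This is NOT Dgraph's real rebalancing algorithm, just an illustration
    of placement when a predicate cannot be split across servers.
    """
    # Min-heap of (current load in GB, server id).
    heap = [(0.0, s) for s in range(num_servers)]
    heapq.heapify(heap)
    placement = {s: [] for s in range(num_servers)}

    # Place the largest predicates first.
    by_size = sorted(predicate_sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        load, server = heapq.heappop(heap)
        placement[server].append(name)
        heapq.heappush(heap, (load + size, server))

    loads = {s: sum(predicate_sizes_gb[p] for p in placement[s]) for s in placement}
    return placement, loads


# Hypothetical predicate sizes in GB (invented numbers, not measurements).
sizes = {"follows": 900.0, "name": 120.0, "email": 80.0, "created_at": 60.0, "tag": 40.0}

placement, loads = place_predicates(sizes, num_servers=3)
for server in sorted(placement):
    print(server, placement[server], loads[server])

# The busiest server can never hold less than the largest single predicate.
print("largest predicate:", max(sizes.values()), "GB")
print("busiest server:   ", max(loads.values()), "GB")
```

With these made-up numbers the total is 1200 GB, so an even split over 3 servers would be 400 GB each, but the server holding `follows` still carries 900 GB. That server is the one that has to be sized up, no matter how many more servers you add.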
