I have a situation where each user will have a million-node graph of their own, which may connect to other users’ graphs. I want to implement the graph belonging to each user as a separate graph in Dgraph. How many such individual graphs can a Dgraph installation support if the users number in the billions? I have this type of usage because there is far more work on an individual user’s graph than on the graph connecting multiple users.
Also, what is the Big O for finding cycles in the graph? What algorithm is used?
I’d recommend having just one instance of Dgraph and putting all your data there. Dgraph’s performance shouldn’t decrease as a result of the clustering effect. But if you can explain the kind of queries you want to run, I can give you a better idea of whether this clustering effect would affect query performance.
We currently don’t detect cycles in graphs. That’s something planned for later.
Thanks, Manish! If it performs well for what I want, then that is a good enough way forward.
I’m going to have to check for the possibility of a loop before connecting any two nodes with a unidirectional edge. The graph is going to be very complex, though the edges are simple: there is only a single edge type. The only query I will have to run at present is to get all the paths a node is on.
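Since cycle detection is not built in, the loop check before adding an edge would have to happen on the client side for now. A minimal sketch of one way to do it, assuming the relevant edges can be held in (or fetched into) an in-memory adjacency list: adding src -> dst closes a loop exactly when src is already reachable from dst, so a depth-first search from dst is enough. The class and method names below are hypothetical and are not part of Dgraph. Each check costs O(V + E) in the worst case, where V and E are the nodes and edges reachable from dst.

```python
from collections import defaultdict


class DirectedGraph:
    """In-memory adjacency list; a stand-in for edges fetched from the database."""

    def __init__(self):
        self.out_edges = defaultdict(set)

    def reachable(self, start, target):
        """Iterative depth-first search: is `target` reachable from `start`?"""
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            if node == target:
                return True
            # .get avoids creating empty entries for nodes without out-edges
            for nxt in self.out_edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    def add_edge_if_acyclic(self, src, dst):
        """Add src -> dst unless it would close a loop; return True if added."""
        # The new edge creates a cycle exactly when src is reachable from dst.
        # This also rejects self-loops, because a node is reachable from itself.
        if self.reachable(dst, src):
            return False
        self.out_edges[src].add(dst)
        return True
```

For example, after adding a -> b and b -> c, a call to add_edge_if_acyclic("c", "a") returns False and leaves the graph unchanged, while add_edge_if_acyclic("a", "c") is accepted.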
Future queries will be about which nodes a subgraph (belonging to a user) is connected to.
One reason I thought about maintaining a separate graph for each user is that I can spawn threads to query multiple users’ graphs and determine what I want in parallel.
Another question I have: is there a maximum or recommended size for the storage attached to each compute instance in a cluster running Dgraph?
Are you saying there’s only one predicate? If that’s the case, then all your data would lie on a single machine.
You can do that even now with Dgraph. It supports concurrent execution, and using more threads on a multi-core server should give you better throughput, up to a certain point.
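On the client side, issuing per-user queries in parallel against a single instance could look roughly like the sketch below. It uses only Python's standard thread pool. run_user_query is a hypothetical placeholder rather than a Dgraph client call, and the worker count of 8 is an arbitrary assumption that would need tuning against the server.

```python
from concurrent.futures import ThreadPoolExecutor


def run_user_query(user_id):
    """Hypothetical placeholder: replace with a real query against the database."""
    return {"user": user_id, "connected_nodes": []}


def query_users_in_parallel(user_ids, max_workers=8):
    """Run one query per user on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as user_ids
        return list(pool.map(run_user_query, user_ids))
```

Calling query_users_in_parallel(["u1", "u2", "u3"]) returns one result per user. Because all users live in the same graph, no separate per-user graph is needed to get this kind of parallelism.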