How to minimize remote edge traversal?


I want to know how data are stored in the database. Can we move vertices between machines? The thing is that for high throughput and good performance, we need to minimize traversal between nodes.
Fox example, if we store 100 friends of a user in a distributed graph database (for example 10 node cluster) and try to get the list of the user’s friends, then is is a high probability that all 10 nodes will be interacting in this query.
That is why throughput of 10 servers will be the same as 1 server and performance will be even terrible because of many hops.
That is why logically we need to minimize graph traversal as much as possible.

Solution of Facebook is here:

I want to know if it is possible to use Apache Giraph and Kernighan–Lin algorithm with Dgraph.

Best regards,

Dgraph does edge-centric sharding, not node-centric sharding.

That’s not the case with Dgraph. In Dgraph, the “friend” edge would always be store on exactly one server (+ replicas); so if you need to query your friends, it would be one RPC call away to that server.

Dgraph is optimized to minimize the number of network calls required to execute arbitrarily complex queries. You can read about the design concepts here:

Giraph is just a batch processing system, not an algorithm. What sort of integration are you looking for?

Thank you very much for the pointing!

I am newbie to the graph databases. I just need a scalable graph database solution to not worry much about scaling.
I wanted to use Apache Giraph for graph rebalancing to minimize network calls. But it seems that Dgraph don’t need it because it minimizes network calls by itself. Am I right?

That’s right. Dgraph is scalable by itself.