> Dgraph colocates data per predicate (*P*, in RDF terminology), thus the smallest unit of data is one predicate. To shard the graph, one or many predicates are assigned to a group.
I want to know whether there is a plan to split the partitioning granularity further, to improve the throughput of the Dgraph cluster.
We're pretty data-oriented at this company. Do we have any evidence that throughput for data stored in one format is slower than in another?
The main reason the data is colocated by predicate is that edge lists are a well-balanced data structure for the purposes of a graph database: addition of nodes and edges is O(1), storage is O(|E|), and query time is O(|E|/S), where S is the number of shards. Thus it can find nodes very quickly. And by the same logic, because the data is colocated, it fetches the data associated with a node quickly as well.
But if you have evidence that it would negatively impact throughput, then please share it. We would love to consider alternatives. But at Dgraph, data talks.