Is there a plan to change the sharding mechanism to improve throughput?

What I want to do

『Dgraph colocates data per predicate (*P*, in RDF terminology), thus the smallest unit of data is one predicate. To shard the graph, one or many predicates are assigned to a group.』

I want to know whether there is a plan to make the partitioning granularity finer than one predicate, to improve the throughput of the Dgraph cluster.

@chewxy @MichelDiz

See Hardik’s answer here: "Splitting predicates into multiple groups" (#13 by eugaia).

So, is it supported as of 2021 or not?
@MichelDiz

Not sure, but it looks like it will take time.

hey @cangchen8180,

we’re pretty data-oriented in this company. Do we have any evidence that throughput of data stored in one format is slower than another?

The main reason the data is colocated by predicate is that edge lists are a well-balanced data structure for a graph database: adding nodes and edges is O(1), storage is O(|E|), and query time is O(|E|/S), where S is the number of shards. Thus it can find nodes very quickly. And by the same logic, because the data is colocated, it also fetches the data associated with a node quickly.
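To make the predicate-based layout concrete, here is a toy model in Python. It is a sketch under stated assumptions, not Dgraph's implementation: the `Group` and `Cluster` classes, the "fewest predicates first" placement rule, and all method names are hypothetical. Each group keeps one edge list per predicate (subject → objects). Adding an edge is a couple of dictionary lookups plus a list append (amortized O(1)), storage grows linearly with the number of edges, and a lookup for (subject, predicate) touches exactly one group. It also shows the trade-off raised in this thread: every edge of a predicate lands in a single group, however busy that predicate is.

```python
from collections import defaultdict


class Group:
    """Hypothetical shard that owns whole predicates (toy model, not Dgraph code)."""

    def __init__(self, gid):
        self.gid = gid
        # predicate -> subject -> list of objects: one edge list per predicate
        self.edges = defaultdict(lambda: defaultdict(list))

    def add_edge(self, pred, subj, obj):
        # Amortized O(1): two dict lookups plus a list append.
        self.edges[pred][subj].append(obj)

    def objects(self, pred, subj):
        # Read without creating empty entries for unknown keys.
        return list(self.edges.get(pred, {}).get(subj, []))


class Cluster:
    """Assigns each predicate to exactly one group; a predicate is never split."""

    def __init__(self, num_groups):
        self.groups = [Group(i) for i in range(num_groups)]
        self.pred_to_group = {}  # predicate -> group id

    def _group_for(self, pred):
        if pred not in self.pred_to_group:
            # Hypothetical placement rule: the group holding the fewest predicates.
            counts = [0] * len(self.groups)
            for gid in self.pred_to_group.values():
                counts[gid] += 1
            self.pred_to_group[pred] = counts.index(min(counts))
        return self.groups[self.pred_to_group[pred]]

    def add_edge(self, subj, pred, obj):
        self._group_for(pred).add_edge(pred, subj, obj)

    def query(self, subj, pred):
        # A (subject, predicate) lookup is served by a single group.
        if pred not in self.pred_to_group:
            return []
        return self.groups[self.pred_to_group[pred]].objects(pred, subj)

    def edges_per_group(self):
        # Total number of stored edges in each group.
        return [
            sum(len(objs) for per_pred in g.edges.values() for objs in per_pred.values())
            for g in self.groups
        ]


cluster = Cluster(num_groups=2)
cluster.add_edge("alice", "friend", "bob")
cluster.add_edge("alice", "friend", "carol")
cluster.add_edge("bob", "friend", "carol")
cluster.add_edge("alice", "name", "Alice")
print(cluster.query("alice", "friend"))  # ['bob', 'carol']
print(cluster.edges_per_group())         # [3, 1]
```

In this sketch, all three `friend` edges sit in group 0 while group 1 holds only the single `name` edge. The O(|E|/S) figure therefore assumes edges are spread roughly evenly across the S groups; a single heavily used predicate concentrates its load on one group, which is the case that finer-grained splitting would target.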

But if you have evidence that it would negatively impact throughput, then please share it. We would love to consider alternatives. But at Dgraph, data talks.
