Hello, I’d like to better understand how Dgraph distributes data across shards.
I have 6 Alpha shards on 6 separate machines. I also have Zero and Ratel running on a separate machine. My data is mostly a constant stream of new nodes of many types, with occasional updates to just one node type.
Here is the current distribution of data across the six shards (one p/w pair per shard), with ~6M nodes:
9.6G data/dgraph/p
47M data/dgraph/w
509M data/dgraph/p
69M data/dgraph/w
521M data/dgraph/p
74M data/dgraph/w
7.3G data/dgraph/p
68M data/dgraph/w
443M data/dgraph/p
83M data/dgraph/w
895M data/dgraph/p
72M data/dgraph/w
As you can see, two of the shards hold significantly more data than the others. IIUC, data is assigned to shards by predicate, correct? Is there an easy way to determine which predicates are stored on each shard? What would be the best way to get the data more evenly distributed? Do I need to split the larger predicates into subpredicates?
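To put numbers on the skew, here is a rough Python sketch that totals the listing above per shard. It assumes each consecutive p/w pair belongs to one shard and that the sizes are `du -h` style (powers of 1024); the function and variable names are only illustrative.

```python
# Sizes as posted above; each consecutive p/w pair is assumed to be one shard.
DU_OUTPUT = """\
9.6G data/dgraph/p
47M data/dgraph/w
509M data/dgraph/p
69M data/dgraph/w
521M data/dgraph/p
74M data/dgraph/w
7.3G data/dgraph/p
68M data/dgraph/w
443M data/dgraph/p
83M data/dgraph/w
895M data/dgraph/p
72M data/dgraph/w
"""

# Conversion factors to MiB, assuming du -h style binary units.
UNITS_TO_MIB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def to_mib(size):
    """Convert a human-readable size such as '9.6G' or '47M' to MiB."""
    return float(size[:-1]) * UNITS_TO_MIB[size[-1]]


def per_shard(du_output):
    """Pair up the p and w lines and return one dict of sizes per shard."""
    rows = [line.split() for line in du_output.strip().splitlines()]
    shards = []
    for (p_size, _), (w_size, _) in zip(rows[0::2], rows[1::2]):
        shards.append({"p": to_mib(p_size), "w": to_mib(w_size)})
    return shards


shards = per_shard(DU_OUTPUT)
total_p = sum(s["p"] for s in shards)
total_w = sum(s["w"] for s in shards)

for i, s in enumerate(shards, start=1):
    share = s["p"] / total_p
    print(f"shard {i}: p={s['p']:8.1f} MiB  w={s['w']:5.1f} MiB  share of p={share:6.1%}")

print(f"total on disk: {(total_p + total_w) / 1024:.1f} GiB")
```

By this count the two largest shards hold roughly 88% of the `p` data, and the cluster uses close to 20 GiB on disk in total.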
My other concern is that Dgraph is using far more space than I expected. My raw data (JSON) is <5 GB. Could this be because I am indexing too many predicates?
Thanks for your help.