Unbalanced disk usage

Hello, I’d like to better understand how Dgraph distributes data across shards.

I have 6 Alpha shards on 6 separate machines. I also have Zero and Ratel running on a separate machine. My data is mostly a constant stream of new nodes of many types, with occasional updates to just one node type.

Here is the current distribution of data across the shards, with ~6m nodes:

```
9.6G    data/dgraph/p
47M     data/dgraph/w

509M    data/dgraph/p
69M     data/dgraph/w

521M    data/dgraph/p
74M     data/dgraph/w

7.3G    data/dgraph/p
68M     data/dgraph/w

443M    data/dgraph/p
83M     data/dgraph/w

895M    data/dgraph/p
72M     data/dgraph/w
```

As you can see, two of them hold significantly more data than the others. IIUC, the data is split across shards by predicate, correct? Is there an easy way to see which predicates are stored on each shard? What would be the best way to get the data more evenly distributed? Would I need to split the larger predicates into sub-predicates?

My other concern is that Dgraph is using far more space than I expected. My raw data (JSON) is under 5 GB. Could this be because I am indexing too many predicates?

Thanks for your help.

If anyone finds this later: I found some answers on my own in this thread: Splitting predicates into multiple groups.

Dgraph will do its best to rebalance predicates across groups (every --rebalance-interval) based on data size. The data and indices for a predicate always live in the same Alpha group, and indices are stored on disk, so an indexed predicate will use noticeably more disk space than the raw data alone.
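If the automatic rebalance never evens things out, the Dgraph deploy docs also describe a /moveTablet admin endpoint on Zero for moving one predicate to a specific group by hand. A minimal sketch of building that request (the endpoint name and query parameters are taken from the docs as I read them, so double-check them against your Dgraph version before relying on this):

```python
from urllib.parse import urlencode
# from urllib.request import urlopen  # uncomment to actually send the request

def move_tablet_url(zero_addr, predicate, group):
    """Build the Zero admin URL asking for `predicate` to move to `group`.

    The /moveTablet path and the tablet/group parameters are assumptions
    based on the Dgraph deploy docs; verify them for your version.
    """
    return f"http://{zero_addr}/moveTablet?" + urlencode(
        {"tablet": predicate, "group": group})

url = move_tablet_url("localhost:6080", "name", 2)
print(url)  # http://localhost:6080/moveTablet?tablet=name&group=2
# urlopen(url)  # would trigger the move against a running Zero
```

Note this only constructs the URL; the actual GET (commented out) needs a running Zero on that address.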

You can check Zero's /state endpoint to see which predicates belong to each group: https://docs.dgraph.io/deploy/#more-about-state-endpoint
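In case it helps the next reader, here is a small sketch of pulling the predicate-to-group mapping out of a /state response and summing per-group disk usage. The payload below is made up for illustration, and the field names (`groups`, `tablets`, `predicate`, `space`) are assumptions based on the docs page above, so check them against what your Zero actually returns:

```python
import json

# Illustrative /state payload -- the real response is much larger and its
# exact shape may differ by Dgraph version.
state = json.loads("""
{
  "groups": {
    "1": {"tablets": {"name":   {"predicate": "name",   "space": "5200000000"},
                      "friend": {"predicate": "friend", "space": "4400000000"}}},
    "2": {"tablets": {"age":    {"predicate": "age",    "space": "300000000"}}}
  }
}
""")

# Sum the reported on-disk size of each group's tablets (predicates).
def group_sizes(state):
    sizes = {}
    for gid, group in state["groups"].items():
        tablets = group.get("tablets", {})
        sizes[gid] = sum(int(t.get("space", 0)) for t in tablets.values())
    return sizes

for gid, total in sorted(group_sizes(state).items()):
    preds = ", ".join(sorted(state["groups"][gid].get("tablets", {})))
    print(f"group {gid}: {total / 1e9:.1f} GB across [{preds}]")
```

Against a live cluster you would fetch the JSON from Zero's HTTP port (6080 by default) instead of the inline sample; the per-group totals should then line up with the `p` directory sizes above.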