Predicate Rebalances (Moves) Across Servers Cause Transactions to Abort

rahst12 · July 19, 2024, 9:43pm

When a predicate rebalances (moves) across groups/servers, mutation transactions abort at a significantly higher rate.

Scenario:

A server is sending a continuous live data flow into Dgraph with predicateA, predicateB, and predicateC.
The auto-balance interval is hit, and predicateB is selected to move to another group.
predicateB begins moving.
Live ingest of predicateA, predicateB, and predicateC starts hitting transaction abort errors while the predicate is being moved.

In this example, predicateB is 23 GB, so moving it can take 10+ minutes.

Should the rebalance of predicates be transparent to the rest of the system? Specifically, can a predicate be written to at scale/volume when that predicate is being rebalanced off the group/server?

Thanks,
Ryan

rahst12 · August 15, 2024, 11:13pm

I found the answer to my question reading through this post:

Shard rebalancing

Dgraph Zero tries to rebalance the cluster based on the disk usage in each group. If Zero detects an imbalance, it will try to move a predicate along with its indices to a group that has lower disk usage. This can make the predicate temporarily read-only. Queries for the predicate will still be serviced, but any mutations for the predicate will be rejected and should be retried after the move is finished.

Zero would continuously try to keep the amount of data on each server even, typically running this check on a 10-min frequency. Thus, each additional Dgraph Alpha instance would allow Zero to further split the predicates from groups and move them to the new node.

Reference: Cluster Setup - Deploy
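
Since the docs say rejected mutations "should be retried after the move is finished", one way to handle this on the ingest side is a retry loop with exponential backoff. Below is a minimal sketch, not Dgraph's own API: `RetryableMutationError` and `run_with_retry` are hypothetical names, and you would need to map whatever abort/rejection error your client library actually raises onto that exception.

```python
import random
import time


class RetryableMutationError(Exception):
    """Hypothetical stand-in for the abort/rejection error raised by your client."""


def run_with_retry(txn_func, max_attempts=8, base_delay=0.5, max_delay=60.0,
                   sleep=time.sleep):
    """Call txn_func until it succeeds or max_attempts is reached.

    txn_func should build a fresh transaction, apply the mutation, and commit
    on every call; raise RetryableMutationError when the mutation is aborted
    or rejected (for example while a predicate is being moved).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_func()
        except RetryableMutationError:
            if attempt == max_attempts:
                raise  # give up and surface the last error to the caller
            # Exponential backoff capped at max_delay, with jitter so many
            # writers do not all retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

The defaults here only wait about a minute in total. For a move like the 23 GB one above (10+ minutes), `max_attempts` and `max_delay` would have to be raised so the total wait covers the move, or the failed writes would have to be queued and replayed later.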

