Predicate Rebalances (Moves) Across Servers Cause Transactions to Abort

When a predicate rebalances (moves) across groups/servers, mutation transactions abort at a significantly higher rate.

Scenario:

  1. A server is sending a continuous live data flow into Dgraph with predicateA, predicateB, and predicateC.
  2. The auto-balance interval is hit, selecting predicateB to move groups.
  3. predicateB begins moving.
  4. Live ingest of predicateA, predicateB, and predicateC begins hitting transaction abort errors while the predicate is being moved.

In this example, predicateB is 23 GB, so the move can take 10+ minutes.

Should the rebalance of predicates be transparent to the rest of the system? Specifically, can a predicate be written to at scale/volume when that predicate is being rebalanced off the group/server?

Thanks,
Ryan

I found the answer to my question in the deployment documentation:

Shard rebalancing

Dgraph Zero tries to rebalance the cluster based on the disk usage in each group. If Zero detects an imbalance, it will try to move a predicate along with its indices to a group that has lower disk usage. This can make the predicate temporarily read-only. Queries for the predicate will still be serviced, but any mutations for the predicate will be rejected and should be retried after the move is finished.

Zero would continuously try to keep the amount of data on each server even, typically running this check on a 10-min frequency. Thus, each additional Dgraph Alpha instance would allow Zero to further split the predicates from groups and move them to the new node.

Reference: Cluster Setup - Deploy
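
For anyone hitting the same thing: since the docs say rejected mutations "should be retried after the move is finished", here is a minimal sketch of a retry wrapper with exponential backoff and jitter. It is not taken from any Dgraph client. `TransactionAborted` and `run_with_retry` are hypothetical names, and `mutate` stands for whatever function runs and commits one transaction in your client; you would map your client's real abort/retry error onto the exception.

```python
import random
import time


class TransactionAborted(Exception):
    """Hypothetical stand-in for the client's 'aborted, please retry' error."""


def run_with_retry(mutate, max_attempts=20, base_delay=0.5, max_delay=60.0,
                   sleep=time.sleep):
    """Call mutate() until it succeeds or max_attempts is exhausted.

    mutate must start a fresh transaction on every call, because an
    aborted transaction cannot be reused.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return mutate()
        except TransactionAborted:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            # Exponential backoff capped at max_delay, with full jitter so
            # many writers do not retry in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

The defaults are illustrative only. With a move that takes 10+ minutes, as in my 23 GB case, the total retry budget (attempts times delay) has to be sized to outlast the move, or the writes need to be buffered elsewhere until it finishes.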
