Alpha nodes stuck in "opPredMove"

aayush · March 16, 2023, 12:27pm

Hi there,
So we have a situation where the health check on our Dgraph cluster (running on GKE) gives:

"ongoing": [
      "opPredMove"
    ]

I’m wondering what this means exactly and how long the cluster will stay in this state. Also, would this affect Dgraph operations? We have been facing several errors while writing data to Dgraph.

Since we are facing issues in production, any suggestion to fix this ASAP would be really helpful. Thanks!
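For anyone who wants to watch for this state programmatically, here is a minimal sketch that scans a health response for `opPredMove`. It assumes the response is a JSON array of objects, each with an optional `ongoing` list and an `address` field; the sample payload and the addresses in it are made up for illustration, so check the shape your own cluster returns.

```python
import json


def instances_moving_predicates(health_json: str) -> list[str]:
    """Return the addresses of instances whose "ongoing" list contains opPredMove.

    Assumes the health response is a JSON array of objects; the "address"
    field name is an assumption and may differ in your version.
    """
    entries = json.loads(health_json)
    moving = []
    for entry in entries:
        # "ongoing" may be missing or null when nothing is running.
        if "opPredMove" in (entry.get("ongoing") or []):
            moving.append(entry.get("address", "<unknown>"))
    return moving


# Hypothetical payload, not captured from a real cluster.
sample = (
    '[{"address": "alpha-0:7080", "ongoing": ["opPredMove"]},'
    ' {"address": "alpha-1:7080", "ongoing": ["opRollup"]},'
    ' {"address": "alpha-2:7080"}]'
)
print(instances_moving_predicates(sample))
```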

MichelDiz · March 16, 2023, 5:08pm

The cluster is moving a predicate tablet. You should check the health and size of your disks. Moves happen when a disk is full or slow.

zooney · March 16, 2023, 7:53pm

In the documentation it says that:

Dgraph Zero tries to rebalance the cluster based on the disk usage in each group. If Zero detects an imbalance, it will try to move a predicate along with its indices to a group that has lower disk usage. This can make the predicate temporarily read-only. Queries for the predicate will still be serviced, but any mutations for the predicate will be rejected and should be retried after the move is finished.

Zero would continuously try to keep the amount of data on each server even, typically running this check on a 10-min frequency. Thus, each additional Dgraph Alpha instance would allow Zero to further split the predicates from groups and move them to the new node.

Is there a way to set this to a different frequency, or to stop predicates from moving altogether and make it a manual process?
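Since the quoted documentation says rejected mutations should be retried after the move finishes, a client-side retry with backoff is one way to ride out a move. The sketch below is only an illustration: `PredicateMoveInProgress` and the `mutate` callable are hypothetical stand-ins, and how a rejected mutation actually surfaces depends on the client library you use.

```python
import time


class PredicateMoveInProgress(Exception):
    """Hypothetical error standing in for a mutation rejected during a move."""


def retry_mutation(mutate, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call mutate() until it succeeds, backing off exponentially between tries.

    mutate is any zero-argument callable that performs the write and raises
    PredicateMoveInProgress when the predicate is temporarily read-only.
    """
    for attempt in range(attempts):
        try:
            return mutate()
        except PredicateMoveInProgress:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
```

The `sleep` parameter is injectable so the backoff can be tested without waiting. Since moves can take a while on large predicates, the attempt count and delay likely need tuning for your data size.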

MichelDiz · March 16, 2023, 9:26pm

You can’t.

You can only make the interval longer. If you set a huge interval, it will never move any predicate.

# dgraph zero -h | grep rebalance_interval
      --rebalance_interval duration   Interval for trying a predicate move. (default 8m0s)

And to move predicates manually, you can use the HTTP API or Ratel. But you will have to move them predicate by predicate. There’s no bulk moving.
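As a rough sketch of the HTTP route, the helper below builds the request URL for moving a single predicate. It assumes Zero exposes a `/moveTablet` endpoint with `tablet` and `group` query parameters on its HTTP port (6080 by default); the address, predicate name, and group number here are placeholders, so verify the endpoint against the docs for your Dgraph version before relying on it.

```python
from urllib.parse import urlencode


def move_tablet_url(zero_http_addr: str, predicate: str, group: int) -> str:
    """Build the URL that asks Zero to move one predicate to a target group.

    The /moveTablet path and its parameter names are assumptions based on
    the Dgraph docs; check them for your version.
    """
    query = urlencode({"tablet": predicate, "group": group})
    return f"http://{zero_http_addr}/moveTablet?{query}"


# Placeholder values: Zero on localhost, predicate "name", target group 2.
print(move_tablet_url("localhost:6080", "name", 2))
```

Because there is no bulk move, moving several predicates means issuing one such request per predicate.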

rahst12 · August 15, 2024, 11:08pm

This is incredibly important to know.
