Error: "A tick missed to fire. Node blocks too long!" How to resolve?

I have two groups, each with three alphas.

Each alpha in group 0 has 1.4T data and each alpha in group 1 has 1.1T data.

After a rebalance, one of the alphas became 338GB, while the other alphas are 1.4T.

This 338GB alpha has been logging the error "A tick missed to fire. Node blocks too long!".

Can I stop this alpha and copy the data from another alpha in the same group into it?

My Dgraph version: v20.07

@Valdanito can you please share alpha and zero logs for further investigation? It would be really helpful in understanding what went wrong.

This happens when we’re producing more ticks than we can process. All operations in Raft depend on the ticker.
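To make the "more ticks than we can process" point concrete, here is a toy model of it. It is only a sketch, not Dgraph's or etcd's actual code: the class, the function names, and the buffer size are made up for illustration. The idea is that ticks go into a bounded buffer, the timer never blocks, and a tick that finds the buffer full is dropped (which is where a real node would log the warning).

```python
import queue


class Node:
    """Toy stand-in for a Raft node that receives ticks through a bounded buffer."""

    def __init__(self, buffer_size):
        # buffer_size is an illustrative assumption, not a real Dgraph setting.
        self.ticks = queue.Queue(maxsize=buffer_size)
        self.missed = 0

    def tick(self):
        """Called by the timer. Never blocks: a tick is dropped if the buffer is full."""
        try:
            self.ticks.put_nowait(None)
        except queue.Full:
            # A real node would log "A tick missed to fire. Node blocks too long!" here.
            self.missed += 1

    def process(self, max_ticks):
        """Called by the node's main loop; drains at most max_ticks queued ticks."""
        handled = 0
        while handled < max_ticks and not self.ticks.empty():
            self.ticks.get_nowait()
            handled += 1
        return handled


def simulate(produced, processed_per_round, rounds, buffer_size):
    """Produce `produced` ticks per round, process some, and return the missed count."""
    node = Node(buffer_size)
    for _ in range(rounds):
        for _ in range(produced):
            node.tick()
        node.process(processed_per_round)
    return node.missed


if __name__ == "__main__":
    # The main loop keeps up: nothing is dropped.
    print(simulate(produced=4, processed_per_round=4, rounds=10, buffer_size=8))
    # The main loop is blocked most of the time: the buffer fills and ticks are dropped.
    print(simulate(produced=4, processed_per_round=1, rounds=10, buffer_size=8))
```

In this model the missed ticks only appear once the buffer has filled up, so the warning points at whatever keeps the main loop busy (slow disk, heavy load) rather than at the ticker itself.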

@Valdanito How long has the cluster been running?
We’ve seen these messages in zero nodes. Can you confirm you’re seeing them in alpha nodes?

Each node has its own state, so you shouldn’t copy over the data directory. Instead, you could add a new node to the group and the leader will stream the data to it. This is much better than copy-pasting the directory, since the leader streams only the valid data, which cleans up your disk too. This operation is generally safe, but you should take a backup first.
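Before and after adding a node, it can help to check which members each group has and who the leader is. The sketch below summarizes a Zero `/state` response. It assumes the payload has a `groups` → `members` → `addr`/`leader` shape and that Zero's HTTP port is the default 6080; the helper names and the sample payload are hypothetical, so check the field names against your Dgraph version.

```python
import json
import urllib.request


def fetch_state(zero_http="http://localhost:6080"):
    """Fetch the cluster state from Zero's HTTP endpoint (address is an assumption)."""
    with urllib.request.urlopen(zero_http + "/state") as resp:
        return json.load(resp)


def summarize_groups(state):
    """Return {group_id: [(addr, is_leader), ...]}, sorted by address."""
    summary = {}
    for group_id, group in state.get("groups", {}).items():
        members = group.get("members", {})
        summary[group_id] = sorted(
            # "leader" may be absent for followers, so default to False.
            (m.get("addr", "?"), bool(m.get("leader", False)))
            for m in members.values()
        )
    return summary


if __name__ == "__main__":
    # Hypothetical sample payload, trimmed to the fields used above.
    sample = {
        "groups": {
            "1": {
                "members": {
                    "1": {"addr": "alpha1:7080", "leader": True},
                    "2": {"addr": "alpha2:7080"},
                }
            }
        }
    }
    print(summarize_groups(sample))
    # Against a live cluster: print(summarize_groups(fetch_state()))
```

If the new node shows up as a member of the expected group, the leader should start streaming a snapshot to it.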

Also see
https://github.com/etcd-io/etcd/issues/9939
https://github.com/dgraph-io/dgraph/issues/2541

Thank you. It recovers after a while. I found no useful information in the logs, but the data always stays at 338GB :joy:

Thank you very much!

I just hit this error.
Any clue on how to resolve it?

  • Cloud: AKS
  • dgraph version: v21.12.0
  • cluster config: one zero, 3 alpha
  • Alpha: one NVMe SSD disk 1.6 T; 48 GB RAM
  • Zero: two NVMe SSD disks each 1.6 T, 128 GB RAM

I hit this blocking error after running the Live Loader for about 6 hours.