Dgraph HA Cluster suddenly becomes unavailable


Report a Dgraph Bug

I run a 3-node HA cluster on Kubernetes, and it sometimes becomes unavailable for no apparent reason. Only restarting dgraph-alpha helps. I found these errors in the logs; they may be the root cause:

Error in oracle delta stream. Error: rpc error: code = Canceled desc = context canceled
Error in oracle delta stream. Error: rpc error: code = Unknown desc = Node is no longer leader
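
To check whether these errors line up with the outages, they can be tallied by gRPC status code. This is a minimal sketch, assuming the alpha logs have been saved as plain text (for example with kubectl logs) and that every relevant line has the same shape as the two lines above. The function name tally_oracle_errors is hypothetical and not part of Dgraph.

```python
import re
from collections import Counter

# Pattern inferred only from the two log lines quoted above (an assumption,
# not an official Dgraph log format).
ORACLE_ERROR = re.compile(
    r"Error in oracle delta stream\. Error: rpc error: code = (\w+) desc = (.+)"
)


def tally_oracle_errors(lines):
    """Count oracle delta stream errors, keyed by (gRPC code, description)."""
    counts = Counter()
    for line in lines:
        match = ORACLE_ERROR.search(line)
        if match:
            code = match.group(1)
            desc = match.group(2).strip()
            counts[(code, desc)] += 1
    return counts
```

Passing it the lines of a saved log file, e.g. tally_oracle_errors(open("alpha.log")), returns a Counter. A spike in "Node is no longer leader" shortly before each outage would suggest a leadership change as the trigger rather than a client-side cancellation.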

Does anyone know what could be the cause?

What version of Dgraph are you using?

Dgraph Version
v21.03.0

Have you tried reproducing the issue with the latest release?

No. The latest release, Zion, is not stable.

What is the hardware spec (RAM, OS)?

Kubernetes, HA Cluster

Steps to reproduce the issue (command/config used to run Dgraph).

I don’t know yet

These screenshots are from the official Dgraph Grafana dashboard.