Hi,
I am facing an issue where Dgraph upserts fail with the following error: "rpc error: code = Unknown desc = No connection exists". My configuration is:
Dgraph in cluster mode (deployed with the Kubernetes manifest from https://github.com/dgraph-io/dgraph/blob/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml)
I have also added resource limits of 2 vCPUs and 6 GiB of memory to each Zero and Alpha node:
resources:
  requests:
    memory: "2048Mi"
    cpu: "1000m"
  limits:
    memory: "6144Mi"
    cpu: "2000m"
When I checked the Dgraph Alpha logs, I found frequent messages like the following:
I0105 18:42:06.672318 1 groups.go:931] Zero leadership changed. Renewing oracle delta stream.
E0105 18:42:06.672462 1 groups.go:907] Error in oracle delta stream. Error: rpc error: code = Canceled desc = context canceled
I0105 18:42:07.671665 1 groups.go:863] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
I0105 18:42:08.486535 1 groups.go:875] Got Zero leader: dgraph-zero-0.dgraph-zero.xyz.svc.case.local:5080
These errors start appearing frequently once I have upserted roughly 1M nodes into Dgraph.
I have also tried running a single Dgraph Zero node with 3 Dgraph Alpha nodes, and the problem persists.
As a workaround, I have added a keepalive ping to Dgraph and retry the upsert whenever I get this error. However, I would like to understand the root cause of the issue and whether there is anything I can do on my end to fix it.
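For reference, the retry part of this workaround could look roughly like the sketch below. It is a minimal, client-agnostic illustration, not how Dgraph or any client library does it: the names `retry_upsert` and `is_no_connection` are hypothetical, the attempt count and backoff values are arbitrary, and detecting the error by matching the message text is an assumption (the gRPC code is `Unknown`, so the code alone does not identify it). A retry like this only hides the symptom; it does not address why the Zero leadership keeps changing.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Substring of the transient error seen in this report. Matching on the message
# text is an assumption: the gRPC status code is "Unknown", so it cannot be used.
NO_CONNECTION = "No connection exists"


def is_no_connection(exc: Exception) -> bool:
    """Return True if exc looks like the transient 'No connection exists' error."""
    return NO_CONNECTION in str(exc)


def retry_upsert(op: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(), retrying with exponential backoff on 'No connection exists'.

    Hypothetical helper: any other exception is re-raised immediately, and the
    last 'No connection exists' error is re-raised once max_attempts is used up.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except Exception as exc:
            if not is_no_connection(exc) or attempt == max_attempts:
                raise
            sleep(delay)  # wait before the next attempt
            delay *= 2    # exponential backoff
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    calls = {"n": 0}

    def fake_upsert() -> str:
        # Stand-in for a real Dgraph upsert made through a client library;
        # it fails twice with the reported error and then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("rpc error: code = Unknown desc = No connection exists")
        return "committed"

    print(retry_upsert(fake_upsert, base_delay=0.01), calls["n"])
```

Injecting `sleep` keeps the helper testable without real delays, and wrapping the whole upsert in `op` means each retry re-runs the query and mutation together rather than replaying a half-finished transaction.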