Mutation failed because Dgraph execution: Unhealthy connection

jordan · December 23, 2020, 2:05pm

Hello

I have 10 scripts that periodically insert data into or read data from Dgraph. The data all belong to the same node, for example gitcommit. The scripts may run at the same time.

Unfortunately, during the execution I get the following error:

{'message': 'mutation failed because Dgraph execution failed because : dispatchTaskOverNetwork: while retrieving connection.: Unhealthy connection', 'locations': [{'line': 2, 'column': 3}]}

Dgraph Alpha and Zero are running in Kubernetes. I have 5 Zeros and 6 Alphas. I checked the logs, and the errors and warnings I found were:

From Dgraph Alpha

E1223 11:27:32.586522 20 groups.go:1000] No longer the leader of group 1. Exiting
E1223 11:27:32.586599 20 groups.go:937] Error in oracle delta stream. Error: rpc error: code = Canceled desc = context canceled

W1223 11:22:03.601350 20 draft.go:1313] Raft.Ready took too long to process: Timer Total: 549ms. Breakdown: [{disk 323ms} {proposals 0s} {advance 0s}] Num entries: 0. MustSync: false

From Dgraph Zero

W1221 14:29:42.811258 19 pool.go:204] Shutting down extra connection to dgraph-alpha-0.dgraph-alpha.dgraph-2011.svc.cluster.local:7080

W1221 17:15:58.941971 21 raft.go:922] Raft.Ready took too long to process: Timer Total: 838ms. Breakdown: [{proposals 838ms} {disk 0s} {advance 0s}]. Num entries: 1. Num committed entries: 0. MustSync: true
W1221 17:38:27.684235 21 raft.go:922] Raft.Ready took too long to process: Timer Total: 2.476s. Breakdown: [{disk 2.476s} {proposals 0s} {advance 0s}]. Num entries: 0. Num committed entries: 1. MustSync: false

The error occurs in a script that inserts a list of 1000 elements, although the list is only 9032 bytes in memory. Besides, I can insert many lists before the error occurs, so I don’t know whether the payload size is the cause.

I also don’t know whether the error is caused by shard rebalancing happening while data is being inserted into Dgraph.

Any help would be appreciated

aman-bansal · December 25, 2020, 2:41pm

Hi @jordan, can you confirm which version of Dgraph you are using? In earlier versions we saw this because Badger was slow at managing the Raft WAL. We improved this behavior in the recent v20.11 release.

This happens when leadership is changing: the connections to the non-leader node are terminated, and a new connection to the leader is created.
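Since the error can be transient while leadership moves, one client-side option is to retry the failed call with backoff. This is a minimal sketch, not something from the Dgraph or gql APIs: `run_with_retry`, `is_transient`, and their parameters are hypothetical names, and matching on the text "Unhealthy connection" assumes the client surfaces the server message in the exception. Retrying a mutation is only safe if running it twice does no harm.

```python
import random
import time


def is_transient(exc: Exception) -> bool:
    """Assumption: the server message is visible in the exception text."""
    return "Unhealthy connection" in str(exc)


def run_with_retry(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff.

    operation is any zero-argument callable, e.g. a lambda that wraps the
    client call which sends the mutation.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on errors we do not recognise.
            if attempt == max_attempts or not is_transient(exc):
                raise
            # Wait 0.5s, 1s, 2s, ... plus random jitter so that scripts
            # running at the same time do not all retry together.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

Usage would look like `run_with_retry(lambda: send_mutation(batch))`, where `send_mutation` stands for whatever function the script already uses to send the mutation.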

This error occurs when the node is unhealthy. A node counts as healthy when its last ping time (epoch) is within the last 2 seconds.
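To illustrate the 2-second rule, here is a rough sketch of the check. It is not Dgraph's actual code (Dgraph is written in Go); `is_healthy` and `HEALTHY_WINDOW_SECONDS` are hypothetical names, and whether the boundary at exactly 2 seconds counts as healthy is an assumption.

```python
import time

# Threshold quoted above; the constant name is made up for this example.
HEALTHY_WINDOW_SECONDS = 2.0


def is_healthy(last_ping, now=None, window=HEALTHY_WINDOW_SECONDS):
    """Return True if the last ping (epoch seconds) is recent enough."""
    if now is None:
        now = time.time()
    return (now - last_ping) <= window
```

With this rule, a connection whose peer stalls for more than 2 seconds, for example during a slow disk sync, would be reported as unhealthy until the next ping arrives.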

Please confirm the Dgraph version and, if possible, describe how the scripts run so that we can investigate further.

mrjn · December 25, 2020, 3:14pm

This seems like it could be due to a slow disk.

jordan · December 28, 2020, 11:42am

v20.11.0

I’m checking the script again. I’m using gql with Python to read and insert data in Dgraph. Maybe I spend too much time processing messages (longer than 2 seconds) before sending them?
