Error while writing: write tcp 172.18.0.2:8080->172.27.65.10:58088: write: broken pipe

Error while writing: write tcp 172.18.0.2:8080->172.27.65.10:58088: write: broken pipe

Error while writing: write tcp 172.18.0.2:8080->172.27.65.10:51716: write: connection timed out

Has anyone encountered this problem? I installed Dgraph with Docker.

Yes, I encounter this error too. I have no idea what is causing it, but it leads to up to 1 minute of downtime for our whole production infrastructure.

Can anyone help, please?

Hi @FaxBoy @maaft
Have you solved it? I have this problem now too.

Do you run deep Lambda queries? In other words, do your queries trigger Lambda field resolvers which then query other Lambda fields using GraphQL?

We are facing a similar issue when a user or an application runs a huge query. The leader Alpha service restarts very frequently. When we checked our error log, we found the error message below just before the service failure:

Dec 13 11:43:30 dgraph: E1213 11:43:30.390352 x.go:354] Error while writing: write tcp 192.168.x.x:xxxx->192.168.x.x:xx965: write: broken pipe

Can anyone help to fix this issue?

I’m going to be a little vague, due to lack of context. I don’t know what you’re using. Is it a local Dgraph (Docker? K8s? A local binary?) or is it in the cloud? Are you using Lambdas, and if so, how? Is that our Python client? What are you doing? Context always helps.

It looks like you are hitting a network error while writing data to a TCP connection. This can have several causes, including network congestion, a faulty network connection, or a problem on either end of the connection. Maybe a container is failing?

In general, a “broken pipe” error means the process tried to write to a connection that the other side had already closed, so the data could not be delivered.
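To make that concrete, here is a minimal sketch that reproduces the same class of error on a loopback socket with only the Python standard library. It has nothing to do with Dgraph itself; the function name `provoke_write_after_close` is made up for illustration. The "client" closes its end, and the "server" keeps writing until the operating system reports the failure.

```python
import socket
import time
from typing import Optional


def provoke_write_after_close(max_attempts: int = 50) -> Optional[Exception]:
    """Write to a TCP connection whose peer has already closed it.

    Returns the exception raised by the failing write, or None if no
    write failed within max_attempts (not expected on loopback).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # The client goes away before the server has sent its response.
    client.close()

    try:
        for _ in range(max_attempts):
            try:
                # The first write usually succeeds; the peer answers with a
                # reset, and a later write then fails.
                server_side.sendall(b"response chunk")
            except (BrokenPipeError, ConnectionResetError) as exc:
                return exc
            time.sleep(0.05)
        return None
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    print(repr(provoke_write_after_close()))
```

Depending on the platform and timing, the failing write is reported as either a broken pipe or a connection reset. Both mean the same thing here: the reader was gone before the writer finished.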

One potential solution is to increase the timeout on the client side, so that the client does not give up and close the connection before the server has finished writing its response. You could also try a different network connection.
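As a rough sketch of the timeout idea, the helper below retries a request with a progressively longer timeout. It is client-agnostic: `send` is a hypothetical callable that you would write yourself to wrap whatever client you use, and `call_with_growing_timeout` and its default values are assumptions for illustration, not part of any Dgraph client.

```python
import socket
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_growing_timeout(
    send: Callable[[float], T],
    initial_timeout: float = 5.0,
    factor: float = 2.0,
    max_attempts: int = 4,
) -> T:
    """Call send(timeout_seconds), multiplying the timeout after each timeout error."""
    timeout = initial_timeout
    last_error: Optional[BaseException] = None
    for _ in range(max_attempts):
        try:
            return send(timeout)
        except (TimeoutError, socket.timeout) as exc:
            # Give the next attempt more time instead of failing right away.
            last_error = exc
            timeout *= factor
    raise TimeoutError(f"gave up after {max_attempts} attempts") from last_error
```

Note that a longer timeout only hides the symptom if the query itself is too heavy for the server, so it is worth combining this with a look at what the slow queries are.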

If the problem persists, gather more information about the circumstances in which the error occurs. Check the network logs and monitor the performance of your server (bare metal or not) to see whether any patterns or trends point to the root cause.
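One cheap way to look for patterns is to count the errors per remote host and per reason. The sketch below assumes the log lines look like the ones pasted in this thread; the regular expression and the name `summarize_write_errors` are my own and may need adjusting for your log format.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as:
#   Error while writing: write tcp 172.18.0.2:8080->172.27.65.10:58088: write: broken pipe
WRITE_ERROR = re.compile(
    r"Error while writing: write tcp (?P<local>\S+?)->(?P<remote>\S+?): write: (?P<reason>.+?)\s*$"
)


def summarize_write_errors(lines: Iterable[str]) -> Counter:
    """Count write errors by (remote host, reason), ignoring unrelated lines."""
    counts: Counter = Counter()
    for line in lines:
        match = WRITE_ERROR.search(line)
        if match is None:
            continue
        # Drop the ephemeral client port so repeated failures group together.
        remote_host = match.group("remote").rsplit(":", 1)[0]
        counts[(remote_host, match.group("reason"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Error while writing: write tcp 172.18.0.2:8080->172.27.65.10:58088: write: broken pipe",
        "Error while writing: write tcp 172.18.0.2:8080->172.27.65.10:58088: write: broken pipe",
        "Error while writing: write tcp 172.18.0.2:8080->172.27.65.10:51716: write: connection timed out",
    ]
    for (host, reason), count in summarize_write_errors(sample).most_common():
        print(f"{host}\t{reason}\t{count}")
```

If nearly all errors come from one remote host, that client (or a proxy in front of it) is the first place to look. If they are spread evenly, the server side or the network in between is more likely.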

If you still cannot resolve the issue, consider consulting a network or system administrator who has experience troubleshooting similar problems. They can give more targeted advice based on the details of your setup.

If the leader Alpha service restarts frequently when a user or application runs a large query, you may not have enough resources available to handle that query. For example, a query that is too large or involves too many indexes may exceed the available resources and cause the cluster to fail. In that situation, shutting down the cluster and restarting it will recover from the failure. It is also important to make sure the cluster is properly sized and configured for the workload, to avoid future failures.