Thank you, Kevin!
Our self-managed production Dgraph cluster is running on six on-premises servers:
Our current setup is a 9-node cluster: 3 Zeroes, 6 Alphas (two groups). Three machines each run a single Zero and a single Alpha, whereas the other three run just a single Alpha. We have the latest version (v24.0.0) installed and running in Docker Swarm mode.
Our dataset is growing daily (although not as fast as we’d like, given the number of exceptions we’re hitting). In our current p directories, there’s roughly 75GB in the Alpha Group #1 directories and 100GB in the Alpha Group #2 directories.
We’re seeing the Unhealthy connection error in several different places. We see it most often (and very consistently) from our results service API, which uses dgraph4j to query Dgraph from our application. We’ve occasionally seen it from a separate client in our custom NiFi processor used to ingest the data, which also uses dgraph4j under the hood. However, as @rahst12 mentioned above, we also see it when using cURL directly, with no client library at all:
$ curl -H "Content-Type: application/dql" -X POST http://x.x.x.x:8080/query --data-binary '@query.json'
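In case it helps narrow down which nodes are affected, here is a minimal sketch of how the same request could be sent to each Alpha in turn. The `ALPHAS` variable and the `classify` and `check_alpha` helpers are hypothetical names for illustration, and the sketch assumes the error text "Unhealthy connection" appears verbatim somewhere in the response body; it only matches on that literal string.

```shell
#!/usr/bin/env bash
# Sketch only: ALPHAS is a hypothetical space-separated list of host:port
# pairs for the Alphas, e.g. ALPHAS="10.0.0.1:8080 10.0.0.2:8080".

# Classify a response body by looking for the literal error text.
classify() {
  case "$1" in
    "") echo "no-response" ;;
    *"Unhealthy connection"*) echo "unhealthy" ;;
    *) echo "ok" ;;
  esac
}

# Send the same query used above to one Alpha and classify the result.
check_alpha() {
  local body
  body=$(curl -s --max-time 10 \
    -H "Content-Type: application/dql" \
    -X POST "http://$1/query" \
    --data-binary '@query.json')
  classify "$body"
}

# Loop over every Alpha listed in ALPHAS (does nothing if it is unset).
for alpha in ${ALPHAS:-}; do
  echo "$alpha: $(check_alpha "$alpha")"
done
```

Run from the directory containing query.json, with `ALPHAS` set to the real addresses, this prints one line per Alpha, which should show whether the error is tied to particular nodes or a particular group.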
I think that debugging over the phone next week could really help us out, but let us know if we can provide any more background information in the meantime!