The health endpoint responds in a timely manner, but when the all parameter is added, it always times out.
The affected node cannot write data properly. Data reading was not tested.
Resolved after reboot.
What I Did
I removed and rejoined a node (not the alpha that currently has the issue). The group leader is normal; another follower has this problem.
Dgraph version : v20.11.0-rc5
Dgraph codename : tchalla
Dgraph SHA-256 : 95d845ecec057813d1a3fc94394ba1c18ada80f584120a024c19d0db668ca24e
Commit SHA-1 : b65a8b10c
Commit timestamp : 2020-12-14 19:09:28 +0530
Branch : HEAD
Go version : go1.15.5
jemalloc enabled : true
If this is a bug report, please put it in /Issues/Dgraph instead of /Users/Dgraph, and please follow the template for bugs. That helps us reproduce the problem and assign an engineer rapidly. The more information you provide, the better.
What do you mean “all parameter is used”? I don’t understand this.
/health?all returns information about the health of all the servers in the cluster.
Ah, ok I see what you mean. Could you use the
/graphql endpoint and do a
So there seem to be two issues:
- The health endpoint does not represent alpha state correctly unless ?all is used.
- The health endpoint times out if ?all is used.
I don’t know what “represent alpha state correctly” means - perhaps you want to see Zero information as well? If so, then you are correct, and this is not a bug. But if there are some facts about the Alphas in the cluster that are wrong unless ?all is used, then it’s a bug.
On #2, this seems to be a bug. Tagging @ibrahim
In fact, the node had broken down at that time.
A request to /health returned status code 200, while a request to /health?all kept waiting until it timed out.
I would say the request without the all parameter is not showing the node status correctly.
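For anyone who wants to check for the same symptom, here is a minimal sketch that requests both forms of the endpoint with a client-side timeout and reports what came back. The function names `probe` and `compare_health` are made up for illustration, the 5 second timeout is arbitrary, and the base URL is assumed to be an alpha's HTTP address such as http://localhost:8080.

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return the HTTP status code, or None if no response arrives in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a 2xx status.
        return err.code
    except (urllib.error.URLError, socket.timeout, TimeoutError, ConnectionError):
        # Connection refused, reset, or the read timed out.
        return None


def compare_health(base, timeout=5.0):
    """Probe /health and /health?all on one alpha (base is an assumed URL)."""
    return {
        "/health": probe(base + "/health", timeout),
        "/health?all": probe(base + "/health?all", timeout),
    }

# Example call (assumes an alpha listening on localhost:8080):
#   compare_health("http://localhost:8080")
```

On a node showing the behaviour described in this thread, one would expect 200 for /health and None for /health?all.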
Hey @zzl221000, this sounds like a bug. Could you please help me reproduce this?
Meanwhile, could you please help me with the following three items?
- Output of curl -v 'localhost:8080/health?all' . Replace localhost:8080 with your alpha instance’s host:port.
- Output of curl localhost:8080/debug/pprof/goroutine\?debug\=2 -o goroutine.txt . Please run this command after sending the curl request. The goroutine.txt file will show what’s running when you hit the /health?all endpoint. Please share the goroutine.txt file.
- Output of curl localhost:8080/debug/pprof/profile -o cpu.pprof. This is a CPU profile which will also help us figure out why it’s taking so long.
Please share the goroutine.txt and cpu.pprof files generated in steps 2 and 3.
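If running the three curl commands by hand is awkward, the steps above could be scripted roughly as follows. This is only a sketch: `fetch` and `collect` are hypothetical helper names, the delay and timeouts are guesses, and the base URL is assumed to be the alpha's HTTP address. It sends the /health?all request in a background thread so that the goroutine dump is taken while that request is still pending, then saves the CPU profile.

```python
import http.client
import os
import threading
import time
import urllib.request


def fetch(url, timeout):
    """Return the response body, or None if the request fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (OSError, http.client.HTTPException):
        return None


def collect(base, outdir=".", health_timeout=30.0, profile_timeout=60.0, delay=1.0):
    """Save goroutine.txt and cpu.pprof while a /health?all request is in flight."""
    # Step 1: send the request that is suspected to hang, without blocking on it.
    pending = threading.Thread(
        target=fetch, args=(base + "/health?all", health_timeout), daemon=True
    )
    pending.start()
    time.sleep(delay)  # assumed to be enough for the request to reach the alpha

    # Steps 2 and 3: same paths and file names as the curl commands above.
    saved = []
    for path, name in (
        ("/debug/pprof/goroutine?debug=2", "goroutine.txt"),
        ("/debug/pprof/profile", "cpu.pprof"),
    ):
        body = fetch(base + path, profile_timeout)
        if body is not None:
            target = os.path.join(outdir, name)
            with open(target, "wb") as out:
                out.write(body)
            saved.append(target)
    return saved

# Example call (assumes an alpha listening on localhost:8080):
#   collect("http://localhost:8080")
```

The profile timeout is set generously because the CPU profile endpoint samples for a while before it responds.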