My API service was working a minute ago, and now it’s not. I’ve got a 3-node k8s cluster with 3x replication. How do I diagnose this issue? Is this something I should expect on a regular basis?
I’ve got this working again by restarting the pods, but I need an explanation of why this happens and whether I should expect it often.
Update: it’s happening again. Restarting pods can’t be a viable way to fix this.
Can I please get some help with this issue? Thanks in advance.
The logs should show what’s happening in the cluster when you see this message. If you can share the cluster logs (from all Zeros and Alphas), that would help.
"Not ready to accept requests" can mean the instances are having trouble communicating with each other. Typically, it means the instance hasn’t been able to establish a quorum.
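For context on what "quorum" means here, a small sketch of the standard Raft majority rule (illustrative arithmetic, not Dgraph code; the function names are hypothetical). A group of n voting replicas needs floor(n/2) + 1 of them reachable to elect a leader or commit writes. With 3x replication that is 2 of 3, which matches the `[quorum:2]` that shows up in the Alpha logs in this thread, so a group survives one unreachable replica but not two.

```python
def raft_quorum(members):
    """Smallest strict majority of a Raft group with `members` voters."""
    if members < 1:
        raise ValueError("a Raft group needs at least one member")
    return members // 2 + 1


def tolerated_failures(members):
    """How many voters can be unreachable while a majority still remains."""
    return members - raft_quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} replicas: quorum={raft_quorum(n)}, can lose {tolerated_failures(n)}")
```

This is also why 2 replicas tolerate no more failures than 1: both need every member up.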
Hmm, okay. Sorry, how do I check the logs again?
Logs are written to stdout/stderr. On Kubernetes, you can run kubectl logs <podname> to get the logs for that pod.
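Since the question needs logs from every Zero and Alpha, here is a minimal sketch that saves each pod's output to a file in one go by shelling out to `kubectl logs`. The pod list is an assumption taken from the pod names that appear in this thread; adjust it to whatever `kubectl get pods` shows. The helper names are hypothetical; `--namespace`, `--timestamps` and `--previous` are standard kubectl flags.

```python
import subprocess

# Pod names taken from the logs in this thread (assumed); adjust to match
# what `kubectl get pods` shows in your namespace.
PODS = [
    "dgraph-zero-0", "dgraph-zero-1",
    "dgraph-alpha-0", "dgraph-alpha-1", "dgraph-alpha-2",
]


def kubectl_logs_args(pod, namespace="default", previous=False):
    """Build the argument list for `kubectl logs` for one pod."""
    args = ["kubectl", "logs", pod, "--namespace", namespace, "--timestamps"]
    if previous:
        # Logs of the previous container instance; only available if the
        # container restarted inside the same pod (not if the pod was deleted).
        args.append("--previous")
    return args


def collect_logs(pods=PODS, namespace="default", previous=False):
    """Write each pod's logs to <pod>.log in the current directory."""
    for pod in pods:
        with open(f"{pod}.log", "w") as out:
            # check=False: keep going even if one pod is missing or not running;
            # kubectl's error message ends up in that pod's file instead.
            subprocess.run(
                kubectl_logs_args(pod, namespace, previous),
                stdout=out,
                stderr=subprocess.STDOUT,
                check=False,
            )
```

Calling collect_logs() as soon as the error appears, before restarting anything, keeps the evidence that a pod restart would otherwise throw away.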
16 node.go:182] Setting conf state to nodes:1
I0414 13:23:58.416031 16 node.go:182] Setting conf state to nodes:1 nodes:2
I0414 13:24:00.903309 16 log.go:34] 1 is starting a new election at term 3
I0414 13:24:00.903362 16 log.go:34] 1 became pre-candidate at term 3
I0414 13:24:00.903371 16 log.go:34] 1 received MsgPreVoteResp from 1 at term 3
I0414 13:24:00.903982 16 log.go:34] 1 [logterm: 3, index: 421] sent MsgPreVote request to 2 at term 3
I0414 13:24:03.155852 16 log.go:34] 1 became follower at term 3
I0414 13:24:03.156289 16 log.go:34] raft.node: 1 elected leader 2 at term 3
I0414 13:24:03.295733 16 admin.go:513] No GraphQL schema in Dgraph; serving empty GraphQL API
I0414 13:26:21.419924 16 log.go:34] 1 [term 3] received MsgTimeoutNow from 2 and starts an election to get leadership.
I0414 13:26:21.420430 16 log.go:34] 1 became candidate at term 4
I0414 13:26:21.420524 16 log.go:34] 1 received MsgVoteResp from 1 at term 4
I0414 13:26:21.420755 16 log.go:34] 1 [logterm: 3, index: 422] sent MsgVote request to 2 at term 4
I0414 13:26:21.421345 16 log.go:34] raft.node: 1 lost leader 2 at term 4
I0414 13:26:21.446151 16 log.go:34] 1 received MsgVoteResp from 2 at term 4
I0414 13:26:21.446277 16 log.go:34] 1 [quorum:2] has received 2 MsgVoteResp votes and 0 vote rejections
I0414 13:26:21.446368 16 log.go:34] 1 became leader at term 4
I0414 13:26:21.446524 16 log.go:34] raft.node: 1 elected leader 1 at term 4
I0414 13:26:22.401743 16 groups.go:856] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
I0414 13:26:22.401870 16 groups.go:865] Got Zero leader: dgraph-zero-0.dgraph-zero.default.svc.cluster.local:5080
E0414 13:26:22.419559 16 groups.go:1093] Error from worker subscribe stream: rpc error: code = Unavailable desc = transport is closing
I0414 13:26:22.424619 16 pool.go:160] CONNECTING to dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080
W0414 13:26:22.421977 16 pool.go:254] Connection lost with dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080. Error: rpc error: code = Unavailable desc = transport is closing
W0414 13:26:22.503802 16 node.go:417] Unable to send message to peer: 0x2. Error: EOF
W0414 13:26:23.605793 16 node.go:417] Unable to send message to peer: 0x2. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local: no such host"
W0414 13:26:33.703619 16 node.go:417] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0414 13:26:43.804177 16 node.go:417] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0414 13:26:53.903742 16 node.go:417] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0414 13:27:03.904037 16 node.go:417] Unable to send message to peer: 0x2. Error: Unhealthy connection
I0414 13:27:10.708056 16 log.go:34] 1 [logterm: 4, index: 424, vote: 1] cast MsgPreVote for 2 [logterm: 4, index: 4
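When reading logs like the ones above, it can help to pull out only the Raft state changes to see how often leadership moves around. A minimal sketch, assuming the message format shown in these logs (`<id> became <state> at term <n>`); the function names are hypothetical. A term number that keeps climbing over a short period suggests repeated elections, which usually points at unreliable connectivity between replicas.

```python
import re

# Raft state changes as they appear in the Alpha logs above,
# e.g. "I0414 13:26:21.446368 16 log.go:34] 1 became leader at term 4".
TRANSITION = re.compile(
    r"\] (\d+) became (pre-candidate|candidate|follower|leader) at term (\d+)"
)


def raft_transitions(lines):
    """Yield (node_id, new_state, term) for every Raft state change in `lines`."""
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            yield int(match.group(1)), match.group(2), int(match.group(3))


def distinct_terms(lines):
    """Number of distinct Raft terms seen among the state changes."""
    return len({term for _, _, term in raft_transitions(lines)})
```

For example, after saving a pod's logs to alpha-0.log, `list(raft_transitions(open("alpha-0.log")))` lists every state change in order.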
Could this issue be caused by the cloud provider?
From the error above, it seems like this Alpha node can’t find the other Alpha node. Can you share logs from the other Alpha node as well? In general, the more logs we can see (from both Alpha and Zero nodes), the easier it is to diagnose what might be happening.
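One way to test the "can’t find the other Alpha" theory directly is to check whether the peer DNS names resolve, run from inside the cluster (for example, from a debug pod in the same namespace with Python available). A minimal sketch: the name pattern `<pod>.<service>.<namespace>.svc.cluster.local` is the one visible in these logs, and it assumes the headless Service is named after the StatefulSet (as with dgraph-alpha here). The helper names are hypothetical.

```python
import socket


def peer_fqdn(pod, service, namespace="default", cluster_domain="cluster.local"):
    """DNS name of a StatefulSet pod behind a headless Service,
    e.g. dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


def resolves(host, port):
    """True if `host` currently resolves; False on the kind of failure that
    shows up as 'no such host' in the Dgraph logs."""
    try:
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        return True
    except socket.gaierror:
        return False


def check_peers(statefulset="dgraph-alpha", replicas=3, port=7080):
    """Map each peer's DNS name to whether it resolves from where this runs."""
    results = {}
    for i in range(replicas):
        host = peer_fqdn(f"{statefulset}-{i}", statefulset)
        results[host] = resolves(host, port)
    return results
```

Running check_peers() while the error is happening, and again after it clears, shows whether the outage lines up with a peer name dropping out of DNS.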
dgraph-alpha-1:
I0416 07:57:14.781286 15 node.go:182] Setting conf state to nodes:1
I0416 07:57:14.781953 15 node.go:182] Setting conf state to nodes:1 nodes:2
I0416 07:57:17.963257 15 log.go:34] 2 is starting a new election at term 6
I0416 07:57:17.963289 15 log.go:34] 2 became pre-candidate at term 6
I0416 07:57:17.963297 15 log.go:34] 2 received MsgPreVoteResp from 2 at term 6
I0416 07:57:17.964113 15 log.go:34] 2 [logterm: 6, index: 827] sent MsgPreVote request to 1 at term 6
W0416 07:57:18.965691 15 node.go:417] Unable to send message to peer: 0x1. Error: Do not have address of peer 0x1
I0416 07:57:19.624935 15 admin.go:513] No GraphQL schema in Dgraph; serving empty GraphQL API
I0416 07:57:20.680385 15 log.go:34] 2 became follower at term 6
I0416 07:57:20.681087 15 log.go:34] raft.node: 2 elected leader 1 at term 6
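Note that the excerpts in this thread come from different time windows (the first Alpha excerpt is from April 14 around 13:24, while dgraph-alpha-1’s is from April 16), so they can’t be lined up against each other. Logs from the same window are easier to compare when interleaved by timestamp. A minimal sketch, assuming the glog-style header seen here (severity letter, MMDD, time, thread id, file:line); glog headers carry no year, so the caller supplies one (the Zero log’s badger line suggests 2020). The function names are hypothetical.

```python
import re
from datetime import datetime

# glog-style header: severity, MMDD, time, thread id, file:line, then "] message".
GLOG = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def parse_glog(line, year):
    """Return (timestamp, severity, source, message), or None if the line has
    no glog header. The year must be supplied because glog omits it."""
    m = GLOG.match(line.strip())
    if not m:
        return None
    ts = datetime.strptime(f"{year}{m['mmdd']} {m['time']}", "%Y%m%d %H:%M:%S.%f")
    return ts, m["sev"], m["src"], m["msg"]


def merge_by_time(logs, year):
    """Interleave lines from several pods in timestamp order.

    `logs` maps a pod name to an iterable of raw log lines; the result is a
    sorted list of (timestamp, pod, severity, message) tuples.
    """
    entries = []
    for pod, lines in logs.items():
        for line in lines:
            parsed = parse_glog(line, year)
            if parsed:
                entries.append((parsed[0], pod, parsed[1], parsed[3]))
    return sorted(entries)
```

Lines without a glog header (such as badger’s own messages) are skipped rather than guessed at.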
dgraph-alpha-2:
I0414 10:48:23.109031 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:48:28.110375 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:48:28.110492 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:48:33.111766 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:48:33.111824 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:48:38.112960 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:48:38.113018 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:48:43.113783 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:48:43.113822 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:48:48.115369 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:48:48.115466 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:48:53.122708 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:48:53.122753 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:48:58.135103 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:48:58.135410 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:03.136500 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:03.136551 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:08.137264 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:08.137310 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:13.149530 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:13.149564 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:18.150639 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:18.150680 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:23.151035 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:23.151074 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:28.151437 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:28.151707 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:33.152998 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:33.153043 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:38.154053 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:38.154099 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:43.155366 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:43.155414 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:48.175450 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:48.175634 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:53.185610 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:53.185675 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:58.199523 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:58.199557 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:03.206144 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:03.206189 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:08.206560 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:08.206603 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:13.207620 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:13.207671 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:18.208201 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:18.208241 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:23.209541 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:23.209591 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:28.210458 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:28.210501 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:33.211682 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:33.211741 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:38.213191 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:38.213245 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:43.213969 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:43.214144 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:48.214972 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:48.215146 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:53.216062 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:53.216403 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:58.216847 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:58.216891 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:03.218236 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:03.218274 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:08.219384 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:08.219419 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:13.220371 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:13.220745 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:18.221698 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:18.221751 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:23.223014 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:23.223054 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:28.223795 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:28.223841 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:33.225050 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:33.225088 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:38.225793 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:38.225835 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:43.226784 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:43.226830 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:48.227708 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:48.227894 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:53.228162 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:53.228197 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:58.229255 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:58.229311 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:03.229643 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:03.229688 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:08.230732 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:08.230780 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:13.231898 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:13.231947 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:18.232406 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:18.232493 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:23.233338 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:23.233411 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:28.234162 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:28.234205 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:33.238082 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:33.238115 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:38.239516 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:38.239557 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:43.240313 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:43.240352 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:48.241183 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:48.241238 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:53.241620 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:53.241658 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:58.242938 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:58.242995 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:03.244164 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:03.244213 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:08.245165 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:08.245212 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:13.246063 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:13.246105 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:18.247444 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:18.247531 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:23.247917 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:23.247983 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:28.248510 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:28.248562 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:33.249977 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:33.250103 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:38.251511 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:38.251545 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:43.253466 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:43.253514 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:48.254395 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:48.254452 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:53.255708 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:53.255740 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:58.256412 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:58.256459 25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:54:03.257473 25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
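On the client side, a short not-ready window (for example, during one of the leader elections visible in the earlier logs) can be absorbed by retrying with exponential backoff. It will not help with what dgraph-alpha-2 shows above, where the same error repeats every 5 seconds from 10:48:23 to 10:54:03; that needs the underlying cluster issue fixed. A minimal sketch, assuming your client raises an exception whose message contains the server’s text; `with_retry` and its parameters are hypothetical, and how a given Dgraph client library surfaces this error is not covered here.

```python
import time

NOT_READY = "server is not ready to accept requests"


def with_retry(call, attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call `call()`, retrying with exponential backoff while the server
    reports it is not ready.

    Any other exception is raised immediately; the last not-ready error is
    re-raised once `attempts` is exhausted.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as err:
            if NOT_READY not in str(err) or attempt == attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

The `sleep` parameter exists so the backoff schedule can be checked without actually waiting.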
dgraph-zero-0:
I0413 14:14:39.068546 18 run.go:105] Setting up grpc listener at: 0.0.0.0:5080
I0413 14:14:39.069093 18 run.go:105] Setting up http listener at: 0.0.0.0:6080
badger 2020/04/13 14:14:39 INFO: All 0 tables opened in 0s
I0413 14:14:39.188622 18 node.go:145] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc0004c2330 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x260a270 DisableProposalForwarding:false}
I0413 14:14:39.190719 18 node.go:323] Group 0 found 1 entries
I0413 14:14:39.191545 18 log.go:34] 1 became follower at term 0
I0413 14:14:39.192196 18 log.go:34] newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0 ]
I0413 14:14:39.192299 18 log.go:34] 1 became follower at term 1
I0413 14:14:39.193028 18 run.go:296] Running Dgraph Zero...
E0413 14:14:39.193395 18 raft.go:516] While proposing CID: Not Zero leader. Aborting proposal: cid:"224d54b1-0dde-468e-897c-e83f610ea2b5" . Retrying...
I0413 14:14:39.201673 18 node.go:182] Setting conf state to nodes:1
I0413 14:14:39.201876 18 raft.go:702] Done applying conf change at 0x1
I0413 14:14:40.193873 18 log.go:34] 1 no leader at term 1; dropping index reading msg
W0413 14:14:42.193869 18 node.go:671] [0x1] Read index context timed out
I0413 14:14:42.194024 18 log.go:34] 1 no leader at term 1; dropping index reading msg
E0413 14:14:42.195303 18 raft.go:516] While proposing CID: Not Zero leader. Aborting proposal: cid:"dd005d83-a536-4ee8-bc70-5d410b74c484" . Retrying...
I0413 14:14:42.893187 18 log.go:34] 1 is starting a new election at term 1
I0413 14:14:42.893382 18 log.go:34] 1 became pre-candidate at term 1
I0413 14:14:42.893429 18 log.go:34] 1 received MsgPreVoteResp from 1 at term 1
I0413 14:14:42.893587 18 log.go:34] 1 became candidate at term 2
I0413 14:14:42.893635 18 log.go:34] 1 received MsgVoteResp from 1 at term 2
I0413 14:14:42.893839 18 log.go:34] 1 became leader at term 2
I0413 14:14:42.894064 18 log.go:34] raft.node: 1 elected leader 1 at term 2
I0413 14:14:42.894225 18 raft.go:667] I've become the leader, updating leases.
I0413 14:14:42.894300 18 assign.go:42] Updated Lease id: 1. Txn Ts: 1
W0413 14:14:44.194156 18 node.go:671] [0x1] Read index context timed out
I0413 14:14:45.260105 18 raft.go:509] CID set for cluster: 4f12b9fb-f876-4f6e-a286-6dd0fcf1f453
I0413 14:14:45.269652 18 license_ee.go:45] Enterprise state proposed to the cluster: key:"z1-8987910882978745324" license:<maxNodes:18446744073709551615 expiryTs:1589379285 >
I0413 14:14:53.031052 18 pool.go:160] CONNECTING to dgraph-zero-1.dgraph-zero.default.svc.cluster.local:5080
I0413 14:14:53.031227 18 node.go:583] Trying to add 0x2 to cluster. Addr: dgraph-zero-1.dgraph-zero.default.svc.cluster.local:5080
I0413 14:14:53.031236 18 node.go:584] Current confstate at 0x1: nodes:1
W0413 14:14:53.045038 18 pool.go:254] Connection lost with dgraph-zero-1.dgraph-zero.default.svc.cluster.local:5080. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-zero-1.dgraph-zero.default.svc.cluster.local: no such host"
I0413 14:14:53.083970 18 node.go:182] Setting conf state to nodes:1 nodes:2
I0413 14:14:53.084073 18 raft.go:702] Done applying conf change at 0x1
I0413 14:14:53.084121 18 node.go:746] [0x2] Done joining cluster with err: <nil>
W0413 14:14:54.094175 18 node.go:417] Unable to send message to peer: 0x2. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-zero-1.dgraph-zero.default.svc.cluster.local: no such host"
W0413 14:14:55.193806 18 node.go:671] [0x1] Read index context timed out
W0413 14:14:57.194036 18 node.go:671] [0x1] Read index context timed out
W0413 14:14:59.194137 18 node.go:671] [0x1] Read index context timed out
I0413 20:42:29.849597 18 zero.go:417] Got connection request: cluster_info_only:true
I0413 20:42:29.854773 18 zero.go:435] Connected: cluster_info_only:true
I0413 20:42:29.858048 18 zero.go:417] Got connection request: group_id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080" force_group_id:true
I0413 20:42:29.862610 18 pool.go:160] CONNECTING to dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080
W0413 20:42:29.868510 18 pool.go:254] Connection lost with dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080. Error: rpc error: code = Unknown desc = No node has been set up yet
I0413 20:42:29.907751 18 zero.go:562] Connected: id:1 group_id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080" force_group_id:true
I0413 20:43:46.966780 18 zero.go:417] Got connection request: cluster_info_only:true
I0413 20:43:46.974174 18 zero.go:435] Connected: cluster_info_only:true
I0413 20:43:46.985951 18 zero.go:417] Got connection request: group_id:1 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080" force_group_id:true
I0413 20:43:46.991445 18 pool.go:160] CONNECTING to dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080
W0413 20:43:47.026844 18 pool.go:254] Connection lost with dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080. Error: rpc error: code = Unknown desc = No node has been set up yet
I0413 20:43:47.029240 18 zero.go:562] Connected: id:2 group_id:1 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080" force_group_id:true
I0414 11:51:22.565870 18 zero.go:417] Got connection request: cluster_info_only:true
I0414 11:51:22.573680 18 zero.go:435] Connected: cluster_info_only:true
I0414 11:51:22.577224 18 zero.go:417] Got connection request: id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080"
I0414 11:51:22.580127 18 zero.go:544] Connected: id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080"
I0414 13:23:58.384835 18 zero.go:417] Got connection request: cluster_info_only:true
I0414 13:23:58.388449 18 zero.go:435] Connected: cluster_info_only:true
I0414 13:23:58.390755 18 zero.go:417] Got connection request: id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080"
I0414 13:23:58.393197 18 zero.go:544] Connected: id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080"
I0414 13:27:04.789146 18 zero.go:417] Got connection request: cluster_info_only:true
I0414 13:27:04.791805 18 zero.go:435] Connected: cluster_info_only:true
I0414 13:27:04.795815 18 zero.go:417] Got connection request: id:2 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080"
I0414 13:27:04.798797 18 zero.go:544] Connected: id:2 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080"
I0416 07:55:01.174974 18 zero.go:417] Got connection request: cluster_info_only:true
I0416 07:55:01.178270 18 zero.go:435] Connected: cluster_info_only:true
I0416 07:55:01.181613 18 zero.go:417] Got connection request: id:2 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080"
I0416 07:55:01.184757 18 zero.go:544] Connected: id:2 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080"
I0416 07:56:13.076416 18 zero.go:417] Got connection request: cluster_info_only:true
I0416 07:56:13.079198 18 zero.go:435] Connected: cluster_info_only:true
I0416 07:56:13.082974 18 zero.go:417] Got connection request: id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080"
I0416 07:56:13.086004 18 zero.go:544] Connected: id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080"
I0416 07:57:20.461974 18 zero.go:417] Got connection request: cluster_info_only:true
I0416 07:57:20.466119 18 zero.go:435] Connected: cluster_info_only:true
I0416 07:57:20.470970 18 zero.go:417] Got connection request: id:2 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080"
I0416 07:57:20.474443 18 zero.go:544] Connected: id:2 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080"
dgraph-zero-1
I0413 14:14:49.103142 18 run.go:105] Setting up grpc listener at: 0.0.0.0:5080
I0413 14:14:49.103606 18 run.go:105] Setting up http listener at: 0.0.0.0:6080
badger 2020/04/13 14:14:49 INFO: All 0 tables opened in 0s
I0413 14:14:49.167000 18 node.go:145] Setting raft.Config to: &{ID:2 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc0006ce2a0 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x260a270 DisableProposalForwarding:false}
I0413 14:14:49.167923 18 node.go:323] Group 0 found 1 entries
I0413 14:14:49.168721 18 pool.go:160] CONNECTING to dgraph-zero-0.dgraph-zero.default.svc.cluster.local:5080
I0413 14:14:49.235316 18 raft.go:494] [0x2] Starting node
I0413 14:14:49.236532 18 log.go:34] 2 became follower at term 0
I0413 14:14:49.236584 18 log.go:34] newRaft 2 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
I0413 14:14:49.236609 18 log.go:34] 2 became follower at term 1
I0413 14:14:49.238097 18 run.go:296] Running Dgraph Zero...
I0413 14:14:50.237320 18 log.go:34] 2 no leader at term 1; dropping index reading msg
W0413 14:14:52.237539 18 node.go:671] [0x2] Read index context timed out
I0413 14:14:52.238365 18 log.go:34] 2 no leader at term 1; dropping index reading msg
W0413 14:14:54.238761 18 node.go:671] [0x2] Read index context timed out
I0413 14:14:54.238844 18 log.go:34] 2 no leader at term 1; dropping index reading msg
W0413 14:14:56.239043 18 node.go:671] [0x2] Read index context timed out
I0413 14:14:56.239119 18 log.go:34] 2 no leader at term 1; dropping index reading msg
I0413 14:14:56.253538 18 log.go:34] 2 [term: 1] received a MsgHeartbeat message with higher term from 1 [term: 2]
I0413 14:14:56.253584 18 log.go:34] 2 became follower at term 2
I0413 14:14:56.253604 18 log.go:34] raft.node: 2 elected leader 1 at term 2
I0413 14:14:57.347882 18 node.go:182] Setting conf state to nodes:1
I0413 14:14:57.347994 18 raft.go:702] Done applying conf change at 0x2
I0413 14:14:57.348045 18 node.go:182] Setting conf state to nodes:1 nodes:2
I0413 14:14:57.348069 18 raft.go:702] Done applying conf change at 0x2
W0413 14:14:58.239310 18 node.go:671] [0x2] Read index context timed out
I0413 20:42:25.874801 18 pool.go:160] CONNECTING to dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080
I0413 20:43:43.014088 18 pool.go:160] CONNECTING to dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080
W0414 11:49:50.697042 18 pool.go:254] Connection lost with dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080. Error: rpc error: code = Unavailable desc = transport is closing
W0414 13:26:17.900615 18 pool.go:254] Connection lost with dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080. Error: rpc error: code = Unavailable desc = transport is closing
dgraph-zero-2
E0416 05:08:04.776225 18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:08:46.700031 18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:09:28.649858 18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:10:10.594666 18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:10:52.285633 18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:11:34.399184 18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:12:16.529782 18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:12:58.285969 18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:13:40.387603 18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:14:22.531640 18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:15:04.311339 18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:15:46.401998 18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:16:28.537464 18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFail^C
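A note on the Zero logs above: on dgraph-zero-0 the Raft conf state only ever reaches nodes:1 nodes:2, dgraph-zero-2 keeps failing to join with i/o timeout, and only dgraph-alpha-0 (id 1) and dgraph-alpha-1 (id 2) show up in the Zero connection logs. If the groups really are running with two members (these excerpts alone don't prove it), plain Raft majority arithmetic suggests why they stall so easily. A minimal sketch of that arithmetic (generic Raft, not Dgraph code):

```python
def quorum(members: int) -> int:
    """Votes needed for a Raft majority: floor(n / 2) + 1."""
    if members < 1:
        raise ValueError("a Raft group needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can be down while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    # Compare a 2-member group with a fully joined 3-member group.
    for n in (2, 3):
        print(f"{n} members: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

With two members the majority is two, so a single member restarting or becoming unreachable is enough to stop the group from electing a leader. With all three members joined, one failure can be tolerated.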
Ideally, we’d like to see logs from when the server started or restarted, as that would help us see whether the connection was set up correctly in the first place and when it broke. That being said, I can see some unexpected errors like the i/o timeout above.
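To get a quick picture of which failures dominate a long log like the ones above, a small script can tally the recurring error signatures. This is a hypothetical helper, not part of Dgraph; the signatures are just strings copied from the logs in this thread, and a line can match more than one of them.

```python
import sys
from collections import Counter

# Error signatures copied verbatim from the Zero logs in this thread.
SIGNATURES = {
    "dns-failure": "no such host",
    "dial-timeout": "i/o timeout",
    "read-index-timeout": "Read index context timed out",
    "not-zero-leader": "Not Zero leader",
    "connection-lost": "Connection lost with",
}


def tally(lines):
    """Count, per signature, how many lines contain it."""
    counts = Counter()
    for line in lines:
        for name, needle in SIGNATURES.items():
            if needle in line:
                counts[name] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Each argument is a saved log file, e.g. from kubectl logs <podname> > file.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            print(path, dict(tally(fh)))
```

For example, save a pod's output with kubectl logs dgraph-zero-2 > zero-2.log and pass the file name to the script (the file and script names are placeholders).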
We would need more information about your Kubernetes config and a way to reproduce this on our end. Other users run Dgraph on k8s, so this most likely points to a config issue on your end. Tagging @slotlocker2, who should be able to help you more here.
It would also be worthwhile to look at the events from the namespace of the Dgraph pods. You can retrieve them using kubectl get events --sort-by=.metadata.creationTimestamp
(I believe that on a typical K8s cluster events have a TTL of one hour, so events from around the time of the failure would be helpful.)
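Once events are available, the Warning ones are usually the interesting ones. A minimal sketch, assuming the output of kubectl get events -n default -o json was saved to a file (the script and file names are hypothetical): it keeps only events whose type is Warning and sorts them by the lastTimestamp field of the core/v1 Event object, falling back to eventTime when lastTimestamp is empty.

```python
import json
import sys


def warning_events(doc):
    """Return (timestamp, object, reason, message) for Warning events, oldest first."""
    rows = []
    for ev in doc.get("items", []):
        if ev.get("type") != "Warning":
            continue
        # lastTimestamp can be null on some events; fall back to eventTime.
        ts = ev.get("lastTimestamp") or ev.get("eventTime") or ""
        obj = ev.get("involvedObject", {}).get("name", "?")
        rows.append((ts, obj, ev.get("reason", ""), ev.get("message", "")))
    # RFC 3339 timestamps in UTC sort correctly as strings.
    return sorted(rows)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        for row in warning_events(json.load(fh)):
            print(" | ".join(row))
```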
Aside from this, any insight into how the cluster was set up, and the config used to spin up the Dgraph pods, would help us get a better understanding of what’s happening on the cluster. @scroobius-pip
Here’s my Kubernetes config:
# This highly available config creates 3 Dgraph Zeros, 3 Dgraph
# Alphas with 3 replicas, and 1 Ratel UI client. The Dgraph cluster
# will still be available to service requests even when one Zero
# and/or one Alpha are down.
#
# There are 4 public services exposed, users can use:
# dgraph-zero-public - To load data using Live & Bulk Loaders
# dgraph-alpha-public - To connect clients and for HTTP APIs
# dgraph-ratel-public - For Dgraph UI
# dgraph-alpha-x-http-public - Use for debugging & profiling
# apiVersion: v1
# kind: Service
# metadata:
#   name: dgraph-zero-public
#   labels:
#     app: dgraph-zero
# spec:
#   type: LoadBalancer
#   ports:
#   - port: 5080
#     targetPort: 5080
#     name: zero-grpc
#   - port: 6080
#     targetPort: 6080
#     name: zero-http
#   selector:
#     app: dgraph-zero
# ---
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha-public
  labels:
    app: dgraph-alpha
spec:
  type: LoadBalancer
  ports:
  - port: 8080
    targetPort: 8080
    name: alpha-http
  - port: 9080
    targetPort: 9080
    name: alpha-grpc
  selector:
    app: dgraph-alpha
---
# This service is created in order to debug & profile a specific alpha.
# You can create one for each alpha that you need to profile.
# For more general HTTP APIs, use the above service instead.
# apiVersion: v1
# kind: Service
# metadata:
#   name: dgraph-alpha-0-http-public
#   labels:
#     app: dgraph-alpha
# spec:
#   type: LoadBalancer
#   ports:
#   - port: 8080
#     targetPort: 8080
#     name: alpha-http
#   selector:
#     statefulset.kubernetes.io/pod-name: dgraph-alpha-0
# ---
# apiVersion: v1
# kind: Service
# metadata:
#   name: dgraph-ratel-public
#   labels:
#     app: dgraph-ratel
# spec:
#   type: LoadBalancer
#   ports:
#   - port: 8000
#     targetPort: 8000
#     name: ratel-http
#   selector:
#     app: dgraph-ratel
# ---
# This is a headless service which is necessary for discovery for a dgraph-zero StatefulSet.
# https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
apiVersion: v1
kind: Service
metadata:
  name: dgraph-zero
  labels:
    app: dgraph-zero
spec:
  ports:
  - port: 5080
    targetPort: 5080
    name: zero-grpc
  clusterIP: None
  selector:
    app: dgraph-zero
---
# This is a headless service which is necessary for discovery for a dgraph-alpha StatefulSet.
# https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha
  labels:
    app: dgraph-alpha
spec:
  ports:
  - port: 7080
    targetPort: 7080
    name: alpha-grpc-int
  clusterIP: None
  selector:
    app: dgraph-alpha
---
# This StatefulSet runs 3 Dgraph Zeros.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph-zero
spec:
  serviceName: "dgraph-zero"
  replicas: 3
  selector:
    matchLabels:
      app: dgraph-zero
  template:
    metadata:
      labels:
        app: dgraph-zero
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - dgraph-zero
              topologyKey: kubernetes.io/hostname
      containers:
      - name: zero
        image: dgraph/dgraph:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5080
          name: zero-grpc
        - containerPort: 6080
          name: zero-http
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        env:
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - bash
          - "-c"
          - |
            set -ex
            [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
            ordinal=${BASH_REMATCH[1]}
            idx=$(($ordinal + 1))
            if [[ $ordinal -eq 0 ]]; then
              exec dgraph zero --my=$(hostname -f):5080 --idx $idx --replicas 3
            else
              exec dgraph zero --my=$(hostname -f):5080 --peer dgraph-zero-0.dgraph-zero.${POD_NAMESPACE}.svc.cluster.local:5080 --idx $idx --replicas 3
            fi
      terminationGracePeriodSeconds: 60
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: 20Gi
---
# This StatefulSet runs 3 replicas of Dgraph Alpha.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph-alpha
spec:
  serviceName: "dgraph-alpha"
  podManagementPolicy: "Parallel"
  replicas: 3
  selector:
    matchLabels:
      app: dgraph-alpha
  template:
    metadata:
      labels:
        app: dgraph-alpha
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - dgraph-alpha
              topologyKey: kubernetes.io/hostname
      # Initializing the Alphas:
      #
      # You may want to initialize the Alphas with data before starting, e.g.
      # with data from the Dgraph Bulk Loader: https://docs.dgraph.io/deploy/#bulk-loader.
      # You can accomplish this by uncommenting this initContainers config. This
      # starts a container with the same /dgraph volume used by Alpha and runs
      # before Alpha starts.
      #
      # You can copy your local p directory to the pod's /dgraph/p directory
      # with this command:
      #
      #    kubectl cp path/to/p dgraph-alpha-0:/dgraph/ -c init-alpha
      #    (repeat for each alpha pod)
      #
      # When you're finished initializing each Alpha data directory, you can signal
      # it to terminate successfully by creating a /dgraph/doneinit file:
      #
      #    kubectl exec dgraph-alpha-0 -c init-alpha touch /dgraph/doneinit
      #
      # Note that pod restarts cause re-execution of Init Containers. Since
      # /dgraph is persisted across pod restarts, the Init Container will exit
      # automatically when /dgraph/doneinit is present and proceed with starting
      # the Alpha process.
      #
      # Tip: StatefulSet pods can start in parallel by configuring
      # .spec.podManagementPolicy to Parallel:
      #
      #    https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#deployment-and-scaling-guarantees
      #
      initContainers:
        - name: init-alpha
          image: dgraph/dgraph:latest
          command:
            - bash
            - "-c"
            - |
              echo "Write to /dgraph/doneinit when ready."
              until [ -f /dgraph/doneinit ]; do sleep 2; done
          volumeMounts:
            - name: datadir
              mountPath: /dgraph
      containers:
      - name: alpha
        image: dgraph/dgraph:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 7080
          name: alpha-grpc-int
        - containerPort: 8080
          name: alpha-http
        - containerPort: 9080
          name: alpha-grpc
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        env:
          # This should be the same namespace as the dgraph-zero
          # StatefulSet to resolve a Dgraph Zero's DNS name for
          # Alpha's --zero flag.
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - bash
          - "-c"
          - |
            set -ex
            dgraph alpha --my=$(hostname -f):7080 --lru_mb 1433 --zero dgraph-zero-0.dgraph-zero.${POD_NAMESPACE}.svc.cluster.local:5080
      terminationGracePeriodSeconds: 600
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: 50Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dgraph-ratel
  labels:
    app: dgraph-ratel
spec:
  selector:
    matchLabels:
      app: dgraph-ratel
  template:
    metadata:
      labels:
        app: dgraph-ratel
    spec:
      containers:
      - name: ratel
        image: dgraph/dgraph:latest
        ports:
        - containerPort: 8000
        command:
          - dgraph-ratel
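The dgraph-zero startup script above derives each Zero's Raft index from the StatefulSet pod ordinal: it takes the trailing number of the hostname, adds one for --idx, and gives every pod except dgraph-zero-0 a --peer flag pointing at dgraph-zero-0. That mapping is why dgraph-zero-0 appears as node 1 (0x1) and dgraph-zero-1 as node 2 (ID:2) in the logs. The Python below is a hypothetical re-implementation for illustration only (zero_flags is a made-up name, and --my is left out because it depends on hostname -f inside the pod); it mirrors the bash logic so the resulting flags can be checked per pod.

```python
import re


def zero_flags(hostname, namespace="default"):
    """Mirror the bash in the dgraph-zero StatefulSet: return the Zero flags after --my."""
    match = re.search(r"-([0-9]+)$", hostname)
    if match is None:
        # The bash script runs `exit 1` in this case.
        raise ValueError(f"hostname {hostname!r} has no StatefulSet ordinal")
    ordinal = int(match.group(1))
    flags = ["--idx", str(ordinal + 1), "--replicas", "3"]
    if ordinal != 0:
        # Every Zero other than dgraph-zero-0 joins through dgraph-zero-0.
        peer = f"dgraph-zero-0.dgraph-zero.{namespace}.svc.cluster.local:5080"
        flags = ["--peer", peer] + flags
    return flags
```

For example, zero_flags("dgraph-zero-1") returns ['--peer', 'dgraph-zero-0.dgraph-zero.default.svc.cluster.local:5080', '--idx', '2', '--replicas', '3']. One consequence of this layout: the other Zeros (via --peer) and every Alpha (via --zero) bootstrap through the DNS name of dgraph-zero-0.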
It’s based on the config in the Kubernetes deployment documentation. I’ve added:
podManagementPolicy: "Parallel"
It’s a 3-node cluster with 4 GB of RAM per node, running on Scaleway. I never had this issue on GCP, so I suspect it might have to do with my cloud provider.
I tried the command
kubectl get events --sort-by=.metadata.creationTimestamp
but it returns “No resources found”.
Here are some logs that I got immediately after restarting each Alpha pod:
dgraph-alpha-2.logs.txt (6.0 KB) dgraph-alpha-1.logs.txt (13.6 KB) dgraph-alpha-0.logs.txt (13.5 KB)
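When logs come from several pods, such as the three Alpha attachments above plus the Zero logs, it can help to interleave them into one timeline so that, for example, an Alpha losing its connection can be lined up with a Zero election. A minimal sketch, assuming every relevant line starts with the glog-style prefix used throughout this thread (severity letter, MMDD, then HH:MM:SS.microseconds); merge_logs is a hypothetical helper, and lines without that prefix, such as badger's own messages, are dropped.

```python
import re
import sys

# glog prefix, e.g. "I0413 14:14:42.195303": severity, MMDD, time of day.
GLOG = re.compile(r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})")


def merge_logs(sources):
    """Merge {pod_name: [lines]} into (key, pod, line) tuples ordered by timestamp.

    The prefix carries no year, so this assumes all logs come from the same year.
    """
    merged = []
    for pod, lines in sources.items():
        for line in lines:
            m = GLOG.match(line)
            if m:
                merged.append(((m.group(2), m.group(3)), pod, line.rstrip("\n")))
    merged.sort()
    return merged


if __name__ == "__main__" and len(sys.argv) > 1:
    sources = {}
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            sources[path] = fh.readlines()
    for _, pod, line in merge_logs(sources):
        print(f"[{pod}] {line}")
```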
The Kubernetes manifests look good: I was able to get a working 3-node cluster running locally (using kind) with your config.
Things like image pulls, health check failures and node pressure conditions are recorded as events, so it is unusual that there aren’t any (possibly because they had reached their TTL by the time they were retrieved). Could you delete the StatefulSets (if possible), recreate them, and post the events logged in the same namespace as the deployment? That would be super helpful.
I’m unsure how Scaleway handles its (CNI) networking, so it would be a good idea to set up a parallel deployment of netshoot pods on different hosts to check for connectivity issues there.
@pawan, do the logs suggest anything?
Sorry, I don’t know how to “set up a parallel deployment of netshoot pods”.
Sorry, what I meant is: if you believe there is an issue with the cloud provider, specifically with the inter-pod/inter-node networking, you could set up a basic StatefulSet like this one and check whether the pods are able to communicate with each other.
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  labels:
    app: nginx
spec:
  serviceName: "nginx"
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - nginx
              topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
You could then exec into the first container:
kubectl exec -it web-0 -- bash
and try to connect to the second pod:
curl http://web-1.nginx.default.svc.cluster.local
If this works consistently, then we may have to look at other things, like the events and the Dgraph logs, in more detail.
(netshoot has a few more tools besides curl to help with debugging, if you choose to use it.)
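If you want to test the Dgraph ports themselves rather than nginx, a similar check can be scripted. The sketch below is hypothetical (probe and zero_addresses are made-up names) and assumes it runs from some pod in the cluster that has python3, which is not guaranteed for nginx-slim or netshoot. It resolves each Zero's headless-service DNS name and then opens a plain TCP connection to port 5080, separating the two failure modes seen in the Zero logs: DNS lookups failing with no such host versus dials failing with i/o timeout.

```python
import socket
import sys


def zero_addresses(namespace="default", replicas=3):
    """Headless-service DNS names of the dgraph-zero pods, as used in the manifests."""
    return [
        (f"dgraph-zero-{i}.dgraph-zero.{namespace}.svc.cluster.local", 5080)
        for i in range(replicas)
    ]


def probe(host, port, timeout=3.0):
    """Classify reachability as ok, dns-failure, timeout, refused or error."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"  # corresponds to "no such host"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        return "timeout"  # corresponds to "i/o timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "error"


if __name__ == "__main__" and len(sys.argv) > 1:
    # The single argument is the namespace of the Dgraph pods, e.g. default.
    for host, port in zero_addresses(namespace=sys.argv[1]):
        print(f"{host}:{port} -> {probe(host, port)}")
```

Running it repeatedly over a few minutes, from pods on different nodes, would show whether failures are intermittent, which a single successful curl cannot.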
Okay, thanks, will try this.