"Please retry again, server is not ready to accept requests": should I expect this on a regular basis?

My API service was working a minute ago, and now it isn't. I have a 3-node Kubernetes cluster running with 3x replication. How do I diagnose this issue? Is this something I should expect on a regular basis?

I got it working again by restarting the pods, but I need an explanation of why this happens and whether I should expect it often.

Update: it's happening again. Restarting the pods can't be a viable way to fix this.

Can I please get some help with this issue? Thanks in advance.

The logs should say what’s happening in the cluster when you’re seeing this message. If you can share cluster logs (all Zeros and Alphas) that would help.

"Not ready to accept requests" can mean there's an issue with the instances being able to communicate with each other. Typically, it means the instance hasn't been able to establish a quorum with its peers.
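To make the quorum point concrete: Raft needs a strict majority of a group's members to elect a leader or commit anything. Here is a minimal sketch of that arithmetic. The function names are illustrative only and are not part of Dgraph.

```python
def quorum(members: int) -> int:
    """Smallest strict majority of a Raft group with `members` voters."""
    if members < 1:
        raise ValueError("a group needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be unreachable while the group keeps quorum."""
    return members - quorum(members)


# Print the majority size and failure tolerance for a few group sizes.
for n in (1, 2, 3, 5):
    print(f"{n} members: quorum={quorum(n)}, tolerates {tolerated_failures(n)} down")
```

Note that a group with only two members has a quorum of two, so it stops accepting requests as soon as either member is unreachable. A three-member group can lose one.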

Hmm okay, sorry, how do I check the logs?

Logs are written to stdout/stderr. On Kubernetes, you can use kubectl logs <podname> to get the logs for that pod.
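If it helps, here is a rough sketch for collecting logs from every Zero and Alpha pod in one go. It assumes StatefulSet-style pod names such as dgraph-zero-0 and dgraph-alpha-0, three replicas of each, and the default namespace; all of those are assumptions, so check them against `kubectl get pods`. The helper names are made up for this example.

```python
import shlex
import subprocess


def pod_names(prefix, replicas):
    """StatefulSet-style pod names: <prefix>-0, <prefix>-1, ..."""
    return [f"{prefix}-{i}" for i in range(replicas)]


def kubectl_logs_cmd(pod, namespace="default", previous=False):
    """Build the argv for `kubectl logs`.

    --previous asks for the logs of the prior container instance,
    which only exists if the container restarted inside the same pod.
    """
    cmd = ["kubectl", "logs", pod, "--namespace", namespace]
    if previous:
        cmd.append("--previous")
    return cmd


def save_logs(pod, namespace="default"):
    """Run kubectl and write the pod's logs to <pod>.log (needs cluster access)."""
    with open(f"{pod}.log", "w") as out:
        subprocess.run(kubectl_logs_cmd(pod, namespace), stdout=out,
                       stderr=subprocess.STDOUT, check=True)


if __name__ == "__main__":
    # Assumed names and replica counts; adjust to your deployment.
    pods = pod_names("dgraph-zero", 3) + pod_names("dgraph-alpha", 3)
    for pod in pods:
        # Print instead of running so the sketch is safe to try anywhere;
        # call save_logs(pod) to actually collect the files.
        print(shlex.join(kubectl_logs_cmd(pod)))
```

By default the script only prints the commands. Swapping the print for save_logs(pod) writes one file per pod, which is easier to attach to a post than pasted terminal output.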

 16 node.go:182] Setting conf state to nodes:1
I0414 13:23:58.416031      16 node.go:182] Setting conf state to nodes:1 nodes:2
I0414 13:24:00.903309      16 log.go:34] 1 is starting a new election at term 3
I0414 13:24:00.903362      16 log.go:34] 1 became pre-candidate at term 3
I0414 13:24:00.903371      16 log.go:34] 1 received MsgPreVoteResp from 1 at term 3
I0414 13:24:00.903982      16 log.go:34] 1 [logterm: 3, index: 421] sent MsgPreVote request to 2 at term 3
I0414 13:24:03.155852      16 log.go:34] 1 became follower at term 3
I0414 13:24:03.156289      16 log.go:34] raft.node: 1 elected leader 2 at term 3
I0414 13:24:03.295733      16 admin.go:513] No GraphQL schema in Dgraph; serving empty GraphQL API
I0414 13:26:21.419924      16 log.go:34] 1 [term 3] received MsgTimeoutNow from 2 and starts an election to get leadership.
I0414 13:26:21.420430      16 log.go:34] 1 became candidate at term 4
I0414 13:26:21.420524      16 log.go:34] 1 received MsgVoteResp from 1 at term 4
I0414 13:26:21.420755      16 log.go:34] 1 [logterm: 3, index: 422] sent MsgVote request to 2 at term 4
I0414 13:26:21.421345      16 log.go:34] raft.node: 1 lost leader 2 at term 4
I0414 13:26:21.446151      16 log.go:34] 1 received MsgVoteResp from 2 at term 4
I0414 13:26:21.446277      16 log.go:34] 1 [quorum:2] has received 2 MsgVoteResp votes and 0 vote rejections
I0414 13:26:21.446368      16 log.go:34] 1 became leader at term 4
I0414 13:26:21.446524      16 log.go:34] raft.node: 1 elected leader 1 at term 4
I0414 13:26:22.401743      16 groups.go:856] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
I0414 13:26:22.401870      16 groups.go:865] Got Zero leader: dgraph-zero-0.dgraph-zero.default.svc.cluster.local:5080
E0414 13:26:22.419559      16 groups.go:1093] Error from worker subscribe stream: rpc error: code = Unavailable desc = transport is closing
I0414 13:26:22.424619      16 pool.go:160] CONNECTING to dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080
W0414 13:26:22.421977      16 pool.go:254] Connection lost with dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080. Error: rpc error: code = Unavailable desc = transport is closing
W0414 13:26:22.503802      16 node.go:417] Unable to send message to peer: 0x2. Error: EOF
W0414 13:26:23.605793      16 node.go:417] Unable to send message to peer: 0x2. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local: no such host"
W0414 13:26:33.703619      16 node.go:417] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0414 13:26:43.804177      16 node.go:417] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0414 13:26:53.903742      16 node.go:417] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0414 13:27:03.904037      16 node.go:417] Unable to send message to peer: 0x2. Error: Unhealthy connection
I0414 13:27:10.708056      16 log.go:34] 1 [logterm: 4, index: 424, vote: 1] cast MsgPreVote for 2 [logterm: 4, index: 4

Could this issue be caused by the cloud provider?

From the error above, it seems like the Alpha node can't find the other Alpha node. Can you share logs from the other Alpha node as well? In general, the more logs we can see (from both Alpha and Zero nodes), the easier it is to diagnose what might be happening.
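When the logs are long, filtering for warnings and errors first makes the relevant lines easier to spot. Below is a rough sketch that parses the glog-style prefix Dgraph writes (level letter, MMDD date, time, process id, file:line) and counts non-info lines per source location. The regular expression and helper names are my own, and the sample lines are copied from the Alpha log posted earlier in this thread.

```python
import re
from collections import Counter

# glog prefix: <level><MMDD> <HH:MM:SS.micros> <pid> <file>:<line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<pid>\d+) (?P<source>[\w.]+:\d+)\] (?P<msg>.*)$"
)


def parse(line):
    """Return the fields of one glog line as a dict, or None if it does not match."""
    match = GLOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def problems(lines):
    """Count warning (W), error (E) and fatal (F) lines per source location."""
    counts = Counter()
    for line in lines:
        record = parse(line)
        if record and record["level"] in "WEF":
            counts[(record["level"], record["source"])] += 1
    return counts


SAMPLE = """\
I0414 13:26:21.446524      16 log.go:34] raft.node: 1 elected leader 1 at term 4
E0414 13:26:22.419559      16 groups.go:1093] Error from worker subscribe stream: rpc error: code = Unavailable desc = transport is closing
W0414 13:26:22.503802      16 node.go:417] Unable to send message to peer: 0x2. Error: EOF
W0414 13:26:33.703619      16 node.go:417] Unable to send message to peer: 0x2. Error: Unhealthy connection
"""

# Most frequent problem locations first.
for (level, source), count in problems(SAMPLE.splitlines()).most_common():
    print(level, source, count)
```

Lines that do not match the prefix, such as the badger INFO lines or truncated entries, are skipped rather than guessed at. To use it on a real file, pass open("dgraph-alpha-0.log") instead of the sample, where the file name is whatever you saved the pod's logs as.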

dgraph-alpha-1:

I0416 07:57:14.781286      15 node.go:182] Setting conf state to nodes:1
I0416 07:57:14.781953      15 node.go:182] Setting conf state to nodes:1 nodes:2
I0416 07:57:17.963257      15 log.go:34] 2 is starting a new election at term 6
I0416 07:57:17.963289      15 log.go:34] 2 became pre-candidate at term 6
I0416 07:57:17.963297      15 log.go:34] 2 received MsgPreVoteResp from 2 at term 6
I0416 07:57:17.964113      15 log.go:34] 2 [logterm: 6, index: 827] sent MsgPreVote request to 1 at term 6
W0416 07:57:18.965691      15 node.go:417] Unable to send message to peer: 0x1. Error: Do not have address of peer 0x1
I0416 07:57:19.624935      15 admin.go:513] No GraphQL schema in Dgraph; serving empty GraphQL API
I0416 07:57:20.680385      15 log.go:34] 2 became follower at term 6
I0416 07:57:20.681087      15 log.go:34] raft.node: 2 elected leader 1 at term 6

dgraph-alpha-2:

I0414 10:48:23.109031      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:48:28.110375      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:48:28.110492      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:48:33.111766      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:48:33.111824      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:48:38.112960      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:48:38.113018      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:48:43.113783      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:48:43.113822      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:48:48.115369      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:48:48.115466      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:48:53.122708      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:48:53.122753      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:48:58.135103      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:48:58.135410      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:03.136500      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:03.136551      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:08.137264      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:08.137310      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:13.149530      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:13.149564      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:18.150639      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:18.150680      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:23.151035      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:23.151074      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:28.151437      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:28.151707      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:33.152998      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:33.153043      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:38.154053      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:38.154099      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:43.155366      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:43.155414      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:48.175450      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:48.175634      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:53.185610      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:53.185675      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:49:58.199523      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:49:58.199557      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:03.206144      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:03.206189      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:08.206560      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:08.206603      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:13.207620      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:13.207671      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:18.208201      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:18.208241      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:23.209541      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:23.209591      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:28.210458      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:28.210501      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:33.211682      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:33.211741      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:38.213191      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:38.213245      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:43.213969      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:43.214144      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:48.214972      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:48.215146      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:53.216062      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:53.216403      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:50:58.216847      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:50:58.216891      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:03.218236      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:03.218274      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:08.219384      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:08.219419      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:13.220371      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:13.220745      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:18.221698      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:18.221751      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:23.223014      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:23.223054      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:28.223795      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:28.223841      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:33.225050      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:33.225088      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:38.225793      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:38.225835      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:43.226784      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:43.226830      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:48.227708      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:48.227894      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:53.228162      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:53.228197      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:51:58.229255      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:51:58.229311      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:03.229643      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:03.229688      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:08.230732      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:08.230780      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:13.231898      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:13.231947      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:18.232406      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:18.232493      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:23.233338      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:23.233411      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:28.234162      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:28.234205      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:33.238082      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:33.238115      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:38.239516      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:38.239557      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:43.240313      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:43.240352      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:48.241183      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:48.241238      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:53.241620      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:53.241658      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:52:58.242938      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:52:58.242995      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:03.244164      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:03.244213      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:08.245165      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:08.245212      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:13.246063      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:13.246105      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:18.247444      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:18.247531      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:23.247917      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:23.247983      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:28.248510      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:28.248562      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:33.249977      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:33.250103      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:38.251511      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:38.251545      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:43.253466      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:43.253514      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:48.254395      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:48.254452      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:53.255708      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:53.255740      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:53:58.256412      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests
I0414 10:53:58.256459      25 admin.go:510] Error reading GraphQL schema: Dgraph query failed because Dgraph query failed because Please retry again, server is not ready to accept requests.
I0414 10:54:03.257473      25 query.go:123] Dgraph query execution failed : Dgraph query failed because Please retry again, server is not ready to accept requests

dgraph-zero-0:

I0413 14:14:39.068546      18 run.go:105] Setting up grpc listener at: 0.0.0.0:5080
I0413 14:14:39.069093      18 run.go:105] Setting up http listener at: 0.0.0.0:6080
badger 2020/04/13 14:14:39 INFO: All 0 tables opened in 0s
I0413 14:14:39.188622      18 node.go:145] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc0004c2330 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x260a270 DisableProposalForwarding:false}
I0413 14:14:39.190719      18 node.go:323] Group 0 found 1 entries
I0413 14:14:39.191545      18 log.go:34] 1 became follower at term 0
I0413 14:14:39.192196      18 log.go:34] newRaft 1 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
I0413 14:14:39.192299      18 log.go:34] 1 became follower at term 1
I0413 14:14:39.193028      18 run.go:296] Running Dgraph Zero...
E0413 14:14:39.193395      18 raft.go:516] While proposing CID: Not Zero leader. Aborting proposal: cid:"224d54b1-0dde-468e-897c-e83f610ea2b5" . Retrying...
I0413 14:14:39.201673      18 node.go:182] Setting conf state to nodes:1
I0413 14:14:39.201876      18 raft.go:702] Done applying conf change at 0x1
I0413 14:14:40.193873      18 log.go:34] 1 no leader at term 1; dropping index reading msg
W0413 14:14:42.193869      18 node.go:671] [0x1] Read index context timed out
I0413 14:14:42.194024      18 log.go:34] 1 no leader at term 1; dropping index reading msg
E0413 14:14:42.195303      18 raft.go:516] While proposing CID: Not Zero leader. Aborting proposal: cid:"dd005d83-a536-4ee8-bc70-5d410b74c484" . Retrying...
I0413 14:14:42.893187      18 log.go:34] 1 is starting a new election at term 1
I0413 14:14:42.893382      18 log.go:34] 1 became pre-candidate at term 1
I0413 14:14:42.893429      18 log.go:34] 1 received MsgPreVoteResp from 1 at term 1
I0413 14:14:42.893587      18 log.go:34] 1 became candidate at term 2
I0413 14:14:42.893635      18 log.go:34] 1 received MsgVoteResp from 1 at term 2
I0413 14:14:42.893839      18 log.go:34] 1 became leader at term 2
I0413 14:14:42.894064      18 log.go:34] raft.node: 1 elected leader 1 at term 2
I0413 14:14:42.894225      18 raft.go:667] I've become the leader, updating leases.
I0413 14:14:42.894300      18 assign.go:42] Updated Lease id: 1. Txn Ts: 1
W0413 14:14:44.194156      18 node.go:671] [0x1] Read index context timed out
I0413 14:14:45.260105      18 raft.go:509] CID set for cluster: 4f12b9fb-f876-4f6e-a286-6dd0fcf1f453
I0413 14:14:45.269652      18 license_ee.go:45] Enterprise state proposed to the cluster: key:"z1-8987910882978745324" license:<maxNodes:18446744073709551615 expiryTs:1589379285 >
I0413 14:14:53.031052      18 pool.go:160] CONNECTING to dgraph-zero-1.dgraph-zero.default.svc.cluster.local:5080
I0413 14:14:53.031227      18 node.go:583] Trying to add 0x2 to cluster. Addr: dgraph-zero-1.dgraph-zero.default.svc.cluster.local:5080
I0413 14:14:53.031236      18 node.go:584] Current confstate at 0x1: nodes:1
W0413 14:14:53.045038      18 pool.go:254] Connection lost with dgraph-zero-1.dgraph-zero.default.svc.cluster.local:5080. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-zero-1.dgraph-zero.default.svc.cluster.local: no such host"
I0413 14:14:53.083970      18 node.go:182] Setting conf state to nodes:1 nodes:2
I0413 14:14:53.084073      18 raft.go:702] Done applying conf change at 0x1
I0413 14:14:53.084121      18 node.go:746] [0x2] Done joining cluster with err: <nil>
W0413 14:14:54.094175      18 node.go:417] Unable to send message to peer: 0x2. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-zero-1.dgraph-zero.default.svc.cluster.local: no such host"
W0413 14:14:55.193806      18 node.go:671] [0x1] Read index context timed out
W0413 14:14:57.194036      18 node.go:671] [0x1] Read index context timed out
W0413 14:14:59.194137      18 node.go:671] [0x1] Read index context timed out
I0413 20:42:29.849597      18 zero.go:417] Got connection request: cluster_info_only:true
I0413 20:42:29.854773      18 zero.go:435] Connected: cluster_info_only:true
I0413 20:42:29.858048      18 zero.go:417] Got connection request: group_id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080" force_group_id:true
I0413 20:42:29.862610      18 pool.go:160] CONNECTING to dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080
W0413 20:42:29.868510      18 pool.go:254] Connection lost with dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080. Error: rpc error: code = Unknown desc = No node has been set up yet
I0413 20:42:29.907751      18 zero.go:562] Connected: id:1 group_id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080" force_group_id:true
I0413 20:43:46.966780      18 zero.go:417] Got connection request: cluster_info_only:true
I0413 20:43:46.974174      18 zero.go:435] Connected: cluster_info_only:true
I0413 20:43:46.985951      18 zero.go:417] Got connection request: group_id:1 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080" force_group_id:true
I0413 20:43:46.991445      18 pool.go:160] CONNECTING to dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080
W0413 20:43:47.026844      18 pool.go:254] Connection lost with dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080. Error: rpc error: code = Unknown desc = No node has been set up yet
I0413 20:43:47.029240      18 zero.go:562] Connected: id:2 group_id:1 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080" force_group_id:true
I0414 11:51:22.565870      18 zero.go:417] Got connection request: cluster_info_only:true
I0414 11:51:22.573680      18 zero.go:435] Connected: cluster_info_only:true
I0414 11:51:22.577224      18 zero.go:417] Got connection request: id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080"
I0414 11:51:22.580127      18 zero.go:544] Connected: id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080"
I0414 13:23:58.384835      18 zero.go:417] Got connection request: cluster_info_only:true
I0414 13:23:58.388449      18 zero.go:435] Connected: cluster_info_only:true
I0414 13:23:58.390755      18 zero.go:417] Got connection request: id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080"
I0414 13:23:58.393197      18 zero.go:544] Connected: id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080"
I0414 13:27:04.789146      18 zero.go:417] Got connection request: cluster_info_only:true
I0414 13:27:04.791805      18 zero.go:435] Connected: cluster_info_only:true
I0414 13:27:04.795815      18 zero.go:417] Got connection request: id:2 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080"
I0414 13:27:04.798797      18 zero.go:544] Connected: id:2 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080"
I0416 07:55:01.174974      18 zero.go:417] Got connection request: cluster_info_only:true
I0416 07:55:01.178270      18 zero.go:435] Connected: cluster_info_only:true
I0416 07:55:01.181613      18 zero.go:417] Got connection request: id:2 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080"
I0416 07:55:01.184757      18 zero.go:544] Connected: id:2 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080"
I0416 07:56:13.076416      18 zero.go:417] Got connection request: cluster_info_only:true
I0416 07:56:13.079198      18 zero.go:435] Connected: cluster_info_only:true
I0416 07:56:13.082974      18 zero.go:417] Got connection request: id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080"
I0416 07:56:13.086004      18 zero.go:544] Connected: id:1 addr:"dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080"
I0416 07:57:20.461974      18 zero.go:417] Got connection request: cluster_info_only:true
I0416 07:57:20.466119      18 zero.go:435] Connected: cluster_info_only:true
I0416 07:57:20.470970      18 zero.go:417] Got connection request: id:2 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080"
I0416 07:57:20.474443      18 zero.go:544] Connected: id:2 addr:"dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080"

dgraph-zero-1


I0413 14:14:49.103142      18 run.go:105] Setting up grpc listener at: 0.0.0.0:5080
I0413 14:14:49.103606      18 run.go:105] Setting up http listener at: 0.0.0.0:6080
badger 2020/04/13 14:14:49 INFO: All 0 tables opened in 0s
I0413 14:14:49.167000      18 node.go:145] Setting raft.Config to: &{ID:2 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc0006ce2a0 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x260a270 DisableProposalForwarding:false}
I0413 14:14:49.167923      18 node.go:323] Group 0 found 1 entries
I0413 14:14:49.168721      18 pool.go:160] CONNECTING to dgraph-zero-0.dgraph-zero.default.svc.cluster.local:5080
I0413 14:14:49.235316      18 raft.go:494] [0x2] Starting node
I0413 14:14:49.236532      18 log.go:34] 2 became follower at term 0
I0413 14:14:49.236584      18 log.go:34] newRaft 2 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
I0413 14:14:49.236609      18 log.go:34] 2 became follower at term 1
I0413 14:14:49.238097      18 run.go:296] Running Dgraph Zero...
I0413 14:14:50.237320      18 log.go:34] 2 no leader at term 1; dropping index reading msg
W0413 14:14:52.237539      18 node.go:671] [0x2] Read index context timed out
I0413 14:14:52.238365      18 log.go:34] 2 no leader at term 1; dropping index reading msg
W0413 14:14:54.238761      18 node.go:671] [0x2] Read index context timed out
I0413 14:14:54.238844      18 log.go:34] 2 no leader at term 1; dropping index reading msg
W0413 14:14:56.239043      18 node.go:671] [0x2] Read index context timed out
I0413 14:14:56.239119      18 log.go:34] 2 no leader at term 1; dropping index reading msg
I0413 14:14:56.253538      18 log.go:34] 2 [term: 1] received a MsgHeartbeat message with higher term from 1 [term: 2]
I0413 14:14:56.253584      18 log.go:34] 2 became follower at term 2
I0413 14:14:56.253604      18 log.go:34] raft.node: 2 elected leader 1 at term 2
I0413 14:14:57.347882      18 node.go:182] Setting conf state to nodes:1
I0413 14:14:57.347994      18 raft.go:702] Done applying conf change at 0x2
I0413 14:14:57.348045      18 node.go:182] Setting conf state to nodes:1 nodes:2
I0413 14:14:57.348069      18 raft.go:702] Done applying conf change at 0x2
W0413 14:14:58.239310      18 node.go:671] [0x2] Read index context timed out
I0413 20:42:25.874801      18 pool.go:160] CONNECTING to dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080
I0413 20:43:43.014088      18 pool.go:160] CONNECTING to dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080
W0414 11:49:50.697042      18 pool.go:254] Connection lost with dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080. Error: rpc error: code = Unavailable desc = transport is closing
W0414 13:26:17.900615      18 pool.go:254] Connection lost with dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080. Error: rpc error: code = Unavailable desc = transport is closing

dgraph-zero-2

E0416 05:08:04.776225      18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:08:46.700031      18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:09:28.649858      18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:10:10.594666      18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:10:52.285633      18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:11:34.399184      18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:12:16.529782      18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:12:58.285969      18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:13:40.387603      18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:14:22.531640      18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:15:04.311339      18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:15:46.401998      18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
E0416 05:16:28.537464      18 raft.go:486] Error while joining cluster: rpc error: code = Unavailable desc = all SubConns are in TransientFail^C 

Ideally, we’d like to see logs from when the server started or restarted, as that would help us see whether the connection was set up correctly in the first place and when it broke. That said, I can see some unexpected errors above, such as the i/o timeout.
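
To pick lines like those i/o timeouts out of a long log, one option is to filter on severity. This is only a minimal sketch, assuming the logs keep the glog-style prefix seen above (a severity letter I, W, E or F followed by the MMDD date); the function name dgraph_problems is made up for illustration.

```shell
# Print only warning (W), error (E) and fatal (F) lines from glog-style logs.
# Reads log text on stdin; info (I) lines and lines without the glog prefix
# (for example the "badger ..." line) are dropped.
dgraph_problems() {
  # "|| true" keeps the function from failing when nothing matches.
  grep -E '^[WEF][0-9]{4} ' || true
}

# Example (not run here):
#   kubectl logs dgraph-zero-2 | dgraph_problems
```

Piping each Zero and Alpha log through the same filter makes it easier to line up the timestamps of the first warning on each pod.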

We would need more information about your Kubernetes config and a way to reproduce this on our end. We have users running Dgraph on k8s, so this is most probably a config issue on your end. Tagging @slotlocker2, who should be able to help you more here.

It would also be worthwhile to look at the events from the namespace of the Dgraph pods. You can retrieve them using kubectl get events --sort-by=.metadata.creationTimestamp (I believe events on a typical K8s cluster have a TTL of an hour, so events from around the time of the failure would be the most helpful).
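
A sketch of that command with the namespace made explicit, since events are namespaced and a query against the wrong namespace comes back empty. The function names are placeholders invented here, and default is assumed only because the pod DNS names in the logs above sit under default.svc.cluster.local.

```shell
# List events for one namespace, sorted by creation time.
# The namespace argument is optional and falls back to "default" (an assumption).
dgraph_events() {
  kubectl get events --namespace "${1:-default}" --sort-by=.metadata.creationTimestamp
}

# Same listing across every namespace, in case the pods live somewhere unexpected.
all_events() {
  kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
}

# Example (not run here):
#   dgraph_events default
```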

Aside from this, any insight into how the cluster was set up and the config used to spin up the Dgraph pods would give us a better understanding of what’s happening on the cluster. @scroobius-pip

Here’s my kubernetes config:


# This highly available config creates 3 Dgraph Zeros, 3 Dgraph
# Alphas with 3 replicas, and 1 Ratel UI client. The Dgraph cluster
# will still be available to service requests even when one Zero
# and/or one Alpha are down.
#
# There are 4 public services exposed, users can use:
#       dgraph-zero-public - To load data using Live & Bulk Loaders
#       dgraph-alpha-public - To connect clients and for HTTP APIs
#       dgraph-ratel-public - For Dgraph UI
#       dgraph-alpha-x-http-public - Use for debugging & profiling
# apiVersion: v1
# kind: Service
# metadata:
#   name: dgraph-zero-public
#   labels:
#     app: dgraph-zero
# spec:
#   type: LoadBalancer
#   ports:
#   - port: 5080
#     targetPort: 5080
#     name: zero-grpc
#   - port: 6080
#     targetPort: 6080
#     name: zero-http
#   selector:
#     app: dgraph-zero
# ---

apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha-public
  labels:
    app: dgraph-alpha
spec:
  type: LoadBalancer
  ports:
  - port: 8080
    targetPort: 8080
    name: alpha-http
  - port: 9080
    targetPort: 9080
    name: alpha-grpc
  selector:
    app: dgraph-alpha
---

# This service is created in-order to debug & profile a specific alpha.
# You can create one for each alpha that you need to profile.
# For a more general HTTP APIs use the above service instead.
# apiVersion: v1
# kind: Service
# metadata:
#   name: dgraph-alpha-0-http-public
#   labels:
#     app: dgraph-alpha
# spec:
#   type: LoadBalancer
#   ports:
#   - port: 8080
#     targetPort: 8080
#     name: alpha-http
#   selector:
#     statefulset.kubernetes.io/pod-name: dgraph-alpha-0
# ---
# apiVersion: v1
# kind: Service
# metadata:
#   name: dgraph-ratel-public
#   labels:
#     app: dgraph-ratel
# spec:
#   type: LoadBalancer
#   ports:
#   - port: 8000
#     targetPort: 8000
#     name: ratel-http
#   selector:
#     app: dgraph-ratel
# ---

# This is a headless service which is necessary for discovery for a dgraph-zero StatefulSet.
# https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
apiVersion: v1
kind: Service
metadata:
  name: dgraph-zero
  labels:
    app: dgraph-zero
spec:
  ports:
  - port: 5080
    targetPort: 5080
    name: zero-grpc
  clusterIP: None
  selector:
    app: dgraph-zero
---

# This is a headless service which is necessary for discovery for a dgraph-alpha StatefulSet.
# https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha
  labels:
    app: dgraph-alpha
spec:
  ports:
  - port: 7080
    targetPort: 7080
    name: alpha-grpc-int
  clusterIP: None
  selector:
    app: dgraph-alpha
---

# This StatefulSet runs 3 Dgraph Zero.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph-zero
spec:
  serviceName: "dgraph-zero"
  replicas: 3
  selector:
    matchLabels:
      app: dgraph-zero
  template:
    metadata:
      labels:
        app: dgraph-zero
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - dgraph-zero
              topologyKey: kubernetes.io/hostname
      containers:
      - name: zero
        image: dgraph/dgraph:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5080
          name: zero-grpc
        - containerPort: 6080
          name: zero-http
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        env:
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - bash
          - "-c"
          - |
            set -ex
            [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
            ordinal=${BASH_REMATCH[1]}
            idx=$(($ordinal + 1))
            if [[ $ordinal -eq 0 ]]; then
              exec dgraph zero --my=$(hostname -f):5080 --idx $idx --replicas 3
            else
              exec dgraph zero --my=$(hostname -f):5080 --peer dgraph-zero-0.dgraph-zero.${POD_NAMESPACE}.svc.cluster.local:5080 --idx $idx --replicas 3
            fi
      terminationGracePeriodSeconds: 60
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: 20Gi
---

# This StatefulSet runs 3 replicas of Dgraph Alpha.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph-alpha
spec:
  serviceName: "dgraph-alpha"
  podManagementPolicy: "Parallel"
  replicas: 3
  selector:
    matchLabels:
      app: dgraph-alpha
  template:
    metadata:
      labels:
        app: dgraph-alpha
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - dgraph-alpha
              topologyKey: kubernetes.io/hostname
      # Initializing the Alphas:
      #
      # You may want to initialize the Alphas with data before starting, e.g.
      # with data from the Dgraph Bulk Loader: https://docs.dgraph.io/deploy/#bulk-loader.
      # You can accomplish by uncommenting this initContainers config. This
      # starts a container with the same /dgraph volume used by Alpha and runs
      # before Alpha starts.
      #
      # You can copy your local p directory to the pod's /dgraph/p directory
      # with this command:
      #
      #    kubectl cp path/to/p dgraph-alpha-0:/dgraph/ -c init-alpha
      #    (repeat for each alpha pod)
      #
      # When you're finished initializing each Alpha data directory, you can signal
      # it to terminate successfully by creating a /dgraph/doneinit file:
      #
      #    kubectl exec dgraph-alpha-0 -c init-alpha touch /dgraph/doneinit
      #
      # Note that pod restarts cause re-execution of Init Containers. Since
      # /dgraph is persisted across pod restarts, the Init Container will exit
      # automatically when /dgraph/doneinit is present and proceed with starting
      # the Alpha process.
      #
      # Tip: StatefulSet pods can start in parallel by configuring
      # .spec.podManagementPolicy to Parallel:
      #
      #     https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#deployment-and-scaling-guarantees
      #
      initContainers:
        - name: init-alpha
          image: dgraph/dgraph:latest
          command:
            - bash
            - "-c"
            - |
              echo "Write to /dgraph/doneinit when ready."
              until [ -f /dgraph/doneinit ]; do sleep 2; done
          volumeMounts:
            - name: datadir
              mountPath: /dgraph
      containers:
      - name: alpha
        image: dgraph/dgraph:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 7080
          name: alpha-grpc-int
        - containerPort: 8080
          name: alpha-http
        - containerPort: 9080
          name: alpha-grpc
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        env:
          # This should be the same namespace as the dgraph-zero
          # StatefulSet to resolve a Dgraph Zero's DNS name for
          # Alpha's --zero flag.
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - bash
          - "-c"
          - |
            set -ex
            dgraph alpha --my=$(hostname -f):7080 --lru_mb 1433 --zero dgraph-zero-0.dgraph-zero.${POD_NAMESPACE}.svc.cluster.local:5080
      terminationGracePeriodSeconds: 600
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: 50Gi
---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dgraph-ratel
  labels:
    app: dgraph-ratel
spec:
  selector:
    matchLabels:
      app: dgraph-ratel
  template:
    metadata:
      labels:
        app: dgraph-ratel
    spec:
      containers:
      - name: ratel
        image: dgraph/dgraph:latest
        ports:
        - containerPort: 8000
        command:
          - dgraph-ratel

It’s based on the config in the Kubernetes deployment documentation; I’ve added
podManagementPolicy: "Parallel"

It’s a 3-node cluster with 4 GB of RAM per node, running on Scaleway. I never had this issue when using GCP, so I suspect it might have to do with my cloud provider.

I tried the command

kubectl get events --sort-by=.metadata.creationTimestamp

but it returns no resources found

Here are some logs that I got immediately after restarting each Alpha pod:

dgraph-alpha-2.logs.txt (6.0 KB) dgraph-alpha-1.logs.txt (13.6 KB) dgraph-alpha-0.logs.txt (13.5 KB)

The Kubernetes manifests look good. I could get a working 3-node cluster running locally (using kind) with your config.

Things like image pulls, health check failures, node pressure, etc. are logged as events, so it is unusual that there aren’t any (possibly because they had reached their TTL by the time they were retrieved). Could you delete the StatefulSets (if possible), recreate them, and post the events that are logged in the same namespace as the deployment? That would be super helpful.

I’m unsure how Scaleway does its (CNI) networking. It would be a good idea to set up a parallel deployment of netshoot pods on different hosts to check whether there are any connectivity issues there.
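
Before deploying extra pods, a rough check can also be run from the Dgraph pods themselves. This is only a sketch under a few assumptions: that the dgraph/dgraph image ships bash (the manifest already invokes it), getent and timeout, and that bash's /dev/tcp redirection is enabled in that build. check_from_pod is a name invented here, not a Dgraph or kubectl command.

```shell
# From inside pod $1, resolve host $2 and try to open a TCP connection to port $3.
# Prints "ok: ..." on success, or "FAILED: ..." and returns non-zero otherwise.
check_from_pod() {
  pod="$1"; host="$2"; port="$3"
  # getent checks DNS resolution; the /dev/tcp redirect checks that the port
  # accepts connections, giving up after 5 seconds.
  if kubectl exec "$pod" -- bash -c \
      "getent hosts '$host' >/dev/null && timeout 5 bash -c '</dev/tcp/$host/$port'"; then
    echo "ok: $pod -> $host:$port"
  else
    echo "FAILED: $pod -> $host:$port"
    return 1
  fi
}

# Example (not run here), using the names from the logs above:
#   check_from_pod dgraph-zero-2 dgraph-zero-0.dgraph-zero.default.svc.cluster.local 5080
```

Running it in both directions between pods on different nodes would show whether the failure is in name resolution or in the connection itself, although it does not replace a proper look at the CNI setup.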

@pawan do the logs suggest something?

Sorry, I don’t know how to “set up a parallel deployment of netshoot pods”.

Sorry, I meant that if you believe there is an issue with the cloud provider, specifically with regard to inter-pod/node networking, you could set up a basic StatefulSet like the one below and check whether the pods are able to communicate with each other.

---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  labels:
    app: nginx
spec:
  serviceName: "nginx"
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - nginx
              topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

You could then exec into the first container with kubectl exec -it web-0 -- bash and try to connect to the second pod with curl http://web-1.nginx.default.svc.cluster.local. If that works consistently, then we might have to look at other things, like the events and the Dgraph logs, in detail.

(netshoot has a few more tools besides curl to help with debugging, if you choose to use it.)
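
To test the "consistently" part, the curl check above can be repeated in a loop rather than run once. A sketch, assuming the nginx StatefulSet above is deployed in the default namespace and that curl is available in the container, as the command above implies; the function names are invented for illustration.

```shell
# Build the stable DNS name of a StatefulSet pod behind a headless service.
# Args: statefulset name, ordinal, service name, namespace.
pod_fqdn() {
  echo "$1-$2.$3.$4.svc.cluster.local"
}

# From pod $1, request URL $2 a total of $3 times (default 20) and count failures.
probe_repeatedly() {
  from="$1"; url="$2"; tries="${3:-20}"; failed=0; i=0
  while [ "$i" -lt "$tries" ]; do
    # A request counts as failed if curl exits non-zero (DNS error, timeout, refused).
    kubectl exec "$from" -- curl -s -o /dev/null --max-time 5 "$url" || failed=$((failed + 1))
    i=$((i + 1))
  done
  echo "$failed of $tries requests failed"
}

# Example (not run here):
#   probe_repeatedly web-0 "http://$(pod_fqdn web 1 nginx default)"
```

Repeating the probe between every pair of pods (web-0 to web-2, web-1 to web-0, and so on) covers each node-to-node path when the pods are spread across hosts.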

Okay, thanks, will try this.
