Data is not loading from the main Alpha server when the other replicas are down

I am using Dgraph 1.0.16 and have set up a cluster with replication set to 3, consisting of 1 Zero and 3 Alpha servers (alpha1, alpha2, alpha3). When I stop the other 2 Alpha servers (alpha2, alpha3), I get the error below on the main Alpha server (alpha1), which is expected.


E0717 13:21:47.690749   21973 groups.go:853] While proposing delta with MaxAssigned: 33296 and num txns: 0. Error=Server overloaded with pending proposals. Please retry later. Retrying...
W0717 13:21:50.543436   21973 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0717 13:21:55.503609   21973 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection

But the problem is that when I connect to the main Alpha server (alpha1) from Ratel to search for some data, the loader keeps spinning with the text “Fetching result…” and nothing happens.

Only when I start one of the other Alpha servers (alpha2 or alpha3) do I get the search results, which is quite confusing.

Shouldn’t it be able to search the data on the main Alpha server (alpha1) even when the other replicas are down?

Am I missing anything?

Please help.

Hi,

Could you share the logs of the alpha1 server with us?

Thanks

Hey @nshah14285,

Welcome to the channel.

Since you have set up multiple replicas, a majority of the group must be up in order to serve requests.
Hence 2 of 3 alphas should be up to be able to serve requests.
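
To make the majority rule concrete, here is a minimal sketch of the arithmetic, assuming a plain strict-majority rule as described above. The function names are hypothetical and are not part of Dgraph.

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of nodes that forms a strict majority of the group."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas // 2 + 1


def has_quorum(replicas: int, alive: int) -> bool:
    """True when enough replicas are up to form a majority of the group."""
    return alive >= quorum_size(replicas)


if __name__ == "__main__":
    # With 3 replicas, check every possible number of live Alphas.
    for alive in range(4):
        print(f"{alive} of 3 up -> majority available: {has_quorum(3, alive)}")
```

With 3 replicas the majority is 2, so a single live Alpha (the situation in the original post) does not meet it, while starting any second Alpha does.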

@amanmangal I will share the logs tomorrow.

@hackintoshrao Out of the 3 alphas, only the main alpha is up; the other 2 alphas are down.

So I am expecting the search results to come from that main alpha while the other 2 are down.

@amanmangal
Below are the alpha1 server logs.

Dgraph Zero log

I0718 13:08:37.019255    7410 zero.go:396] Got connection request: cluster_info_only:true
I0718 13:08:37.019477    7410 zero.go:414] Connected: cluster_info_only:true
W0718 13:15:36.983632    7410 pool.go:226] Connection lost with 192.168.0.120:7082. Error: rpc error: code = Unavailable desc = transport is closing
W0718 13:15:41.731425    7410 pool.go:226] Connection lost with 192.168.0.119:7081. Error: rpc error: code = Unavailable desc = transport is closing

Logs when 2 alphas were down

W0718 13:15:36.983681    7465 pool.go:226] Connection lost with 192.168.0.120:7082. Error: rpc error: code = Unavailable desc = transport is closing
W0718 13:15:36.998260    7465 node.go:419] Unable to send message to peer: 0x3. Error: EOF
W0718 13:15:38.018537    7465 node.go:419] Unable to send message to peer: 0x3. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 192.168.0.120:7082: connect: connection refused"
W0718 13:15:41.731781    7465 pool.go:226] Connection lost with 192.168.0.119:7081. Error: rpc error: code = Unavailable desc = transport is closing
W0718 13:15:41.738476    7465 node.go:419] Unable to send message to peer: 0x2. Error: EOF
W0718 13:15:42.758646    7465 node.go:419] Unable to send message to peer: 0x2. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 192.168.0.119:7081: connect: connection refused"
W0718 13:15:48.038569    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:15:52.778602    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0718 13:15:58.058537    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:16:02.798560    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0718 13:16:08.078488    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:16:12.818665    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0718 13:16:18.098548    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:16:22.838698    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0718 13:16:28.118613    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:16:32.858608    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0718 13:16:38.138652    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:16:42.878591    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0718 13:16:48.158601    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:16:52.898695    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
E0718 13:20:55.110526    7465 groups.go:853] While proposing delta with MaxAssigned: 10009 and num txns: 0. Error=Server overloaded with pending proposals. Please retry later. Retrying...
W0718 13:20:58.618655    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:21:03.378642    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0718 13:21:08.638367    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection

Logs when alpha1 and alpha2 were up and alpha3 was down

W0718 13:23:28.918459    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:23:38.938668    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:23:48.958617    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:23:58.978619    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:24:08.998654    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:24:19.018619    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:24:29.038595    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection

Makes sense. As @hackintoshrao said, queries may fail unless a majority of Alphas are up. You could try best effort queries. Best effort queries do not require a majority of replicas to be up.
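
As a rough illustration of how a best-effort query might be requested over HTTP, the sketch below only builds the request URL. It assumes the `be=true` query parameter on the Alpha `/query` endpoint, which is documented for later Dgraph releases; whether 1.0.16 accepts the same parameter is an assumption to verify against the docs for that version. The helper name is hypothetical, and 8080 is the default Alpha HTTP port.

```python
from urllib.parse import urlencode, urlunsplit


def best_effort_query_url(host: str, port: int = 8080) -> str:
    """Build the URL for a best-effort query against an Alpha's HTTP endpoint.

    Assumption: the endpoint accepts `be=true`, as in later Dgraph releases.
    """
    query = urlencode({"be": "true"})
    return urlunsplit(("http", f"{host}:{port}", "/query", query, ""))


if __name__ == "__main__":
    print(best_effort_query_url("localhost"))
```

The query body itself would then be sent in a POST request to that URL.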

@amanmangal

So this means at least 2 replicas have to be up before any data can be fetched, right?

"Best effort queries do not require a majority of replicas to be up."
Do you mean the optimized queries?

Best effort queries are optimized to run faster and do not necessarily return the latest result. Given that you only have 3 nodes (and replication is set to 3 too), the queries should return a response.