Data is not loading from the main alpha server when the other replicas are down

nshah14285 · July 17, 2019, 9:00am

I am using Dgraph 1.0.16 and have set up replication=3 with 1 zero and 3 alpha servers (alpha1, alpha2, alpha3). When I stop the other 2 alpha servers (alpha2, alpha3), I get the error below on the main alpha server (alpha1), which is expected.


E0717 13:21:47.690749   21973 groups.go:853] While proposing delta with MaxAssigned: 33296 and num txns: 0. Error=Server overloaded with pending proposals. Please retry later. Retrying...
W0717 13:21:50.543436   21973 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0717 13:21:55.503609   21973 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection

But the problem is that when I connect to the main alpha server (alpha1) from Ratel and search for some data, the loader keeps spinning with the text “Fetching result…” and nothing happens.

I only get the search results once I start one of the other alpha servers (alpha2 or alpha3), which is quite confusing.

Shouldn’t the main alpha server (alpha1) still return the data even when the other replicas are down?

Am I missing anything?

Please help.

amanmangal · July 17, 2019, 5:53pm

Hi,

Will you share the logs of the server alpha1 with us?

Thanks

hackintoshrao · July 17, 2019, 5:57pm

Hey @nshah14285,

Welcome to the channel.

Since you have set up multiple replicas, a majority of the group must be up in order to serve requests.
Hence 2 of 3 alphas should be up to be able to serve requests.
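
The majority rule described above can be sketched as simple arithmetic. This is only an illustration of the "more than half of the replicas" idea, not Dgraph's actual implementation; the function names `quorum_size` and `has_quorum` are made up for this example.

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of members that forms a strict majority of the group."""
    return replicas // 2 + 1


def has_quorum(replicas: int, alive: int) -> bool:
    """True when enough members are up for the group to reach a majority."""
    return alive >= quorum_size(replicas)


# With 3 replicas the group needs 2 members up.
# Only alpha1 running (1 of 3) is below the majority; alpha1 + alpha2 (2 of 3) is enough.
only_alpha1 = has_quorum(replicas=3, alive=1)
alpha1_and_alpha2 = has_quorum(replicas=3, alive=2)
```

Under this rule, a 3-replica group tolerates one member being down, but not two.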

nshah14285 · July 17, 2019, 6:04pm

@amanmangal I will share the logs tomorrow.

nshah14285 · July 18, 2019, 5:55am

@hackintoshrao out of the 3 alphas, only the main alpha is up and the other 2 alphas are down.

So I am expecting the main alpha to return the search results even though the other 2 are down.

nshah14285 · July 18, 2019, 12:17pm

@amanmangal
Below are the alpha1 server logs.

dgraph zero log

I0718 13:08:37.019255    7410 zero.go:396] Got connection request: cluster_info_only:true
I0718 13:08:37.019477    7410 zero.go:414] Connected: cluster_info_only:true
W0718 13:15:36.983632    7410 pool.go:226] Connection lost with 192.168.0.120:7082. Error: rpc error: code = Unavailable desc = transport is closing
W0718 13:15:41.731425    7410 pool.go:226] Connection lost with 192.168.0.119:7081. Error: rpc error: code = Unavailable desc = transport is closing

Logs when 2 alphas were down

W0718 13:15:36.983681    7465 pool.go:226] Connection lost with 192.168.0.120:7082. Error: rpc error: code = Unavailable desc = transport is closing
W0718 13:15:36.998260    7465 node.go:419] Unable to send message to peer: 0x3. Error: EOF
W0718 13:15:38.018537    7465 node.go:419] Unable to send message to peer: 0x3. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 192.168.0.120:7082: connect: connection refused"
W0718 13:15:41.731781    7465 pool.go:226] Connection lost with 192.168.0.119:7081. Error: rpc error: code = Unavailable desc = transport is closing
W0718 13:15:41.738476    7465 node.go:419] Unable to send message to peer: 0x2. Error: EOF
W0718 13:15:42.758646    7465 node.go:419] Unable to send message to peer: 0x2. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 192.168.0.119:7081: connect: connection refused"
W0718 13:15:48.038569    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:15:52.778602    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0718 13:15:58.058537    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:16:02.798560    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0718 13:16:08.078488    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:16:12.818665    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0718 13:16:18.098548    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:16:22.838698    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0718 13:16:28.118613    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:16:32.858608    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0718 13:16:38.138652    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:16:42.878591    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0718 13:16:48.158601    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:16:52.898695    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
E0718 13:20:55.110526    7465 groups.go:853] While proposing delta with MaxAssigned: 10009 and num txns: 0. Error=Server overloaded with pending proposals. Please retry later. Retrying...
W0718 13:20:58.618655    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:21:03.378642    7465 node.go:419] Unable to send message to peer: 0x2. Error: Unhealthy connection
W0718 13:21:08.638367    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection

Logs when alpha1 and alpha2 were up and alpha3 was down

W0718 13:23:28.918459    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:23:38.938668    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:23:48.958617    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:23:58.978619    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:24:08.998654    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:24:19.018619    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection
W0718 13:24:29.038595    7465 node.go:419] Unable to send message to peer: 0x3. Error: Unhealthy connection

amanmangal · July 18, 2019, 8:01pm

Makes sense. As @hackintoshrao said, queries may fail unless a majority of Alphas are up. You could try best effort queries. Best effort queries do not require a majority of replicas to be up.
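
For anyone who wants to try a best-effort query from a client, here is a rough sketch using the Python client (pydgraph), which accepts `read_only` and `best_effort` flags when opening a transaction. Several things here are assumptions and not from this thread: the address `localhost:9080`, the predicate `name` in the sample query, and the helper names `best_effort_query` and `main`. Whether best-effort queries are available also depends on the Dgraph and client versions in use, so please check that against your setup.

```python
import json

# Hypothetical sample query; the predicate "name" is an assumption.
QUERY = "{ q(func: has(name), first: 5) { uid name } }"


def best_effort_query(client, query):
    """Run a query in a read-only, best-effort transaction and return parsed JSON."""
    txn = client.txn(read_only=True, best_effort=True)
    try:
        resp = txn.query(query)
        return json.loads(resp.json)
    finally:
        # Always release the transaction, even if the query fails.
        txn.discard()


def main():
    # Imported here so the helper above can be used without pydgraph installed.
    import pydgraph

    # Assumed address of the alpha's gRPC endpoint; adjust for your port offset.
    stub = pydgraph.DgraphClientStub("localhost:9080")
    try:
        client = pydgraph.DgraphClient(stub)
        print(best_effort_query(client, QUERY))
    finally:
        stub.close()
```

Calling `main()` against a running alpha would print the query result. Best-effort reads may not reflect the latest committed data.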

nshah14285 · July 22, 2019, 9:14am

@amanmangal

So it means at least 2 replicas have to be up before any data can be fetched, right?

“Best effort queries do not require a majority of replicas to be up”
Do you mean the optimized queries?

amanmangal · July 25, 2019, 7:53am

Best effort queries are optimized to run faster and not necessarily provide the latest result. Given that you only have 3 nodes (and replication is set to 3 too), the queries should return a response.
