Crash replicas and recover


(vladimir) #1

Hello.
I’m new to Dgraph.
I use the Dgraph v1.0.7-rc4 release for Windows 10.

For experiments I started 1 Dgraph Zero instance and 2 Dgraph server replicas:

dgraph zero --my=localhost:5080 --replicas 3 -w zeroCluster

dgraph server --my=localhost:7080 --lru_mb=3512 --zero=localhost:5080 -o 0 -p server0/p -w server0/w
dgraph server --my=localhost:7081 --lru_mb=3512 --zero=localhost:5080 -o 1 -p server1/p -w server1/w
dgraph server --my=localhost:7082 --lru_mb=3512 --zero=localhost:5080 -o 2 -p server2/p -w server2/w

If I kill 2 of the Dgraph servers (Ctrl+C several times, or closing the terminal), for example localhost:7081 and localhost:7080, I can’t access the data and localhost:7082 shows errors.
How can I fix this situation and recover the data and the cluster schema?


(Manish R Jain) #2

@MichelDiz can help more, but you must have 2 out of 3 replicas up and running for data application and retrieval. Dgraph does consistent replication via Raft, which enforces this stipulation.
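
To make the 2-out-of-3 rule concrete: a Raft group of n replicas needs a majority, floor(n/2) + 1, to make progress. A minimal sketch of that arithmetic (the function names are illustrative and not part of Dgraph):

```python
def quorum_size(replicas: int) -> int:
    """Smallest number of nodes that forms a majority of `replicas`."""
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return replicas // 2 + 1


def has_quorum(replicas: int, alive: int) -> bool:
    """True if `alive` nodes out of `replicas` still form a majority."""
    return alive >= quorum_size(replicas)


if __name__ == "__main__":
    for alive in range(4):
        print(f"3 replicas, {alive} alive -> quorum: {has_quorum(3, alive)}")
```

With --replicas 3, killing two servers leaves 1 of 3 alive, which is below the majority of 2, so the surviving server cannot serve requests on its own.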


(Michel Conrado (Support Engineer)) #3

Vladimir, can you paste the logs you get here?

What does your experiment consist of? Did you use the live loader or the bulk loader, or did you enter the data manually via mutations?

How much time passed between loading the data and the shutdown? (Just to know whether the servers had time to sync.) Did you see logs like “moving predicate A”, and did you wait for that to finish?

When you force a shutdown, or there is a sudden shutdown, Dgraph leaves a Lock file in the folders that were being used. You need to remove these files before using the folders again.
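
A small sketch for locating leftover lock files before a restart. It assumes the lock is a file named LOCK directly inside each data directory (the exact file name is an assumption; check what is actually in your folders), and it deletes only when explicitly asked to. Make sure no Dgraph process is still using those directories first.

```python
from pathlib import Path
from typing import Iterable, List


def find_lock_files(dirs: Iterable[str], lock_name: str = "LOCK") -> List[Path]:
    """Return lock files that exist directly inside the given directories."""
    found = []
    for d in dirs:
        candidate = Path(d) / lock_name
        if candidate.is_file():
            found.append(candidate)
    return found


def remove_lock_files(dirs: Iterable[str], lock_name: str = "LOCK",
                      dry_run: bool = True) -> List[Path]:
    """List leftover lock files and delete them unless dry_run is True."""
    locks = find_lock_files(dirs, lock_name)
    for lock in locks:
        if not dry_run:
            lock.unlink()
    return locks


if __name__ == "__main__":
    # Directory names taken from the commands in this thread.
    data_dirs = ["zeroCluster", "server0/p", "server0/w",
                 "server1/p", "server1/w", "server2/p", "server2/w"]
    for path in remove_lock_files(data_dirs, dry_run=True):
        print("leftover lock:", path)
```

Running it with dry_run=True first shows what would be removed without touching anything.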

After recovering the server instances, did you get errors? Please paste the logs.

Cheers.


(vladimir) #4

Can you help me: how can I get logs when I start dgraph server on Windows 10?

I used the mutations from the tour:
https://tour.dgraph.io/intro/3/
https://tour.dgraph.io/intro/4/
https://tour.dgraph.io/intro/5/

When I kill the first replica, it’s OK: a new leader is chosen.
Then two replicas are up. If I kill the leader, I get errors.
But if the leader is on the last server, it’s OK too.
Then I kill all of them.
After that, I tried this procedure:

I delete the Lock from zeroCluster and start the zero server:
dgraph zero --my=localhost:5080 --replicas 3 -w zeroCluster

I delete the Lock from server0 p & w and start server0:
dgraph server --my=localhost:7080 --lru_mb=3512 --zero=localhost:5080 -o 0 -p server0/p -w server0/w

I delete the Lock from server1 p & w and start server1:
dgraph server --my=localhost:7081 --lru_mb=3512 --zero=localhost:5080 -o 1 -p server1/p -w server1/w

I delete the Lock from server2 p & w and start server2:
dgraph server --my=localhost:7082 --lru_mb=3512 --zero=localhost:5080 -o 2 -p server2/p -w server2/w

And everything is OK ))
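
The restart order above (Zero first, then each server) can be written down as a small helper that only builds the command lines used in this thread. It does not start anything, and the helper names are illustrative.

```python
from typing import List


def zero_command(replicas: int = 3, wal_dir: str = "zeroCluster") -> str:
    """Command line for the single Zero instance used in this thread."""
    return f"dgraph zero --my=localhost:5080 --replicas {replicas} -w {wal_dir}"


def server_command(offset: int, lru_mb: int = 3512) -> str:
    """Command line for one Dgraph server; port and folders follow the offset."""
    return (
        f"dgraph server --my=localhost:{7080 + offset} --lru_mb={lru_mb} "
        f"--zero=localhost:5080 -o {offset} "
        f"-p server{offset}/p -w server{offset}/w"
    )


def restart_plan(servers: int = 3) -> List[str]:
    """Zero first, then every server in offset order."""
    return [zero_command()] + [server_command(i) for i in range(servers)]


if __name__ == "__main__":
    for cmd in restart_plan():
        print(cmd)
```

Keeping the commands in one place makes it harder to mix up the port offset and the p/w folders of a server when restarting by hand.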

Can I manually move the leader from one host to another?


(Michel Conrado (Support Engineer)) #5

About saving the logs from cmd: https://www.labnol.org/software/copy-command-output-to-clipboard/2506/

I believe you can do
dgraph zero --my=localhost:5080 --replicas 3 -w zeroCluster >C:\zeroCluster_output.txt

I’m not sure whether Windows supports this kind of pipe (but try):

dgraph command command |& tee -a zeroCluster_output.txt

Otherwise you can use the “bash” from the GitHub install.
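
If neither redirection nor tee is available, a small wrapper script can do the same job. This is a sketch in Python and not a Dgraph feature: it runs a command, merges stderr into stdout, echoes each line to the console, and appends it to a log file. The function name is illustrative.

```python
import subprocess
import sys
from typing import List


def run_and_tee(cmd: List[str], log_path: str) -> int:
    """Run cmd, print its combined stdout/stderr, and append it to log_path."""
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            errors="replace",          # do not crash on odd bytes in the output
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()  # keep the file current even if the process is killed
        return proc.wait()


# Example call, using the command from this thread:
# run_and_tee(["dgraph", "zero", "--my=localhost:5080", "--replicas", "3",
#              "-w", "zeroCluster"], "zeroCluster_output.txt")
```

Merging stderr matters because a plain > redirect captures only stdout, so anything a program writes to stderr would otherwise be missing from the file.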

For a node that can’t be recovered there is the endpoint /removeNode?id=3&group=2. If a replica goes down and can’t be recovered, you can remove it and add a new node to the quorum. This endpoint can be used to remove a dead Zero or Dgraph server node. To remove dead Zero nodes, just pass group=0 and the id of the Zero node.
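
As a sketch of how that endpoint could be called from a script: the first helper only builds the URL, and the second issues a plain GET. It assumes Zero's HTTP interface is reachable at http://localhost:6080, which is an assumption about the setup and should be adjusted to yours. The function names are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def remove_node_url(node_id: int, group: int,
                    zero_http: str = "http://localhost:6080") -> str:
    """Build the /removeNode URL.

    zero_http is an assumption; point it at wherever your Zero serves HTTP.
    Pass group=0 to remove a dead Zero node.
    """
    query = urlencode({"id": node_id, "group": group})
    return f"{zero_http}/removeNode?{query}"


def remove_node(node_id: int, group: int,
                zero_http: str = "http://localhost:6080") -> str:
    """Send the request and return the response body as text."""
    with urlopen(remove_node_url(node_id, group, zero_http), timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the URL; call remove_node(...) yourself once you are sure.
    print(remove_node_url(node_id=3, group=2))
```

Printing the URL first is deliberate, since removing a node from the quorum is not something to trigger by accident.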

Have a read: https://docs.dgraph.io/deploy/#more-about-dgraph-zero

Generally, the servers are in leader or follower state. When the leader crashes or the communication breaks down, the followers will wait for election timeout before converting to candidates. The election timeout is randomized. This would allow one of them to declare candidacy before others. The candidate would vote for itself and wait for the majority of the cluster to vote for it as well. If a follower hears from a candidate with a higher term than the current (dead in this case) leader, it would vote for it. The candidate who gets majority votes wins the election and becomes the leader.

more: https://docs.dgraph.io/design-concepts/#server-states
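
The election described above can be illustrated with a toy simulation. This is a sketch of the idea only, not Dgraph's or any Raft library's code: each surviving follower draws a random election timeout, the one whose timeout fires first becomes the candidate, votes for itself, and wins if a majority of the full cluster votes for it. The timeout range and all names are made up for illustration.

```python
import random
from typing import List, Optional


def elect_leader(cluster_size: int, alive: List[str],
                 seed: Optional[int] = None) -> Optional[str]:
    """Return the new leader among `alive` nodes, or None without a majority."""
    rng = random.Random(seed)
    majority = cluster_size // 2 + 1
    if not alive:
        return None

    # Randomized election timeouts in milliseconds; the range is illustrative.
    timeouts = {node: rng.uniform(150, 300) for node in alive}

    # The follower whose timeout fires first becomes the candidate.
    candidate = min(timeouts, key=timeouts.get)

    # The candidate votes for itself; every other live follower grants its
    # vote because the candidate's term is higher than the dead leader's.
    votes = 1 + sum(1 for node in alive if node != candidate)

    return candidate if votes >= majority else None


if __name__ == "__main__":
    # 3 replicas, the leader died, 2 followers left: one of them wins.
    print(elect_leader(3, ["server1", "server2"], seed=1))
    # 3 replicas, only 1 left: no majority, so no leader.
    print(elect_leader(3, ["server2"], seed=1))
```

The second call shows the situation from the first post: a single survivor out of three can vote only for itself and never reaches the majority of 2.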


(vladimir) #6

I believe you can do
dgraph zero --my=localhost:5080 --replicas 3 -w zeroCluster >C:\zeroCluster_output.txt

It works. Thanks.


(vladimir) #7

I can safely shut down the servers with
/admin/shutdown
How can I safely shut down the zero server? Or is killing the PID safe?