I’m trying to run a cluster. I bulk-loaded an RDF file with 3 reducers, which produced directories 0, 1 and 2, and copied one directory to each of 3 machines (0 to the first machine, 1 to the second, 2 to the third). My setup is:
192.168.1.109 - zero and first alpha
192.168.1.108 - second alpha
192.168.1.110 - third alpha
I start zero with the command:
dgraph zero --my=192.168.1.109:5080
Then I start the first alpha with the command:
dgraph alpha --lru_mb=32768 --my=192.168.1.109:7080 --zero=192.168.1.109:5080 -p path_to_my_data
It seems to start fine. Before that I tested a single host with one alpha, and that worked.
Now I start the second alpha on 192.168.1.108:
dgraph alpha --lru_mb=8192 --my=192.168.1.108:7080 --zero=192.168.1.109:5080 -p path_to_my_data
It says:
I0311 16:06:34.081119 32056 pool.go:136] CONNECTED to 192.168.1.109:5080
I0311 16:06:34.086183 32056 pool.go:136] CONNECTED to localhost:5080
W0311 16:06:34.586706 32056 pool.go:212] Connection lost with localhost:5080. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:5080: connect: connection refused"
and does nothing after that. Meanwhile, zero starts printing:
I0311 16:12:30.072292 38917 zero.go:396] Got connection request: cluster_info_only:true
I0311 16:12:30.072725 38917 zero.go:414] Connected: cluster_info_only:true
I0311 16:12:30.174578 38917 zero.go:396] Got connection request: cluster_info_only:true
I0311 16:12:30.174951 38917 zero.go:414] Connected: cluster_info_only:true
continuously, thousands of such messages. What am I doing wrong?
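One detail in the second alpha's log is the connection attempt to localhost:5080, even though zero runs on 192.168.1.109 and nothing listens on port 5080 on 192.168.1.108. As a rough aid for spotting such entries, here is a minimal sketch in Python. It is not part of Dgraph; the function name, the regular expression and the set of loopback names are illustrative assumptions, and it only looks at the "CONNECTED to" lines shown above.

```python
import re

# Matches the address in lines such as "pool.go:136] CONNECTED to host:port".
# The pattern is an assumption based only on the log lines quoted in this post.
CONNECTED_RE = re.compile(r"CONNECTED to (\S+):(\d+)")

# Names that resolve to the local machine (illustrative, not exhaustive).
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def loopback_peers(log_text):
    """Return (host, port) pairs from CONNECTED lines that point at loopback."""
    peers = []
    for line in log_text.splitlines():
        match = CONNECTED_RE.search(line)
        if match and match.group(1) in LOOPBACK_HOSTS:
            peers.append((match.group(1), int(match.group(2))))
    return peers


# The two CONNECTED lines from the second alpha's output above.
SAMPLE_LOG = """\
I0311 16:06:34.081119 32056 pool.go:136] CONNECTED to 192.168.1.109:5080
I0311 16:06:34.086183 32056 pool.go:136] CONNECTED to localhost:5080
"""

if __name__ == "__main__":
    print(loopback_peers(SAMPLE_LOG))
```

Run against the two lines above, only the localhost:5080 entry is reported; the 192.168.1.109:5080 connection is not flagged.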