Cannot connect to zero from alpha on another host


#1

I’m trying to run a cluster. I bulk-loaded an RDF file with 3 reducers, which produced directories 0, 1 and 2, and copied them to 3 machines (0 to the first machine, 1 to the second, 2 to the third). My setup is:

192.168.1.109 - zero and first alpha
192.168.1.108 - second alpha
192.168.1.110 - third alpha

I start zero with the command:

dgraph zero --my=192.168.1.109:5080

Then the first alpha with the command:

dgraph alpha --lru_mb=32768 --my=192.168.1.109:7080 --zero=192.168.1.109:5080 -p path_to_my_data

It seems to start fine. Before this, I had tested a single-host, single-alpha setup and it worked.

Now I start the second alpha on 192.168.1.108:

dgraph alpha --lru_mb=8192 --my=192.168.1.108:7080 --zero=192.168.1.109:5080 -p path_to_my_data

It says:

I0311 16:06:34.081119 32056 pool.go:136] CONNECTED to 192.168.1.109:5080
I0311 16:06:34.086183 32056 pool.go:136] CONNECTED to localhost:5080
W0311 16:06:34.586706 32056 pool.go:212] Connection lost with localhost:5080. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:5080: connect: connection refused"

and does nothing after that. Zero then starts outputting:

I0311 16:12:30.072292 38917 zero.go:396] Got connection request: cluster_info_only:true
I0311 16:12:30.072725 38917 zero.go:414] Connected: cluster_info_only:true
I0311 16:12:30.174578 38917 zero.go:396] Got connection request: cluster_info_only:true
I0311 16:12:30.174951 38917 zero.go:414] Connected: cluster_info_only:true

continuously, thousands of such messages. What am I doing wrong?


(Michel Conrado) #2

Hmm, it seems that 192.168.1.109:5080 is falling back to localhost:5080.

Maybe this is because you didn’t start from scratch. Keep your bulk-load files safe and start from scratch.
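Concretely, “starting from scratch” usually means clearing the state directories that Zero and each Alpha created on earlier runs, while keeping the bulk-loader output intact. A minimal sketch, assuming the default directory names (`zw` for Zero’s state, `w` for an Alpha’s write-ahead log, `p` for the postings you bulk-loaded) and that the daemons are already stopped:

```shell
# On the Zero host: remove Zero's state directory (default: zw).
rm -rf zw

# On each Alpha host: remove the Alpha's write-ahead log (default: w).
# Do NOT delete the bulk-loader output (the p directory) -- that is your data.
rm -rf w

# Then restart Zero first, followed by the Alphas, with the same
# commands as above.
```

If you changed the state directories with flags, adjust the paths accordingly.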

If it continues, share the version you’re using and more details (like your Docker Compose file, config, stats and so on).


#3

OK, when starting from scratch, it works. But is there a way to use existing data (hundreds of gigabytes) in a cluster?


(Michel Conrado) #4

From where? If it is from another DB, the only way is to export it to RDF (Dgraph RDF) or JSON and import it via the bulk loader. But I’m confused by your question; it seems to be something different from what I understood. Please elaborate.

Cheers.


#5

OK, I got the cluster started. I was providing the wrong directories for the p parameter. I now have 3 groups, but the first alpha is serving all predicates while the second and third serve none. Why?

I generated the data in the following way: I have a source of data elsewhere. I wrote a program that generates an RDF file (72 GB) with all my nodes and predicates, plus a schema file. I then ran:

dgraph bulk -r eth.rdf -s eth.schema --map_shards=6 --reduce_shards=3 --zero=localhost:5080

This produced 3 subdirectories in the out directory: 0, 1 and 2. Am I correct in guessing that 0, 1 and 2 each hold different predicates? I then copied directory 0 to the first alpha, 1 to the second, and 2 to the third, and used 0/p as the p parameter for the first alpha, and so on. Is this the correct way to start the cluster?
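For reference, the workflow described above can be sketched as follows; the copy destinations (`/data/p`) are assumed paths for illustration, and the IPs are the ones from this thread. Each reduce shard i is intended for a separate group, so each alpha gets exactly one shard’s p directory:

```shell
# Run the bulk loader (Zero must be running to assign UIDs/timestamps).
dgraph bulk -r eth.rdf -s eth.schema --map_shards=6 --reduce_shards=3 \
    --zero=localhost:5080

# The bulk loader writes one postings directory per reduce shard:
#   out/0/p   out/1/p   out/2/p
# Copy each shard to its own machine, e.g.:
scp -r out/1/p 192.168.1.108:/data/p
scp -r out/2/p 192.168.1.110:/data/p

# Point each alpha's -p flag at its own shard:
dgraph alpha --lru_mb=32768 --my=192.168.1.109:7080 \
    --zero=192.168.1.109:5080 -p out/0/p     # first alpha
dgraph alpha --lru_mb=8192 --my=192.168.1.108:7080 \
    --zero=192.168.1.109:5080 -p /data/p     # second alpha (likewise the third)
```

The key point is that -p must reference the shard’s inner p directory (e.g. out/0/p), not the out directory or the shard directory itself.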


(Michel Conrado) #6

Which version are you using?

If you connect to any alpha, you will have all predicates available. The only way to know which alpha is serving which predicates is via the logs or the Debug Tool (a new Dgraph tool). But that’s beside the point for now; I need to know your version.
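Besides the logs, Zero also exposes an HTTP /state endpoint (on its HTTP port, 6080 by default) that lists the groups, their members, and the tablets (predicates) each group serves. A quick check, assuming the Zero host from this thread and the default HTTP port:

```shell
# Inspect cluster state: the "groups" object maps each group ID to its
# member alphas and the "tablets" (predicates) that group serves.
curl -s http://192.168.1.109:6080/state
```

If all tablets appear under group 1, that matches the symptom described above (the first alpha serving everything).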


#7

This thread says there was a problem that was fixed in 1.0.12: https://github.com/dgraph-io/dgraph/issues/2129. Maybe I was using an older release. I have now installed 1.0.13 and am re-importing; we’ll see.


#8

After re-importing the data with 1.0.13, it is OK: each alpha serves its portion of predicates. I will try queries and write back if anything is wrong.