Cannot connect to zero from alpha on another host


#1

I’m trying to run a cluster. I bulk-loaded an RDF file with 3 reducers, which produced directories 0, 1 and 2, and copied them to 3 machines (0 to the first machine, 1 to the second, 2 to the third). My setup is:

192.168.1.109 - zero and first alpha
192.168.1.108 - second alpha
192.168.1.110 - third alpha

I start zero with the command:

dgraph zero --my=192.168.1.109:5080

Then the first alpha with the command:

dgraph alpha --lru_mb=32768 --my=192.168.1.109:7080 --zero=192.168.1.109:5080 -p path_to_my_data

It seems to start fine. Before this, I had tested a single-host, single-alpha setup and it worked.

Now I start the second alpha on 192.168.1.108:

dgraph alpha --lru_mb=8192 --my=192.168.1.108:7080 --zero=192.168.1.109:5080 -p path_to_my_data

It says:

I0311 16:06:34.081119 32056 pool.go:136] CONNECTED to 192.168.1.109:5080
I0311 16:06:34.086183 32056 pool.go:136] CONNECTED to localhost:5080
W0311 16:06:34.586706 32056 pool.go:212] Connection lost with localhost:5080. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:5080: connect: connection refused"

and does nothing after that. Zero then starts outputting:

I0311 16:12:30.072292 38917 zero.go:396] Got connection request: cluster_info_only:true
I0311 16:12:30.072725 38917 zero.go:414] Connected: cluster_info_only:true
I0311 16:12:30.174578 38917 zero.go:396] Got connection request: cluster_info_only:true
I0311 16:12:30.174951 38917 zero.go:414] Connected: cluster_info_only:true

continuously, thousands of such messages. What am I doing wrong?


(Michel Conrado) #2

Hmm, it seems that 192.168.1.109:5080 is falling back to localhost:5080.

Maybe this is because you didn’t start from scratch. Keep your bulk-load files safe and start from scratch.
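Concretely, “starting from scratch” usually means clearing the state directories that Zero and each Alpha created on earlier runs, while keeping the bulk-loader output intact. A minimal sketch, assuming the default directory names (`zw` for Zero’s state, `w` for an Alpha’s write-ahead log, `p` for the postings you bulk-loaded) and that the daemons are already stopped:

```shell
# On the Zero host: remove Zero's state directory (default: zw).
rm -rf zw

# On each Alpha host: remove the Alpha's write-ahead log (default: w).
# Do NOT delete the bulk-loader output (the p directory) -- that is your data.
rm -rf w

# Then restart Zero first, followed by the Alphas, with the same
# commands as above.
```

If you changed the state directories with flags, adjust the paths accordingly.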

If it continues, share the version you’re using and more details (like your Docker Compose file, config, stats and so on).


#3

OK, when starting from scratch, it works. But is there a way to use existing data (hundreds of gigabytes) in a cluster?


(Michel Conrado) #4

From where? If it is from another DB, the only way is to export it to RDF (Dgraph RDF) or JSON and import it via the bulk loader. But I’m confused by your question; it seems to be something different from what I understood. Please elaborate.

Cheers.


#5

OK, I got the cluster started. I was providing the wrong directories for the p parameter. I now have 3 groups, but the first alpha is serving all predicates while the second and third serve none. Why?

I generated the data in the following way: I have a source of data elsewhere. I wrote a program that generates an RDF file (72 GB) with all my nodes and predicates, plus a schema file. I then ran:

dgraph bulk -r eth.rdf -s eth.schema --map_shards=6 --reduce_shards=3 --zero=localhost:5080

This produced 3 subdirectories in the out directory: 0, 1 and 2. Am I correct in guessing that 0, 1 and 2 each hold different predicates? I then copied directory 0 to the first alpha, 1 to the second, and 2 to the third, and used 0/p as the p parameter for the first alpha, and so on. Is this the correct way to start the cluster?
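For reference, the workflow described above can be sketched as follows; the copy destinations (`/data/p`) are assumed paths for illustration, and the IPs are the ones from this thread. Each reduce shard i is intended for a separate group, so each alpha gets exactly one shard’s p directory:

```shell
# Run the bulk loader (Zero must be running to assign UIDs/timestamps).
dgraph bulk -r eth.rdf -s eth.schema --map_shards=6 --reduce_shards=3 \
    --zero=localhost:5080

# The bulk loader writes one postings directory per reduce shard:
#   out/0/p   out/1/p   out/2/p
# Copy each shard to its own machine, e.g.:
scp -r out/1/p 192.168.1.108:/data/p
scp -r out/2/p 192.168.1.110:/data/p

# Point each alpha's -p flag at its own shard:
dgraph alpha --lru_mb=32768 --my=192.168.1.109:7080 \
    --zero=192.168.1.109:5080 -p out/0/p     # first alpha
dgraph alpha --lru_mb=8192 --my=192.168.1.108:7080 \
    --zero=192.168.1.109:5080 -p /data/p     # second alpha (likewise the third)
```

The key point is that -p must reference the shard’s inner p directory (e.g. out/0/p), not the out directory or the shard directory itself.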


(Michel Conrado) #6

Which version are you using?

If you connect to any alpha, you will have all predicates available. The only way to know which alpha is serving which predicates is via the logs or the Debug Tool (a new Dgraph tool). But that’s beside the point for now; I need to know your version.
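Besides the logs, Zero also exposes an HTTP /state endpoint (on its HTTP port, 6080 by default) that lists the groups, their members, and the tablets (predicates) each group serves. A quick check, assuming the Zero host from this thread and the default HTTP port:

```shell
# Inspect cluster state: the "groups" object maps each group ID to its
# member alphas and the "tablets" (predicates) that group serves.
curl -s http://192.168.1.109:6080/state
```

If all tablets appear under group 1, that matches the symptom described above (the first alpha serving everything).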


#7

This thread says there was a problem that was fixed in 1.0.12: https://github.com/dgraph-io/dgraph/issues/2129. Maybe I was using an older release. I have now installed 1.0.13 and am re-importing; we’ll see.


#8

After re-importing the data with 1.0.13, it is OK: each alpha serves its portion of predicates. I will try queries and write back if anything is wrong.