Loading the 1 million RDF triples test set encounters context deadline exceeded


(jtlz) #1

I was able to load the test dataset of 1 million RDF triples provided in the Dgraph GitHub repository on a local machine with one zero process and one alpha process. The loading was done with the command “dgraph live -r 1million.rdf.gz --zero localhost:5080 -c 1”, as given at https://tour.dgraph.io/moredata/1/.

Then I set up a HA cluster with 3 VMs forming the “zero” cluster, and another 3 VMs forming the “alpha” cluster. When I tried to load the same dataset of 1million.rdf.gz via the same command (except that now the --zero parameter is specified with one of the zero servers), I encountered the following error message:

“While trying to setup connection to Dgraph alpha. error: context deadline exceeded”

For the same cluster setup, I am able to use “dgraph-ratel” and, from the web console, connect to one of the alpha servers. I can issue query, mutation, and schema operations there and see the query results in the web console.

So the problem with the “dgraph live” loader seems to be a connection parameter that needs tuning, but I cannot find any network-connection parameter in the output of “dgraph live --help”.

I am using dgraph v1.0.11 and go1.11.4 on Ubuntu 16.04.

Could you provide some help on resolving this issue?


(Michel Conrado) #2

Can you provide more details about your HA cluster?
Show each command used per step.
How is the network configured?
How many resources does each VM have?

Does this error appear on a local machine or on a VM?


(jtlz) #3

The following are the steps that I executed to deploy the “zero” cluster and the “alpha” cluster:

(1) deploy the zero cluster

on VM 1: dgraph zero --my=<zero-1-address>:5080 --idx=990 --replicas 3
on VM 2: dgraph zero --my=<zero-2-address>:5080 --peer=<zero-1-address>:5080 --idx=991 --replicas 3
on VM 3: dgraph zero --my=<zero-3-address>:5080 --peer=<zero-1-address>:5080 --idx=993 --replicas 3

at the end, from VM 1 (with zero-1-address), I can see the following lines:

raft_server.go:185] [991] Done joining cluster with err: <nil>
raft_server.go:185] [993] Done joining cluster with err: <nil> 

So I conclude these three VMs are forming a zero cluster, with idx {990, 991, 993}

(2) deploy the alpha cluster

On VM 4:

dgraph alpha --lru_mb=4096 --my=<alpha-1-address>:7080 --zero=<zero-1-address>:5080

Then it displays:

[groups.go:695] Got address of a Zero leader: <zero-1-address>:5080
[groups.go:708] Starting a new membership stream receive from <zero-1-address>:5080.

Correspondingly on the VM 1 (zero-1), it shows:

[zero.go:495] Connected: id:1 group_id:1 addr:"alpha-1-address:7080"  

on the VM 2 (zero-2) and 3 (zero-3), both show:

[pool.go:140] CONNECTED to alpha-1-address:7080

So I conclude that this alpha node 1 has been connected to the zero cluster.

On VM 5:

dgraph alpha --lru_mb=4096 --my=<alpha-2-address>:7080 --zero=<zero-1-address>:5080

then it displays:

[pool.go:140] CONNECTED to zero-1-address:5080
[groups.go:112] Connected to group zero. Assigned group: 1

Correspondingly, in VM1, VM2 and VM3, all show that this alpha-2 has been connected, similar to what is shown for alpha-1.

On VM 6:

dgraph alpha --lru_mb=4096 --my=<alpha-3-address>:7080 --zero=<zero-1-address>:5080

Then it shows:

[ pool.go:140] CONNECTED to 10.148.216.44:5080
[ groups.go:112] Connected to group zero. Assigned group: 1

Correspondingly, in VM1, VM2 and VM3, all show that this alpha-3 has been connected, similar to what is shown for alpha-1.

(3) launch the live loader

on VM 7:

dgraph live -r 1million.rdf.gz --zero <zero-1-address>:5080 -c 1

Then I receive the following error message on the VM 7’s shell:

“While trying to setup connection to Dgraph alpha. error: context deadline exceeded”

I also changed “--zero” to zero-2-address and zero-3-address in the “dgraph live” command, but I received the same deadline-exceeded error.

In terms of network configuration: VM1, VM2 and VM3 are in one datacenter, and VM4, VM5, VM6, and VM7 are in a different datacenter. Across these two datacenters, the round-trip latency is no more than 20 milliseconds. (I have repeated the above dgraph live commands many times and the same error shows up each time.)

In terms of the VM resources, VM1, VM2 and VM3 each has 2 CPUs and 4 GB RAM. VM4, VM5, VM6, and VM7 each has 4 CPUs and 8 GB RAM.


(Daniel Mai) #4

When running dgraph live you need to provide both the address:port for Zero (-z, default localhost:5080) and Alpha (-d, default localhost:9080).


(Michel Conrado) #5

Yeah, that’s what I suspected. If you do not provide a value, Dgraph uses the defaults. That’s why he sees “deadline exceeded”: Dgraph Live is trying to connect and cannot, because it is pointing to the wrong Alpha.
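
A quick way to confirm this from the loader machine is to check whether anything is listening at the addresses dgraph live will use. The following is only a minimal sketch in Python, not part of Dgraph: the helper name can_connect is hypothetical, and the host/port pairs are just the defaults mentioned above (localhost:9080 for Alpha, localhost:5080 for Zero). A successful TCP connection only shows that something is listening on that port, not that it is a healthy Alpha.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it does not speak gRPC, it only
    checks that some process accepts connections on the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure and timeout.
        return False


if __name__ == "__main__":
    # Defaults that dgraph live falls back to when -d and -z are omitted
    # (taken from the reply above). Replace with your real addresses.
    targets = [
        ("alpha (-d)", "localhost", 9080),
        ("zero (-z)", "localhost", 5080),
    ]
    for name, host, port in targets:
        status = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{name} at {host}:{port}: {status}")
```

If the Alpha line reports "NOT reachable" on the machine where dgraph live runs, the loader is most likely pointing at the default address and needs an explicit Alpha address.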


(jtlz) #6

Thanks for the clarification. I changed the “dgraph live” command to:

dgraph live -r 1million.rdf.gz --dgraph alpha-1-address:9080 --zero zero-1-address:5080 -c 1

it works!