sumanth
(Sumanth Chinthagunta)
March 22, 2021, 8:26am
I am trying a live load. After a few minutes, the CLI crashes.
Any idea? Dgraph CPU and memory usage are within limits.
./dgraph live -a xxxxx:443 -z yyyyy:8443 -f dgraph_rdf_dob-00000-of-00003.rdf -U "xid"
[Decoder]: Using assembly version of decoder
Page Size: 4096
I0322 01:14:47.489009 4224 init.go:107]
Dgraph version : v20.11.2
Dgraph codename : tchalla-2
Dgraph SHA-256 : ccbae03a5c877ba24f501940e4e08d1897f928d0c6579a534148d36749149f9b
Commit SHA-1 : 94f3a0430
Commit timestamp : 2021-02-23 13:07:17 +0530
Branch : HEAD
Go version : go1.15.5
jemalloc enabled : false
For Dgraph official documentation, visit https://dgraph.io/docs/.
For discussions about Dgraph , visit http://discuss.dgraph.io.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.
Running transaction with dgraph endpoint: 10.149.144.166:443
Found 1 data file(s) to process
Processing data file "dgraph_rdf_dob-00000-of-00003.rdf"
[01:14:52-0700] Elapsed: 05s Txns: 0 N-Quads: 0 N-Quads/s [last 5s]: 0 Aborts: 0
[01:14:57-0700] Elapsed: 10s Txns: 0 N-Quads: 0 N-Quads/s [last 5s]: 0 Aborts: 0
[01:15:03-0700] Elapsed: 15s Txns: 49 N-Quads: 49000 N-Quads/s [last 5s]: 9800 Aborts: 0
[01:16:22-0700] Elapsed: 01m35s Txns: 249 N-Quads: 249000 N-Quads/s [last 5s]: 0 Aborts: 0
[01:16:27-0700] Elapsed: 01m40s Txns: 249 N-Quads: 249000 N-Quads/s [last 5s]: 0 Aborts: 0
[01:16:32-0700] Elapsed: 01m45s Txns: 249 N-Quads: 249000 N-Quads/s [last 5s]: 0 Aborts: 0
[01:16:37-0700] Elapsed: 01m50s Txns: 249 N-Quads: 249000 N-Quads/s [last 5s]: 0 Aborts: 0
[01:16:42-0700] Elapsed: 01m55s Txns: 249 N-Quads: 249000 N-Quads/s [last 5s]: 0 Aborts: 0
panic: rpc error: code = Unavailable desc = transport is closing
goroutine 210 [running]:
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).upsertUids(0xc0003701e0, 0xc012d6c000, 0x3e8, 0x3e8)
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:339 +0xe08
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).processLoadFile.func1(0xc0002fcc70, 0xc0003701e0, 0xc00002d830)
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:476 +0x3e8
created by github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).processLoadFile
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:431 +0xa5
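The progress lines follow a regular pattern, so the stall before each panic can be measured instead of eyeballed. Below is a minimal Python sketch. It assumes the exact line format shown in this thread, copied from the output above rather than from any documented format. It counts the consecutive 5-second intervals with zero throughput at the end of the log, skipping the warm-up before the first transaction commits, and computes N-Quads per transaction. The 0x3e8 arguments in the upsertUids frame decode to 1000. That value is likely the length and capacity of the batch being upserted, and it matches the 1000 N-Quads per transaction seen in the counters.

```python
import re

# Assumed format, taken from the dgraph live output in this thread.
PROGRESS = re.compile(
    r"Txns:\s+(?P<txns>\d+)\s+N-Quads:\s+(?P<nquads>\d+)\s+"
    r"N-Quads/s\s+\[last 5s\]:\s+(?P<rate>\d+)\s+Aborts:\s+(?P<aborts>\d+)"
)


def parse_progress(line):
    """Return the counters of one progress line as ints, or None."""
    m = PROGRESS.search(line)
    if m is None:
        return None
    return {key: int(value) for key, value in m.groupdict().items()}


def trailing_stall(lines):
    """Count consecutive zero-throughput intervals at the end of the log.

    Intervals before the first committed transaction are ignored, since the
    loader reports 0 N-Quads/s while it is still warming up.
    """
    stall, started = 0, False
    for line in lines:
        p = parse_progress(line)
        if p is None:
            continue  # banner, panic and stack-trace lines
        started = started or p["txns"] > 0
        if started:
            stall = stall + 1 if p["rate"] == 0 else 0
    return stall


def nquads_per_txn(line):
    """Average batch size implied by one progress line."""
    p = parse_progress(line)
    return p["nquads"] // p["txns"] if p and p["txns"] else None


# The upsertUids frame shows 0x3e8 twice (likely slice length and capacity).
print(int("3e8", 16))  # 1000
```

With the lines of the first log as input, trailing_stall returns 5, which is about 25 seconds without progress before the panic. With the second log it returns 6.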
MichelDiz
(Michel Diz)
March 22, 2021, 12:47pm
The live loader crash is due to a lost connection; these logs don't help. Check the logs on your cluster. Also, share details about the cluster configuration and stats. What are you using? K8s? Docker? Is there a load balancer? Why is it exposed on 443? Are you using TLS? Is Ratel running fine in this setup?
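You can answer the TLS question from the laptop side. Below is a hedged Python sketch. It opens a TCP connection to the endpoint passed to -a (or -z), attempts a TLS handshake, and reports which ALPN protocol the server selects. The live loader talks to Alpha and Zero over gRPC, and gRPC runs over HTTP/2, so a TLS-terminating ingress in front of it would need to negotiate "h2". A failed handshake suggests a plaintext port or an untrusted certificate. The host and port are placeholders. The sketch does not check Dgraph itself.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0):
    """Try a TCP connect, then a TLS handshake offering HTTP/2 via ALPN.

    Returns a dict with tcp/tls flags, the negotiated ALPN protocol
    (gRPC needs "h2"), and the error text if a step failed.
    """
    ctx = ssl.create_default_context()  # verifies the certificate chain
    ctx.set_alpn_protocols(["h2", "http/1.1"])
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:  # refused, unreachable, DNS failure, timeout
        return {"tcp": False, "tls": False, "alpn": None, "error": str(exc)}
    try:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return {"tcp": True, "tls": True,
                    "alpn": tls.selected_alpn_protocol(), "error": None}
    except OSError as exc:  # ssl.SSLError and timeouts are OSError subclasses
        raw.close()
        return {"tcp": True, "tls": False, "alpn": None,
                "error": str(exc) or type(exc).__name__}
```

For example, run probe_tls(alpha_host, 443) and probe_tls(zero_host, 8443), where alpha_host and zero_host stand for the exposed addresses.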
sumanth
(Sumanth Chinthagunta)
March 22, 2021, 9:55pm
I have one dgraph-ratel, one dgraph-zero, and one dgraph-alpha pod on k8s.
I see the following warning-level message in Zero:
"Connection lost with dgraph-alpha-0.dgraph-alpha.lab.svc.cluster.local:7080. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-alpha-0.dgraph-alpha.key-perf.svc.cluster.local: no such host""
I am running the live load from my laptop (connected via the corporate VPN to k8s in GKE), so maybe the VPN is terminating the connection.
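The Zero warning is a DNS failure. At that moment, the Alpha's per-pod name did not resolve, which can happen while the pod is down or restarting. Below is a minimal Python sketch for checking resolution. It assumes it runs inside the cluster (for example, from a pod in the same namespace), because these names only resolve there. The warning itself contains two different namespaces, lab and key-perf. If that is not just redaction, check which one the pods actually use.

```python
import socket


def resolve(name, port=7080):
    """Return the sorted addresses `name` resolves to.

    Raises socket.gaierror when resolution fails, roughly the condition
    the Go resolver reports as "no such host". The port only fills in the
    getaddrinfo call; 7080 is the port shown in the Zero log.
    """
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def check(names):
    """Resolve each name, mapping failures to an error string."""
    results = {}
    for name in names:
        try:
            results[name] = resolve(name)
        except socket.gaierror as exc:
            results[name] = f"unresolved: {exc}"
    return results
```

Inside the cluster, you could call check(["dgraph-alpha-0.dgraph-alpha.lab.svc.cluster.local"]) repeatedly during a load to see whether the record disappears.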
sumanth
(Sumanth Chinthagunta)
March 22, 2021, 9:58pm
Load balancer? Yes. dgraph-ratel is exposed through the GKE ingress on port 443.
dgraph-zero is exposed on port 8443. Everything works via the Ratel UI.
MichelDiz
(Michel Diz)
March 22, 2021, 10:06pm
sumanth:
“Connection lost with dgraph-alpha-0.dgraph-alpha.lab.svc.cluster.local:7080. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = “transport: Error while dialing dial tcp: lookup dgraph-alpha-0.dgraph-alpha.key-perf.svc.cluster.local: no such host””
Is any pod down? This looks like a crashed Alpha.
Why Ratel? Previously you were using ./dgraph live -a xxxxx:443,
which means that you exposed an Alpha at 443, not Ratel, unless you exposed both there, with Ratel under one path (subdirectory) and the Alpha under another.
The error from the live loader suggests the Alpha is not exposed correctly. Check this.
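One way to test this from the laptop side is to watch the endpoint while the load runs. If the Alpha is exposed correctly but something in between (VPN, ingress, load balancer) drops connections, short fresh connections may keep succeeding while the loader's long-lived gRPC connection is cut. Below is a hedged Python sketch that opens a new TCP connection at a fixed interval while dgraph live runs and records failures with timestamps. It checks only TCP reachability, not gRPC or Dgraph health. The host, port, and timings are placeholders.

```python
import socket
import time


def watch(host, port, interval=5.0, rounds=12, timeout=3.0, sleep=time.sleep):
    """Open a fresh TCP connection to host:port every `interval` seconds.

    Returns a list of (elapsed_seconds, error_or_None), one per attempt.
    `sleep` is injectable so the loop can be tested without waiting.
    """
    start = time.monotonic()
    results = []
    for i in range(rounds):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                err = None  # connected; closed again by the with-block
        except OSError as exc:
            err = str(exc) or type(exc).__name__
        results.append((round(time.monotonic() - start, 1), err))
        if i < rounds - 1:
            sleep(interval)
    return results
```

For example, run watch(alpha_host, 443, rounds=60) in a second terminal during a load. Failures that line up with the panic point at the network path. A clean run points at the long-lived connection or the Alpha itself.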
sumanth
(Sumanth Chinthagunta)
March 22, 2021, 11:43pm
Here is what we expose to clients via the GKE (k8s) ingress:
Ratel → <Ratel_IP>:443
Zero → <Zero_IP>:443/8443
Alpha → <Alpha_IP>:443/8443
As I mentioned above, I am running dgraph live from my laptop. I can see around 50k records inserted before the CLI crashes. I rerun the command below after each crash to keep ingesting more records.
./dgraph live -a <Alpha_IP>:443 -z <Zero_IP>:8443 -f dgraph_rdf_dob-00000-of-00003.rdf -U "xid"
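The manual rerun loop can be automated. Below is a minimal Python sketch, assuming each rerun is safe. Every attempt re-reads the whole file from the start, and the sketch relies on -U "xid" making the loader upsert on the xid predicate, so already-loaded nodes are matched instead of duplicated. Confirm that on a small file first. The <Alpha_IP>/<Zero_IP> placeholders are kept from the post. This only works around the crash; it does not fix the dropped connection.

```python
import subprocess
import time


def run_until_success(cmd, max_attempts=10, backoff=30.0, sleep=time.sleep):
    """Re-run `cmd` until it exits 0 or max_attempts is reached.

    Returns (attempts_used, last_exit_code). Waits backoff * attempt
    seconds between reruns (linear backoff).
    """
    code = None
    for attempt in range(1, max_attempts + 1):
        code = subprocess.run(cmd).returncode
        if code == 0:
            return attempt, code
        if attempt < max_attempts:
            sleep(backoff * attempt)
    return max_attempts, code


# The command from the post; replace the placeholders before running.
LIVE = ["./dgraph", "live", "-a", "<Alpha_IP>:443", "-z", "<Zero_IP>:8443",
        "-f", "dgraph_rdf_dob-00000-of-00003.rdf", "-U", "xid"]
```

To use it, call run_until_success(LIVE) after filling in the real addresses.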
MichelDiz
(Michel Diz)
March 23, 2021, 12:25am
How many Alphas do you have?
How many resources are available to the pods?
sumanth
(Sumanth Chinthagunta)
March 23, 2021, 12:53am
I have one Zero and one Alpha. I did not set any resource limits. It only takes ~2GB during the live load tests.
mrjn
(Manish R Jain)
March 23, 2021, 2:51am
Can you share more logs from the crash?
sumanth
(Sumanth Chinthagunta)
March 23, 2021, 8:23pm
I don't see any new logs for Zero or Alpha. Here are the CLI logs:
dgraph live -a <ip>:443 -z <ip>:8443 -f dgraph_rdf_dob-00001-of-00003.rdf -U "xid"
[Decoder]: Using assembly version of decoder
Page Size: 4096
I0323 13:18:12.466805 18800 init.go:107]
Dgraph version : v20.11.2
Dgraph codename : tchalla-2
Dgraph SHA-256 : ccbae03a5c877ba24f501940e4e08d1897f928d0c6579a534148d36749149f9b
Commit SHA-1 : 94f3a0430
Commit timestamp : 2021-02-23 13:07:17 +0530
Branch : HEAD
Go version : go1.15.5
jemalloc enabled : false
For Dgraph official documentation, visit https://dgraph.io/docs/.
For discussions about Dgraph , visit http://discuss.dgraph.io.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.
Running transaction with dgraph endpoint: <ip>:443
Found 1 data file(s) to process
Processing data file "dgraph_rdf_dob-00001-of-00003.rdf"
[13:18:17-0700] Elapsed: 05s Txns: 0 N-Quads: 0 N-Quads/s [last 5s]: 0 Aborts: 0
[13:18:22-0700] Elapsed: 10s Txns: 0 N-Quads: 0 N-Quads/s [last 5s]: 0 Aborts: 0
[13:18:27-0700] Elapsed: 15s Txns: 50 N-Quads: 50000 N-Quads/s [last 5s]: 10000 Aborts: 0
[13:18:32-0700] Elapsed: 20s Txns: 50 N-Quads: 50000 N-Quads/s [last 5s]: 0 Aborts: 0
[13:18:37-0700] Elapsed: 25s Txns: 101 N-Quads: 101000 N-Quads/s [last 5s]: 10200 Aborts: 0
[13:18:42-0700] Elapsed: 30s Txns: 101 N-Quads: 101000 N-Quads/s [last 5s]: 0 Aborts: 0
[13:18:47-0700] Elapsed: 35s Txns: 101 N-Quads: 101000 N-Quads/s [last 5s]: 0 Aborts: 0
[13:18:52-0700] Elapsed: 40s Txns: 101 N-Quads: 101000 N-Quads/s [last 5s]: 0 Aborts: 0
[13:18:57-0700] Elapsed: 45s Txns: 101 N-Quads: 101000 N-Quads/s [last 5s]: 0 Aborts: 0
[13:19:02-0700] Elapsed: 50s Txns: 101 N-Quads: 101000 N-Quads/s [last 5s]: 0 Aborts: 0
[13:19:07-0700] Elapsed: 55s Txns: 101 N-Quads: 101000 N-Quads/s [last 5s]: 0 Aborts: 0
panic: rpc error: code = Unavailable desc = transport is closing
goroutine 212 [running]:
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).upsertUids(0xc0003721e0, 0xc007662000, 0x3e8, 0x3e8)
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:339 +0xe08
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).processLoadFile.func1(0xc000506860, 0xc0003721e0, 0xc00057a600)
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:476 +0x3e8
created by github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).processLoadFile
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:431 +0xa5
I can share a live session when you have some time to help.
Thanks
dmai
(Daniel Mai)
March 23, 2021, 9:26pm
A “transport is closing” error definitely sounds like the Alpha stopped during the live load. There should be logs for that. Can you share the Alpha logs before and after the restart?
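Whether the Alpha restarted can be read from the pod status before digging into logs. Below is a hedged Python sketch that shells out to kubectl. It reads restartCount for each container of the Alpha pod from kubectl get pod -o json. If any count is nonzero, it fetches the logs of the previous container instance with kubectl logs --previous. The pod name dgraph-alpha-0 and the namespace lab are taken from the Zero warning and may differ in your deployment.

```python
import json
import subprocess


def kubectl(*args):
    """Run kubectl and return stdout (raises CalledProcessError on failure)."""
    return subprocess.run(["kubectl", *args], check=True,
                          capture_output=True, text=True).stdout


def restart_counts(pod_json):
    """Map container name -> restartCount from `kubectl get pod -o json`."""
    pod = json.loads(pod_json)
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return {cs["name"]: cs.get("restartCount", 0) for cs in statuses}


def previous_logs_args(pod, namespace):
    """kubectl arguments for the logs of the previous (crashed) instance."""
    return ["logs", pod, "-n", namespace, "--previous"]


# Usage from a machine with cluster access:
#   counts = restart_counts(
#       kubectl("get", "pod", "dgraph-alpha-0", "-n", "lab", "-o", "json"))
#   if any(counts.values()):
#       print(kubectl(*previous_logs_args("dgraph-alpha-0", "lab")))
```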
sumanth
(Sumanth Chinthagunta)
March 23, 2021, 9:34pm
The Alpha is running continuously. Here are the logs: