Dgraph live load crashing after a few minutes

I am trying a live load. After a few minutes, the CLI crashes.
Any ideas? Dgraph CPU and memory usage are within the limits.

./dgraph live -a xxxxx:443 -z  yyyyy:8443   -f dgraph_rdf_dob-00000-of-00003.rdf -U "xid"
[Decoder]: Using assembly version of decoder
Page Size: 4096
I0322 01:14:47.489009    4224 init.go:107]

Dgraph version   : v20.11.2
Dgraph codename  : tchalla-2
Dgraph SHA-256   : ccbae03a5c877ba24f501940e4e08d1897f928d0c6579a534148d36749149f9b
Commit SHA-1     : 94f3a0430
Commit timestamp : 2021-02-23 13:07:17 +0530
Branch           : HEAD
Go version       : go1.15.5
jemalloc enabled : false

For Dgraph official documentation, visit https://dgraph.io/docs/.
For discussions about Dgraph     , visit https://discuss.dgraph.io.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.



Running transaction with dgraph endpoint: 10.149.144.166:443
Found 1 data file(s) to process
Processing data file "dgraph_rdf_dob-00000-of-00003.rdf"
[01:14:52-0700] Elapsed: 05s Txns: 0 N-Quads: 0 N-Quads/s [last 5s]:     0 Aborts: 0
[01:14:57-0700] Elapsed: 10s Txns: 0 N-Quads: 0 N-Quads/s [last 5s]:     0 Aborts: 0
[01:15:03-0700] Elapsed: 15s Txns: 49 N-Quads: 49000 N-Quads/s [last 5s]:  9800 Aborts: 0

[01:16:22-0700] Elapsed: 01m35s Txns: 249 N-Quads: 249000 N-Quads/s [last 5s]:     0 Aborts: 0
[01:16:27-0700] Elapsed: 01m40s Txns: 249 N-Quads: 249000 N-Quads/s [last 5s]:     0 Aborts: 0
[01:16:32-0700] Elapsed: 01m45s Txns: 249 N-Quads: 249000 N-Quads/s [last 5s]:     0 Aborts: 0
[01:16:37-0700] Elapsed: 01m50s Txns: 249 N-Quads: 249000 N-Quads/s [last 5s]:     0 Aborts: 0
[01:16:42-0700] Elapsed: 01m55s Txns: 249 N-Quads: 249000 N-Quads/s [last 5s]:     0 Aborts: 0
panic: rpc error: code = Unavailable desc = transport is closing

goroutine 210 [running]:
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).upsertUids(0xc0003701e0, 0xc012d6c000, 0x3e8, 0x3e8)
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:339 +0xe08
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).processLoadFile.func1(0xc0002fcc70, 0xc0003701e0, 0xc00002d830)
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:476 +0x3e8
created by github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).processLoadFile
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:431 +0xa5

The live load crash is due to a lost connection; these logs don’t help much. Check the logs on your cluster. Also, share details about the cluster configuration and stats. What are you using: K8s? Docker? Is there a load balancer? Why is it exposed on 443? Are you using TLS? Is Ratel working fine in this setup?

I have one dgraph-ratel, one dgraph-zero, and one dgraph-alpha pod on k8s.
I see the following warning-level message in Zero:
"Connection lost with dgraph-alpha-0.dgraph-alpha.lab.svc.cluster.local:7080. Error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp: lookup dgraph-alpha-0.dgraph-alpha.key-perf.svc.cluster.local: no such host""
I am running the live load from my laptop (connected via the corporate VPN to k8s in GKE). Maybe the VPN is terminating the connection?

Load balancer? Yes. dgraph-ratel is exposed through a GKE ingress on port 443, and
dgraph-zero is exposed on port 8443. Everything works via the Ratel UI.

Is any pod down? This looks like a crashed Alpha.

Why Ratel? Previously you were using ./dgraph live -a xxxxx:443, which means you exposed an Alpha at 443, not Ratel, unless you exposed both there, with Ratel under one path (subdirectory) and the Alpha under another.

The error from the live loader suggests that the Alpha isn’t exposed correctly. Check this.

Here is what we expose to clients via the GKE (k8s) ingress:
Ratel → <Ratel_IP>:443
Zero → <Zero_IP>:443/8443
Alpha → <Alpha_IP>:443/8443
As I mentioned above, I am running the dgraph live command from my laptop. I can see around 50k records inserted before the CLI crashes. I keep rerunning the command below to ingest more records after each crash.

./dgraph live -a <Alpha_IP>:443  -z  <Zero_IP>:8443   -f dgraph_rdf_dob-00000-of-00003.rdf -U "xid"

How many Alphas do you have?
How many resources are available to the pods?

I have one Zero and one Alpha. I did not set any resource limits. It only uses ~2 GB during the live load tests.

Can you share more logs from the crash?

I don’t see any new logs for Zero or Alpha. Here are the CLI logs:

dgraph live -a <ip>:443 -z <ip>:8443  -f dgraph_rdf_dob-00001-of-00003.rdf -U "xid"
[Decoder]: Using assembly version of decoder
Page Size: 4096
I0323 13:18:12.466805   18800 init.go:107]

Dgraph version   : v20.11.2
Dgraph codename  : tchalla-2
Dgraph SHA-256   : ccbae03a5c877ba24f501940e4e08d1897f928d0c6579a534148d36749149f9b
Commit SHA-1     : 94f3a0430
Commit timestamp : 2021-02-23 13:07:17 +0530
Branch           : HEAD
Go version       : go1.15.5
jemalloc enabled : false

For Dgraph official documentation, visit https://dgraph.io/docs/.
For discussions about Dgraph     , visit https://discuss.dgraph.io.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.



Running transaction with dgraph endpoint:  <ip>:443
Found 1 data file(s) to process
Processing data file "dgraph_rdf_dob-00001-of-00003.rdf"
[13:18:17-0700] Elapsed: 05s Txns: 0 N-Quads: 0 N-Quads/s [last 5s]:     0 Aborts: 0
[13:18:22-0700] Elapsed: 10s Txns: 0 N-Quads: 0 N-Quads/s [last 5s]:     0 Aborts: 0
[13:18:27-0700] Elapsed: 15s Txns: 50 N-Quads: 50000 N-Quads/s [last 5s]: 10000 Aborts: 0
[13:18:32-0700] Elapsed: 20s Txns: 50 N-Quads: 50000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:18:37-0700] Elapsed: 25s Txns: 101 N-Quads: 101000 N-Quads/s [last 5s]: 10200 Aborts: 0
[13:18:42-0700] Elapsed: 30s Txns: 101 N-Quads: 101000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:18:47-0700] Elapsed: 35s Txns: 101 N-Quads: 101000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:18:52-0700] Elapsed: 40s Txns: 101 N-Quads: 101000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:18:57-0700] Elapsed: 45s Txns: 101 N-Quads: 101000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:19:02-0700] Elapsed: 50s Txns: 101 N-Quads: 101000 N-Quads/s [last 5s]:     0 Aborts: 0
[13:19:07-0700] Elapsed: 55s Txns: 101 N-Quads: 101000 N-Quads/s [last 5s]:     0 Aborts: 0
panic: rpc error: code = Unavailable desc = transport is closing

goroutine 212 [running]:
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).upsertUids(0xc0003721e0, 0xc007662000, 0x3e8, 0x3e8)
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:339 +0xe08
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).processLoadFile.func1(0xc000506860, 0xc0003721e0, 0xc00057a600)
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:476 +0x3e8
created by github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).processLoadFile
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:431 +0xa5

I can share it live when you have some time to help.
Thanks

A “transport is closing” error definitely sounds like the Alpha stopped during the live load. There should be logs for that. Can you share the Alpha logs before and after the restart?

The Alpha is running continuously. Here are the logs: