About Dgraph live: "transport is closing"

Hi!
I am trying to import a large dataset (2.5 GB, about 37 million RDF triples) into a Kubernetes cluster (3 Alphas, 3 Zeros, 1 Ratel). I allocated 80 GB of memory and 15 CPUs to each Alpha in the cluster.

dgraph live  -r ./xxx.rdf -d 192.168.31.xxx:9080 -z 192.168.31.xxx:5080

But after an hour, the system reported an error:

[52m32s] Txns: 5016 RDFs: 5016000 RDFs/sec:  1591 Aborts: 9
[52m34s] Txns: 5019 RDFs: 5019000 RDFs/sec:  1591 Aborts: 9
2019/04/03 16:56:46 transport is closing
github.com/dgraph-io/dgraph/x.Fatalf
	/ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:115
github.com/dgraph-io/dgraph/dgraph/cmd/live.handleError
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:140
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).request
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:182
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).makeRequests
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:194
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1333

Are you running this remotely or inside the pods?

If the answer is the second case, I recommend that you use the Dgraph Bulk Loader instead. It’s faster and a much better fit for big datasets.
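For reference, a bulk load run looks roughly like this (a sketch only; the file names, shard counts, and addresses are placeholders, not taken from your setup). The bulk loader writes its output under ./out/, and the resulting p directory has to be copied into each Alpha’s data directory before the Alphas are started:

# Run against a Zero only; the Alphas must not be running yet.
dgraph bulk -r data.rdf.gz -s data.schema \
  --map_shards=4 --reduce_shards=1 \
  --zero=localhost:5080

# Then copy ./out/0/p into each Alpha's /dgraph/p before starting the Alphas.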

This happens if either Dgraph crashed or your load balancer interrupted the connection. Most likely it is the second case. If you’re running this on AWS behind an Elastic Load Balancer, that is the most likely cause.
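To tell the two cases apart, you can check whether any pod restarted and read the previous container’s logs (a sketch; it assumes the pod and label names from the standard Dgraph manifests, such as dgraph-alpha-0):

# Any restarts show up in the RESTARTS column.
kubectl get pods -l app=dgraph-alpha
# Logs of the previous (crashed) container instance, if there was a restart.
kubectl logs dgraph-alpha-0 --previous
# "Last State" shows OOMKilled if the kernel killed the process.
kubectl describe pod dgraph-alpha-0 | grep -A 3 "Last State"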

Remotely, and I must use the live loader for some reason.

It is the Kubernetes load balancer:

apiVersion: v1
kind: Service
metadata:
  name: dgraph-zero-public
  labels:
    app: dgraph-zero
spec:
  type: LoadBalancer
  ports:
  - port: 5080
    targetPort: 5080
    name: zero-grpc
  - port: 6080
    targetPort: 6080
    name: zero-http
  selector:
    app: dgraph-zero
---
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha-public
  labels:
    app: dgraph-alpha
spec:
  type: LoadBalancer
  ports:
  - port: 8080
    targetPort: 8080
    name: alpha-http
  - port: 9080
    targetPort: 9080
    name: alpha-grpc
  selector:
    app: dgraph-alpha
---
apiVersion: v1
kind: Service
metadata:
  name: dgraph-ratel-public
  labels:
    app: dgraph-ratel
spec:
  type: LoadBalancer
  ports:
  - port: 8000
    targetPort: 8000
    name: ratel-http
  selector:
    app: dgraph-ratel
---
# This is a headless service which is necessary for discovery for a dgraph-zero StatefulSet.
# https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
apiVersion: v1
kind: Service
metadata:
  name: dgraph-zero
  labels:
    app: dgraph-zero
spec:
  ports:
  - port: 5080
    targetPort: 5080
    name: zero-grpc
  clusterIP: None
  selector:
    app: dgraph-zero
---
# This is a headless service which is necessary for discovery for a dgraph-alpha StatefulSet.
# https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha
  labels:
    app: dgraph-alpha
spec:
  ports:
  - port: 7080
    targetPort: 7080
    name: alpha-grpc-int
  clusterIP: None
  selector:
    app: dgraph-alpha
---
# This StatefulSet runs 3 Dgraph Zero.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph-zero
spec:
  serviceName: "dgraph-zero"
  replicas: 3
  selector:
    matchLabels:
      app: dgraph-zero
  template:
    metadata:
      labels:
        app: dgraph-zero
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/path: '/debug/prometheus_metrics'
        prometheus.io/port: '6080'
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - dgraph-zero
              topologyKey: kubernetes.io/hostname
      containers:
      - name: zero
        image: dgraph/dgraph:v1.0.13
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5080
          name: zero-grpc
        - containerPort: 6080
          name: zero-http
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        env:
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - bash
          - "-c"
          - |
            set -ex
            [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
            ordinal=${BASH_REMATCH[1]}
            idx=$(($ordinal + 1))
            if [[ $ordinal -eq 0 ]]; then
              dgraph zero --my=$(hostname -f):5080 --idx $idx --replicas 3
            else
              dgraph zero --my=$(hostname -f):5080 --peer dgraph-zero-0.dgraph-zero.${POD_NAMESPACE}.svc.cluster.local:5080 --idx $idx --replicas 3
            fi
      terminationGracePeriodSeconds: 30
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: 5Gi
---
# This StatefulSet runs 3 replicas of Dgraph Alpha.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph-alpha
spec:
  serviceName: "dgraph-alpha"
  replicas: 3
  selector:
    matchLabels:
      app: dgraph-alpha
  template:
    metadata:
      labels:
        app: dgraph-alpha
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/path: '/debug/prometheus_metrics'
        prometheus.io/port: '8080'
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - dgraph-alpha
              topologyKey: kubernetes.io/hostname
      # Initializing the Alphas:
      #
      # You may want to initialize the Alphas with data before starting, e.g.
      # with data from the Dgraph Bulk Loader: https://docs.dgraph.io/deploy/#bulk-loader.
      # You can accomplish by uncommenting this initContainers config. This
      # starts a container with the same /dgraph volume used by Alpha and runs
      # before Alpha starts.
      #
      # You can copy your local p directory to the pod's /dgraph/p directory
      # with this command:
      #
      #    kubectl cp path/to/p dgraph-alpha-0:/dgraph/ -c init-alpha
      #    (repeat for each alpha pod)
      #
      # When you're finished initializing each Alpha data directory, you can signal
      # it to terminate successfully by creating a /dgraph/doneinit file:
      #
      #    kubectl exec dgraph-alpha-0 -c init-alpha touch /dgraph/doneinit
      #
      # Note that pod restarts cause re-execution of Init Containers. Since
      # /dgraph is persisted across pod restarts, the Init Container will exit
      # automatically when /dgraph/doneinit is present and proceed with starting
      # the Alpha process.
      #
      # Tip: StatefulSet pods can start in parallel by configuring
      # .spec.podManagementPolicy to Parallel:
      #
      #     https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#deployment-and-scaling-guarantees
      #
      initContainers:
        - name: init-alpha
          image: dgraph/dgraph:master
          command:
            - bash
            - "-c"
            - |
              echo "Write to /dgraph/doneinit when ready."
              until [ -f /dgraph/doneinit ]; do sleep 2; done
          volumeMounts:
            - name: datadir
              mountPath: /dgraph
      containers:
      - name: alpha
        image: dgraph/dgraph:v1.0.13
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: "20"
            memory: 40Gi
          requests:
            cpu: "10"
            memory: 10Gi
        ports:
        - containerPort: 7080
          name: alpha-grpc-int
        - containerPort: 8080
          name: alpha-http
        - containerPort: 9080
          name: alpha-grpc
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        env:
          # This should be the same namespace as the dgraph-zero
          # StatefulSet to resolve a Dgraph Zero's DNS name for
          # Alpha's --zero flag.
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - bash
          - "-c"
          - |
            set -ex
            dgraph alpha --my=$(hostname -f):7080 --lru_mb 40960 --zero ${DGRAPH_ZERO_PUBLIC_PORT_5080_TCP_ADDR}:5080
      terminationGracePeriodSeconds: 30
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dgraph-ratel
  labels:
    app: dgraph-ratel
spec:
  selector:
    matchLabels:
      app: dgraph-ratel
  template:
    metadata:
      labels:
        app: dgraph-ratel
    spec:
      containers:
      - name: ratel
        image: dgraph/dgraph:v1.0.13
        ports:
        - containerPort: 8000
        command:
          - dgraph-ratel

You’d need to check how the underlying load balancer is configured. On AWS a Kubernetes LoadBalancer is backed by an Elastic Load Balancer.
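If it does turn out to be an idle-timeout problem on the ELB, the timeout can be raised on the public Service, for example (the 3600-second value is only an illustration; pick whatever comfortably covers your load):

# Raise the classic ELB idle timeout (in seconds) for the public Alpha service.
kubectl annotate service dgraph-alpha-public \
  service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout="3600" \
  --overwrite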

My 2¢: I’m running Dgraph in a local cluster, controlled by a Nomad job (1 Zero, 4 Alphas, 1 Ratel). None of the Alphas is really big (7 GB, with lru_mb=2389). I’m trying to load the 21million movie dataset:

% docker exec -ti alpha-a6172fdf-d72a-d8c9-d3f0-7b8951442caf dgraph live -r /tmp/21million.rdf.gz --zero 192.168.168.158:28084 -c 1
I0409 09:45:54.970927 299 init.go:88]

Dgraph version : v1.0.13
Commit SHA-1 : 691b3b35
Commit timestamp : 2019-03-09 19:33:59 -0800
Branch : HEAD
Go version : go1.11.5

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph , visit http://discuss.dgraph.io.
To say hi to the community , visit https://dgraph.slack.com.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.

Creating temp client directory at /tmp/x295687544
badger 2019/04/09 09:45:54 INFO: All 0 tables opened in 0s

Processing /tmp/21million.rdf.gz
[ 2s] Txns: 31 RDFs: 31000 RDFs/sec: 15499 Aborts: 0
[ 4s] Txns: 73 RDFs: 73000 RDFs/sec: 18249 Aborts: 0

[ 2m58s] Txns: 1381 RDFs: 1381000 RDFs/sec: 7758 Aborts: 0
[ 3m0s] Txns: 1390 RDFs: 1390000 RDFs/sec: 7722 Aborts: 0
[ 3m2s] Txns: 1392 RDFs: 1392000 RDFs/sec: 7648 Aborts: 7
[ 3m4s] Txns: 1392 RDFs: 1392000 RDFs/sec: 7565 Aborts: 18
[ 3m6s] Txns: 1392 RDFs: 1392000 RDFs/sec: 7484 Aborts: 31
[ 3m8s] Txns: 1392 RDFs: 1392000 RDFs/sec: 7404 Aborts: 40
[ 3m10s] Txns: 1392 RDFs: 1392000 RDFs/sec: 7326 Aborts: 50
[ 3m12s] Txns: 1403 RDFs: 1403000 RDFs/sec: 7307 Aborts: 53
[ 3m14s] Txns: 1405 RDFs: 1405000 RDFs/sec: 7242 Aborts: 53
[ 3m16s] Txns: 1414 RDFs: 1414000 RDFs/sec: 7214 Aborts: 53
[ 3m18s] Txns: 1425 RDFs: 1425000 RDFs/sec: 7197 Aborts: 53
[ 3m20s] Txns: 1434 RDFs: 1434000 RDFs/sec: 7170 Aborts: 53
[ 3m22s] Txns: 1444 RDFs: 1444000 RDFs/sec: 7149 Aborts: 53
[ 3m24s] Txns: 1453 RDFs: 1453000 RDFs/sec: 7123 Aborts: 53
[ 3m26s] Txns: 1462 RDFs: 1462000 RDFs/sec: 7097 Aborts: 53
[ 3m28s] Txns: 1472 RDFs: 1472000 RDFs/sec: 7077 Aborts: 53
[ 3m30s] Txns: 1482 RDFs: 1482000 RDFs/sec: 7057 Aborts: 53
[ 3m32s] Txns: 1491 RDFs: 1491000 RDFs/sec: 7033 Aborts: 53
2019/04/09 09:49:27 transport is closing
github.com/dgraph-io/dgraph/x.Fatalf
/ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:115
github.com/dgraph-io/dgraph/dgraph/cmd/live.handleError
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:132
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).request
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:174
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).makeRequests
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:186
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1333

The end result is the same as @Valdanito’s, but I don’t see any sign of the load balancer having interrupted the connection (local cluster, very responsive, logs say nothing about it).

Any hints into what I should be looking for? Thanks in advance.

Update: I’ve just noticed that no nodes were inserted, and that the schema I provided just before attempting the import (21million.schema) has vanished as well.
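For what it’s worth, the current schema can be checked directly against an Alpha’s HTTP port, which makes it easy to confirm whether the predicates survived (a sketch, assuming the default 8080 port on localhost):

# Query the schema over the Alpha's HTTP endpoint (v1.0.x).
curl -s -XPOST localhost:8080/query -d 'schema {}'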

Check your memory config, including the memory limits on the container and Alpha’s --lru_mb flag.
While importing data, Dgraph seems to use up all the memory you allocate (I am not sure). You need to leave some memory for the system to prevent an OOM kill.
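As an illustration only (the numbers below are assumptions, not official guidance): the Dgraph docs suggest setting --lru_mb to roughly one third of the RAM available to Alpha, so for a container limited to 8 GiB the split could look like this:

# Container memory limit: 8Gi. Leave headroom for Badger, the Go runtime,
# and the OS; --lru_mb is set to about one third of the limit.
dgraph alpha --my=$(hostname -f):7080 \
  --lru_mb 2730 \
  --zero ${DGRAPH_ZERO_PUBLIC_PORT_5080_TCP_ADDR}:5080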
@bruno-unna
The above is my solution to this problem.
But I still want to know the official recommended memory settings.
@Daniel Mai

I am having this same exact problem!

I am trying to load 6.8 million records, which is a 391 MB .rdf file.

I do not have a load balancer in the mix, and am now running with 8 GB of RAM on a t3.large instance on AWS.

Tried watching the syslog on the server and it does not show any errors on that side.

Tried watching the syslog on the server and it does not show any errors on that side.

Are you sure the Alpha/Zero was running? The error shows up when a connection is lost because the server crashed (any kind of error could cause this).
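A quick way to confirm an OOM kill on a single machine (assuming Docker is in use; the container name is a placeholder):

# Did the kernel OOM-kill anything recently?
dmesg | grep -i -E "killed process|out of memory"
# Docker also records it in the container state (name is a placeholder).
docker inspect --format '{{.State.OOMKilled}}' dgraph_alpha_1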

After further review and watching docker stats: Alpha uses up all available RAM (8 GB shared between Zero and Alpha) and crashes due to OOM. Is there any kind of memory limit that would let the live loader work within a smaller memory budget and keep processing, even if it takes longer? I just need to complete the task, period, and I am limited to 8 GB. If this is a deal breaker, is there a way to split up live loads against the same Zero nodes?

2 Likes
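Regarding splitting up the live loads: pending a proper cooldown flag, one workaround is to chunk the N-Quads file and run the live loader once per chunk against the same Alpha and Zero, with low concurrency. A rough sketch (the file names, chunk size, and sleep time are placeholders; it assumes plain triples with no blank nodes shared across chunks):

# Split into ~1M-line chunks and re-compress each one.
gunzip -c data.rdf.gz | split -l 1000000 - chunk_
for f in chunk_*; do gzip "$f"; done

# Load the chunks one at a time; the pause gives the Alpha time to flush
# and the Go GC time to catch up (a crude "cooldown").
for f in chunk_*.gz; do
  dgraph live -r "$f" -d 192.168.31.xxx:9080 -z 192.168.31.xxx:5080 -c 1
  sleep 60
done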

@ibrahim some time ago I thought about giving the live loader a “cooldown” (via a flag) to reduce the stress on the Alphas and Zeros, giving the GC time to do its work. This would be nice for users who have limited resources available. Maybe add it to the bulk loader too. What do you think?

1 Like

That’s what I want, too.

So, please, one of you guys (@gumupaier, @amaster507) file an issue requesting a “cooldown” approach for the live loader, and maybe another ticket for the bulk loader. Please follow the issue template: give a concise description of the issue and explain how a “cooldown” would help you.

Also, after the issue is created, share all the use cases you can think of in the comments. If it gets popular, it can gain priority.

Cheers.

1 Like

On GitHub?

Yes sir.

@MichelDiz, thank you for the replies and direction. I will fill out an issue for the live loader in the morning (CDT here) as that was more my case. I love the responsiveness and support by the way!

2 Likes

@MichelDiz, thank you for the replies. I will try my best to give a supplementary description of the situation. My English is a little poor, so this may read a little strangely.

1 Like