Valdanito
(Valdanito)
April 3, 2019, 12:51pm
1
Hi!
I am trying to import a big dataset (2.5 GB, about 37 million RDFs) into a Kubernetes cluster (3 Alphas, 3 Zeros, 1 Ratel). I allocated 80 GB of memory and 15 CPUs to each Alpha in the cluster.
dgraph live -r ./xxx.rdf -d 192.168.31.xxx:9080 -z 192.168.31.xxx:5080
But after an hour, the system reported an error:
[52m32s] Txns: 5016 RDFs: 5016000 RDFs/sec: 1591 Aborts: 9
[52m34s] Txns: 5019 RDFs: 5019000 RDFs/sec: 1591 Aborts: 9
2019/04/03 16:56:46 transport is closing
github.com/dgraph-io/dgraph/x.Fatalf
/ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:115
github.com/dgraph-io/dgraph/dgraph/cmd/live.handleError
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:140
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).request
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:182
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).makeRequests
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:194
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1333
MichelDiz
(Michel Diz)
April 3, 2019, 3:52pm
2
Are you running this remotely or inside the pods?
If it’s the second case, I recommend that you use the Dgraph Bulk Loader instead. It’s faster and handles big datasets much better.
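For illustration, a minimal sketch of a bulk-load run against v1.0.13 (the file paths and shard counts here are assumptions, and the generated p directory must be copied to each Alpha before the Alphas start):

# Hedged sketch: run next to a Zero, then distribute the output.
dgraph bulk -r ./xxx.rdf -s ./xxx.schema --map_shards=1 --reduce_shards=1 --zero=localhost:5080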
mrjn
(Manish R Jain)
April 4, 2019, 12:44am
3
This happens if either Dgraph crashed or your load balancer interrupted the connection. Most likely it is the second case, especially if you’re running this on AWS behind an Elastic Load Balancer.
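For example, the classic ELB idle timeout defaults to 60 seconds, which will cut a long-running live load. A hedged sketch of raising it via a Service annotation (assuming the in-tree AWS cloud provider):

apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha-public
  annotations:
    # Assumption: raise the ELB idle timeout (default 60s) so long-lived
    # gRPC connections from the live loader are not dropped mid-load.
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"
spec:
  type: LoadBalancer
  ports:
  - port: 9080
    targetPort: 9080
    name: alpha-grpc
  selector:
    app: dgraph-alpha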
Valdanito
(Valdanito)
April 4, 2019, 1:59am
4
Remotely.
And I must use the live loader, for certain reasons.
Valdanito
(Valdanito)
April 4, 2019, 2:04am
5
It is the Kubernetes load balancer:
apiVersion: v1
kind: Service
metadata:
  name: dgraph-zero-public
  labels:
    app: dgraph-zero
spec:
  type: LoadBalancer
  ports:
  - port: 5080
    targetPort: 5080
    name: zero-grpc
  - port: 6080
    targetPort: 6080
    name: zero-http
  selector:
    app: dgraph-zero
---
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha-public
  labels:
    app: dgraph-alpha
spec:
  type: LoadBalancer
  ports:
  - port: 8080
    targetPort: 8080
    name: alpha-http
  - port: 9080
    targetPort: 9080
    name: alpha-grpc
  selector:
    app: dgraph-alpha
---
apiVersion: v1
kind: Service
metadata:
  name: dgraph-ratel-public
  labels:
    app: dgraph-ratel
spec:
  type: LoadBalancer
  ports:
  - port: 8000
    targetPort: 8000
    name: ratel-http
  selector:
    app: dgraph-ratel
---
# This is a headless service which is necessary for discovery for a dgraph-zero StatefulSet.
# https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
apiVersion: v1
kind: Service
metadata:
  name: dgraph-zero
  labels:
    app: dgraph-zero
spec:
  ports:
  - port: 5080
    targetPort: 5080
    name: zero-grpc
  clusterIP: None
  selector:
    app: dgraph-zero
---
# This is a headless service which is necessary for discovery for a dgraph-alpha StatefulSet.
# https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha
  labels:
    app: dgraph-alpha
spec:
  ports:
  - port: 7080
    targetPort: 7080
    name: alpha-grpc-int
  clusterIP: None
  selector:
    app: dgraph-alpha
---
# This StatefulSet runs 3 Dgraph Zero.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph-zero
spec:
  serviceName: "dgraph-zero"
  replicas: 3
  selector:
    matchLabels:
      app: dgraph-zero
  template:
    metadata:
      labels:
        app: dgraph-zero
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/path: '/debug/prometheus_metrics'
        prometheus.io/port: '6080'
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - dgraph-zero
              topologyKey: kubernetes.io/hostname
      containers:
      - name: zero
        image: dgraph/dgraph:v1.0.13
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5080
          name: zero-grpc
        - containerPort: 6080
          name: zero-http
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        command:
          - bash
          - "-c"
          - |
            set -ex
            [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
            ordinal=${BASH_REMATCH[1]}
            idx=$(($ordinal + 1))
            if [[ $ordinal -eq 0 ]]; then
              dgraph zero --my=$(hostname -f):5080 --idx $idx --replicas 3
            else
              dgraph zero --my=$(hostname -f):5080 --peer dgraph-zero-0.dgraph-zero.${POD_NAMESPACE}.svc.cluster.local:5080 --idx $idx --replicas 3
            fi
      terminationGracePeriodSeconds: 30
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: 5Gi
---
# This StatefulSet runs 3 replicas of Dgraph Alpha.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph-alpha
spec:
  serviceName: "dgraph-alpha"
  replicas: 3
  selector:
    matchLabels:
      app: dgraph-alpha
  template:
    metadata:
      labels:
        app: dgraph-alpha
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/path: '/debug/prometheus_metrics'
        prometheus.io/port: '8080'
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - dgraph-alpha
              topologyKey: kubernetes.io/hostname
      # Initializing the Alphas:
      #
      # You may want to initialize the Alphas with data before starting, e.g.
      # with data from the Dgraph Bulk Loader: https://docs.dgraph.io/deploy/#bulk-loader.
      # You can accomplish this by uncommenting this initContainers config. This
      # starts a container with the same /dgraph volume used by Alpha and runs
      # before Alpha starts.
      #
      # You can copy your local p directory to the pod's /dgraph/p directory
      # with this command:
      #
      #    kubectl cp path/to/p dgraph-alpha-0:/dgraph/ -c init-alpha
      #    (repeat for each alpha pod)
      #
      # When you're finished initializing each Alpha data directory, you can signal
      # it to terminate successfully by creating a /dgraph/doneinit file:
      #
      #    kubectl exec dgraph-alpha-0 -c init-alpha touch /dgraph/doneinit
      #
      # Note that pod restarts cause re-execution of Init Containers. Since
      # /dgraph is persisted across pod restarts, the Init Container will exit
      # automatically when /dgraph/doneinit is present and proceed with starting
      # the Alpha process.
      #
      # Tip: StatefulSet pods can start in parallel by configuring
      # .spec.podManagementPolicy to Parallel:
      #
      #    https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#deployment-and-scaling-guarantees
      #
      initContainers:
      - name: init-alpha
        image: dgraph/dgraph:master
        command:
          - bash
          - "-c"
          - |
            echo "Write to /dgraph/doneinit when ready."
            until [ -f /dgraph/doneinit ]; do sleep 2; done
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
      containers:
      - name: alpha
        image: dgraph/dgraph:v1.0.13
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: "20"
            memory: 40Gi
          requests:
            cpu: "10"
            memory: 10Gi
        ports:
        - containerPort: 7080
          name: alpha-grpc-int
        - containerPort: 8080
          name: alpha-http
        - containerPort: 9080
          name: alpha-grpc
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        env:
          # This should be the same namespace as the dgraph-zero
          # StatefulSet to resolve a Dgraph Zero's DNS name for
          # Alpha's --zero flag.
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - bash
          - "-c"
          - |
            set -ex
            dgraph alpha --my=$(hostname -f):7080 --lru_mb 40960 --zero ${DGRAPH_ZERO_PUBLIC_PORT_5080_TCP_ADDR}:5080
      terminationGracePeriodSeconds: 30
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dgraph-ratel
  labels:
    app: dgraph-ratel
spec:
  selector:
    matchLabels:
      app: dgraph-ratel
  template:
    metadata:
      labels:
        app: dgraph-ratel
    spec:
      containers:
      - name: ratel
        image: dgraph/dgraph:v1.0.13
        ports:
        - containerPort: 8000
        command:
          - dgraph-ratel
dmai
(Daniel Mai)
April 4, 2019, 6:46pm
6
You’d need to check how the underlying load balancer is configured. On AWS a Kubernetes LoadBalancer is backed by an Elastic Load Balancer.
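A quick way to see what is actually backing the Service (names assumed from the manifest above):

kubectl describe service dgraph-alpha-public
kubectl get service dgraph-alpha-public -o yaml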
bruno-unna
April 9, 2019
7
My 2¢: I’m running Dgraph in a local cluster, controlled by a Nomad job (1 Zero, 4 Alphas, 1 Ratel). None of the Alphas is really big (7 GB, with lru_mb=2389). I’m trying to load the 21-million movie database:
% docker exec -ti alpha-a6172fdf-d72a-d8c9-d3f0-7b8951442caf dgraph live -r /tmp/21million.rdf.gz --zero 192.168.168.158:28084 -c 1
I0409 09:45:54.970927 299 init.go:88]
Dgraph version : v1.0.13
Commit SHA-1 : 691b3b35
Commit timestamp : 2019-03-09 19:33:59 -0800
Branch : HEAD
Go version : go1.11.5
For Dgraph official documentation, visit https://docs.dgraph.io .
For discussions about Dgraph , visit http://discuss.dgraph.io .
To say hi to the community , visit https://dgraph.slack.com .
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.
Creating temp client directory at /tmp/x295687544
badger 2019/04/09 09:45:54 INFO: All 0 tables opened in 0s
Processing /tmp/21million.rdf.gz
[ 2s] Txns: 31 RDFs: 31000 RDFs/sec: 15499 Aborts: 0
[ 4s] Txns: 73 RDFs: 73000 RDFs/sec: 18249 Aborts: 0
…
[ 2m58s] Txns: 1381 RDFs: 1381000 RDFs/sec: 7758 Aborts: 0
[ 3m0s] Txns: 1390 RDFs: 1390000 RDFs/sec: 7722 Aborts: 0
[ 3m2s] Txns: 1392 RDFs: 1392000 RDFs/sec: 7648 Aborts: 7
[ 3m4s] Txns: 1392 RDFs: 1392000 RDFs/sec: 7565 Aborts: 18
[ 3m6s] Txns: 1392 RDFs: 1392000 RDFs/sec: 7484 Aborts: 31
[ 3m8s] Txns: 1392 RDFs: 1392000 RDFs/sec: 7404 Aborts: 40
[ 3m10s] Txns: 1392 RDFs: 1392000 RDFs/sec: 7326 Aborts: 50
[ 3m12s] Txns: 1403 RDFs: 1403000 RDFs/sec: 7307 Aborts: 53
[ 3m14s] Txns: 1405 RDFs: 1405000 RDFs/sec: 7242 Aborts: 53
[ 3m16s] Txns: 1414 RDFs: 1414000 RDFs/sec: 7214 Aborts: 53
[ 3m18s] Txns: 1425 RDFs: 1425000 RDFs/sec: 7197 Aborts: 53
[ 3m20s] Txns: 1434 RDFs: 1434000 RDFs/sec: 7170 Aborts: 53
[ 3m22s] Txns: 1444 RDFs: 1444000 RDFs/sec: 7149 Aborts: 53
[ 3m24s] Txns: 1453 RDFs: 1453000 RDFs/sec: 7123 Aborts: 53
[ 3m26s] Txns: 1462 RDFs: 1462000 RDFs/sec: 7097 Aborts: 53
[ 3m28s] Txns: 1472 RDFs: 1472000 RDFs/sec: 7077 Aborts: 53
[ 3m30s] Txns: 1482 RDFs: 1482000 RDFs/sec: 7057 Aborts: 53
[ 3m32s] Txns: 1491 RDFs: 1491000 RDFs/sec: 7033 Aborts: 53
2019/04/09 09:49:27 transport is closing
github.com/dgraph-io/dgraph/x.Fatalf
/ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:115
github.com/dgraph-io/dgraph/dgraph/cmd/live.handleError
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:132
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).request
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:174
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).makeRequests
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/batch.go:186
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1333
The end result is the same as @Valdanito’s, but I don’t see any sign of the load balancer having interrupted the connection (local cluster, very responsive, logs say nothing about it).
Any hints on what I should be looking for? Thanks in advance.
Update: I’ve just noticed that no nodes were inserted, and that the schema I provided just before attempting the import (21million.schema) has vanished as well.
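One quick, hedged check for that (assuming the default Alpha HTTP port) is to ask the cluster for its current schema:

curl -s localhost:8080/query -XPOST -d 'schema {}'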
Valdanito
(Valdanito)
April 10, 2019, 6:17am
8
Check your memory config, including the memory limits for the container and Alpha’s --lru_mb flag.
While importing data, Dgraph seems to use up all the memory you allocate (I am not sure). You need to leave some memory for the system to prevent an OOM kill.
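For illustration, a sketch of what that might look like in the Alpha StatefulSet above (the numbers are assumptions; the Dgraph docs of that era suggested setting --lru_mb to roughly one third of available RAM):

resources:
  limits:
    memory: 32Gi   # assumption: stay below the node's capacity for OS headroom
command:
  - bash
  - "-c"
  - |
    # roughly one third of the container limit, instead of the full 40960
    dgraph alpha --my=$(hostname -f):7080 --lru_mb 10240 --zero ${DGRAPH_ZERO_PUBLIC_PORT_5080_TCP_ADDR}:5080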
@bruno-unna
The above is my solution to this problem.
But I still want to know the officially recommended memory settings.
@dmai
amaster507
(Anthony Master)
June 26, 2020, 8:56pm
9
I am having this exact same problem!
I am trying to load 6.8 million records, a 391 MB .rdf file.
I do not have a load balancer in the mix, and I am now running with 8 GB of RAM on a t3.large instance on AWS.
I tried watching the syslog on the server, and it does not show any errors on that side.
ibrahim
(Ibrahim Jarif)
June 27, 2020, 7:36am
10
I tried watching the syslog on the server, and it does not show any errors on that side.
Are you sure Alpha/Zero was running? The error shows up when a connection is lost because the server crashed (any kind of error could cause this).
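A couple of quick checks (default local ports assumed):

curl -s localhost:6080/state | head   # Zero's state endpoint
curl -s localhost:8080/health         # Alpha's health endpoint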
amaster507
(Anthony Master)
June 28, 2020, 7:55pm
11
After further review and watching docker stats: Alpha uses up all available RAM (8 GB available between Zero and Alpha) and crashes due to OOM. Is there any kind of memory limit that would let the live loader work within a smaller memory budget and keep processing, even if it takes longer? I just need to complete the task, period, and I am limited to 8 GB. If this is a deal breaker, is there a way to split up live loads using the same Zero node?
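Not an official answer, but one possible workaround sketch: split the N-Quads file into chunks and live-load them one at a time, persisting the xidmap directory so blank nodes map to the same UIDs across runs (file names, ports, and batch settings are assumptions; check the flags for your Dgraph version):

# N-Quads are line-oriented, so a line-based split is safe.
split -l 1000000 data.rdf chunk_
for f in chunk_*; do
  dgraph live -f "$f" --alpha localhost:9080 --zero localhost:5080 \
    --xidmap ./xidmap -b 100 -c 1   # small batches, one in-flight txn
done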
2 Likes
MichelDiz
(Michel Diz)
June 28, 2020, 11:24pm
12
amaster507:
I am limited to 8Gb
@ibrahim Some time ago I thought about giving the live loader a “cooldown” (via a flag) to reduce the stress on the Alphas and Zeros, giving the GC time to do its work. This would be nice for users with limited resources. Maybe add it to the bulk loader too. What do you think?
1 Like
MichelDiz
(Michel Diz)
June 29, 2020, 3:16am
14
So, please, one of you guys (@gumupaier, @amaster507) file an issue requesting a “cooldown” approach for the live loader, and maybe another ticket for the bulk loader. Please follow the issue template: give a concise description of the issue and explain how a “cooldown” would help you.
Also, once the issue is created, share all the use cases you can think of in the comments. If it gets popular, it can gain priority.
Cheers.
1 Like
amaster507
(Anthony Master)
June 29, 2020, 3:54am
17
@MichelDiz, thank you for the replies and direction. I will file an issue for the live loader in the morning (CDT here), as that was closer to my case. I love the responsiveness and support, by the way!
2 Likes
gumupaier
June 29, 2020
@MichelDiz Thank you for the replies. I will try my best to give a supplementary description of the situation. My English is a little poor, so this may read a bit strangely.
1 Like