Kubernetes HA -> Alpha CrashLoopBackOff

Hi!
I built a Kubernetes HA cluster with Dgraph v1.0.11 and it was running normally.
Then I upgraded the image to dgraph/dgraph:master,
but the Alpha pods are now always in “CrashLoopBackOff”.

Alpha logs:
 ++ hostname -f
+ dgraph alpha --my=dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local.:7080 --lru_mb 40960 --zero dgraph-zero-0.dgraph-zero.default.svc.cluster.local:5080
I0325 09:49:37.368763       1 init.go:88] 
Dgraph version   : master
Commit SHA-1     : b050d173
Commit timestamp : 2019-03-19 16:02:35 -0700
Branch           : master
Go version       : go1.11.5
For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph     , visit https://discuss.dgraph.io.
To say hi to the community       , visit https://dgraph.slack.com.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.
I0325 09:49:37.369781       1 server.go:126] Setting Badger table load option: mmap
I0325 09:49:37.369802       1 server.go:138] Setting Badger value log load option: mmap
I0325 09:49:37.369813       1 server.go:166] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:true TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:65500 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:2 CompactL0OnClose:true managedTxns:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true Logger:0x2055b20}
I0325 09:49:37.501517       1 node.go:83] All 1 tables opened in 9ms
I0325 09:49:37.502078       1 node.go:83] Replaying file id: 10 at offset: 334068 
Zero logs:
Copyright 2015-2018 Dgraph Labs, Inc.
I0325 09:47:14.540574      12 run.go:98] Setting up grpc listener at: 0.0.0.0:5080
I0325 09:47:14.540806      12 run.go:98] Setting up http listener at: 0.0.0.0:6080
badger 2019/03/25 09:47:14 INFO: All 0 tables opened in 0s
badger 2019/03/25 09:47:14 INFO: Replaying file id: 0 at offset: 0
badger 2019/03/25 09:47:15 INFO: Replay took: 514.670124ms
I0325 09:47:15.176304      12 node.go:151] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc00049bda0 Applied:77265 MaxSizePerMsg:1048576 MaxCommittedSizePerReady:0 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x2055b20 DisableProposalForwarding:false}
I0325 09:47:15.177031      12 node.go:275] Found Snapshot.Metadata: {ConfState:{Nodes:[1 2 3] Learners:[] XXX_unrecognized:[]} Index:77265 Term:10 XXX_unrecognized:[]}
I0325 09:47:15.177087      12 node.go:286] Found hardstate: {Term:30 Vote:1 Commit:77308 XXX_unrecognized:[]}
I0325 09:47:15.232805      12 node.go:295] Group 0 found 44 entries
I0325 09:47:15.232842      12 raft.go:433] Restarting node for dgraphzero
I0325 09:47:15.233053      12 node.go:173] Setting conf state to nodes:1 nodes:2 nodes:3 
I0325 09:47:15.234800      12 pool.go:139] CONNECTED to dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local.:7080
I0325 09:47:15.234835      12 pool.go:139] CONNECTED to dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local.:7080
I0325 09:47:15.234854      12 pool.go:139] CONNECTED to dgraph-alpha-2.dgraph-alpha.default.svc.cluster.local.:7080
I0325 09:47:15.234875      12 pool.go:139] CONNECTED to dgraph-zero-1.dgraph-zero.default.svc.cluster.local.:5080
I0325 09:47:15.234917      12 pool.go:139] CONNECTED to dgraph-zero-2.dgraph-zero.default.svc.cluster.local.:5080
I0325 09:47:15.307815      12 node.go:83] 1 became follower at term 30
I0325 09:47:15.307920      12 node.go:83] newRaft 1 [peers: [1,2,3], term: 30, commit: 77308, applied: 77265, lastindex: 77308, lastterm: 30]
I0325 09:47:15.308060      12 run.go:284] Running Dgraph Zero...
I0325 09:47:15.384039      12 oracle.go:106] Purged below ts:81950, len(o.commits):0, len(o.rowCommit):0 
dgraph-ha.yaml:
apiVersion: v1
kind: Service
metadata:
  name: dgraph-zero-public
  labels:
    app: dgraph-zero
spec:
  type: LoadBalancer
  ports:
  - port: 5080
    targetPort: 5080
    name: zero-grpc
  - port: 6080
    targetPort: 6080
    name: zero-http
  selector:
    app: dgraph-zero
---
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha-public
  labels:
    app: dgraph-alpha
spec:
  type: LoadBalancer
  ports:
  - port: 8080
    targetPort: 8080
    name: alpha-http
  - port: 9080
    targetPort: 9080
    name: alpha-grpc
  selector:
    app: dgraph-alpha
---
apiVersion: v1
kind: Service
metadata:
  name: dgraph-ratel-public
  labels:
    app: dgraph-ratel
spec:
  type: LoadBalancer
  ports:
  - port: 8000
    targetPort: 8000
    name: ratel-http
  selector:
    app: dgraph-ratel
---
# This is a headless service which is necessary for discovery for a dgraph-zero StatefulSet.
# https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
apiVersion: v1
kind: Service
metadata:
  name: dgraph-zero
  labels:
    app: dgraph-zero
spec:
  ports:
  - port: 5080
    targetPort: 5080
    name: zero-grpc
  clusterIP: None
  selector:
    app: dgraph-zero
---
# This is a headless service which is necessary for discovery for a dgraph-alpha StatefulSet.
# https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#creating-a-statefulset
apiVersion: v1
kind: Service
metadata:
  name: dgraph-alpha
  labels:
    app: dgraph-alpha
spec:
  ports:
  - port: 7080
    targetPort: 7080
    name: alpha-grpc-int
  clusterIP: None
  selector:
    app: dgraph-alpha
---
# This StatefulSet runs 3 Dgraph Zero.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph-zero
spec:
  serviceName: "dgraph-zero"
  replicas: 3
  selector:
    matchLabels:
      app: dgraph-zero
  template:
    metadata:
      labels:
        app: dgraph-zero
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/path: '/debug/prometheus_metrics'
        prometheus.io/port: '6080'
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - dgraph-zero
              topologyKey: kubernetes.io/hostname
      containers:
      - name: zero
        image: dgraph/dgraph:master
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5080
          name: zero-grpc
        - containerPort: 6080
          name: zero-http
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        env:
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - bash
          - "-c"
          - |
            set -ex
            [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
            ordinal=${BASH_REMATCH[1]}
            idx=$(($ordinal + 1))
            if [[ $ordinal -eq 0 ]]; then
              dgraph zero --my=$(hostname -f):5080 --idx $idx --replicas 3
            else
              dgraph zero --my=$(hostname -f):5080 --peer dgraph-zero-0.dgraph-zero.${POD_NAMESPACE}.svc.cluster.local:5080 --idx $idx --replicas 3
            fi
      terminationGracePeriodSeconds: 60
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: 5Gi
---
# This StatefulSet runs 3 replicas of Dgraph Alpha.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dgraph-alpha
spec:
  serviceName: "dgraph-alpha"
  replicas: 3
  selector:
    matchLabels:
      app: dgraph-alpha
  template:
    metadata:
      labels:
        app: dgraph-alpha
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/path: '/debug/prometheus_metrics'
        prometheus.io/port: '8080'
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - dgraph-alpha
              topologyKey: kubernetes.io/hostname
      # Initializing the Alphas:
      #
      # You may want to initialize the Alphas with data before starting, e.g.
      # with data from the Dgraph Bulk Loader: https://docs.dgraph.io/deploy/#bulk-loader.
      # You can accomplish by uncommenting this initContainers config. This
      # starts a container with the same /dgraph volume used by Alpha and runs
      # before Alpha starts.
      #
      # You can copy your local p directory to the pod's /dgraph/p directory
      # with this command:
      #
      #    kubectl cp path/to/p dgraph-alpha-0:/dgraph/ -c init-alpha
      #    (repeat for each alpha pod)
      #
      # When you're finished initializing each Alpha data directory, you can signal
      # it to terminate successfully by creating a /dgraph/doneinit file:
      #
      #    kubectl exec dgraph-alpha-0 -c init-alpha touch /dgraph/doneinit
      #
      # Note that pod restarts cause re-execution of Init Containers. Since
      # /dgraph is persisted across pod restarts, the Init Container will exit
      # automatically when /dgraph/doneinit is present and proceed with starting
      # the Alpha process.
      #
      # Tip: StatefulSet pods can start in parallel by configuring
      # .spec.podManagementPolicy to Parallel:
      #
      #     https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#deployment-and-scaling-guarantees
      #
      # initContainers:
      #   - name: init-alpha
      #     image: dgraph/dgraph:master
      #     command:
      #       - bash
      #       - "-c"
      #       - |
      #         echo "Write to /dgraph/doneinit when ready."
      #         until [ -f /dgraph/doneinit ]; do sleep 2; done
      #     volumeMounts:
      #       - name: datadir
      #         mountPath: /dgraph
      containers:
      - name: alpha
        image: dgraph/dgraph:master
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 7080
          name: alpha-grpc-int
        - containerPort: 8080
          name: alpha-http
        - containerPort: 9080
          name: alpha-grpc
        volumeMounts:
        - name: datadir
          mountPath: /dgraph
        env:
          # This should be the same namespace as the dgraph-zero
          # StatefulSet to resolve a Dgraph Zero's DNS name for
          # Alpha's --zero flag.
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - bash
          - "-c"
          - |
            set -ex
            dgraph alpha --my=$(hostname -f):7080 --lru_mb 40960 --zero dgraph-zero-0.dgraph-zero.${POD_NAMESPACE}.svc.cluster.local:5080
      terminationGracePeriodSeconds: 600
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dgraph-ratel
  labels:
    app: dgraph-ratel
spec:
  selector:
    matchLabels:
      app: dgraph-ratel
  template:
    metadata:
      labels:
        app: dgraph-ratel
    spec:
      containers:
      - name: ratel
        image: dgraph/dgraph:master
        ports:
        - containerPort: 8000
        command:
          - dgraph-ratel

You upgraded from v1.0.11 to master. That’s the issue. What’s in master is not compatible with anything before v1.1, which has not been released yet.

You can still use master, but the correct way to upgrade is to export your data and re-import it: https://docs.dgraph.io/deploy/#upgrade-database
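The export-then-import path from the docs link looks roughly like this. This is only a sketch: the file names under `export/` are hypothetical examples, and the exact flags may differ between versions, so check the docs page and your actual export output.

```shell
# Rough sketch of the export/upgrade flow (file names are examples only;
# check what is actually written under the export directory).

# 1. Trigger an export on a running v1.0.11 Alpha (default HTTP port 8080):
curl localhost:8080/admin/export

# 2. Bring up a fresh cluster on the new version (with clean p/, w/ and zw/
#    directories), then re-import the exported data with the live loader:
dgraph live -r export/g01.rdf.gz \
            -s export/g01.schema.gz \
            --zero localhost:5080
```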

Cheers.

Sorry,
I can’t run the Kubernetes cluster with dgraph/dgraph:master,
because all the Alphas print this error message:
dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local.:7080 not a valid address

Did you start from a clean state? The current master branch will officially be released as v1.1 next month. As @MichelDiz wrote above, the proper way to upgrade is to export the data from v1.0.11 and then import to a newer version.

In fact, I use Helm to manage Kubernetes.
I also tried using kubectl directly.
My steps:

  1. kubectl delete pods,statefulsets,services -l app=dgraph-zero
    kubectl delete pods,statefulsets,services -l app=dgraph-alpha
    kubectl delete pods,replicasets,services -l app=dgraph-ratel

  2. wget https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha.yaml
    and modified some of the config.

  3. kubectl create -f dgraph-ha.yaml

So maybe the problem is that I didn’t delete the PersistentVolumeClaims and PersistentVolumes.
I will try again. Thank you for your answer.

@MichelDiz @dmai
Hi,
I tried it again:

kubectl delete pods,statefulsets,services,persistentvolumeclaims,persistentvolumes -l app=dgraph-zero
kubectl delete pods,statefulsets,services,persistentvolumeclaims,persistentvolumes -l app=dgraph-alpha
kubectl delete pods,replicasets,services,persistentvolumeclaims,persistentvolumes -l app=dgraph-ratel

I copied https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha.yaml and changed the image to dgraph/dgraph:master.

kubectl create -f dgraph-ha.yaml

and the Alphas print the same error message:

 ++ hostname -f
+ dgraph alpha --my=dgraph-alpha-2.dgraph-alpha.default.svc.cluster.local.:7080 --lru_mb 40960 --zero 10.68.72.53:5080
I0327 07:59:32.173539       1 init.go:88] 
Dgraph version   : master
Commit SHA-1     : b050d173
Commit timestamp : 2019-03-19 16:02:35 -0700
Branch           : master
Go version       : go1.11.5
For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph     , visit https://discuss.dgraph.io.
To say hi to the community       , visit https://dgraph.slack.com.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.
I0327 07:59:32.174597       1 server.go:126] Setting Badger table load option: mmap
I0327 07:59:32.174614       1 server.go:138] Setting Badger value log load option: mmap
I0327 07:59:32.174622       1 server.go:166] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:true TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:65500 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:2 CompactL0OnClose:true managedTxns:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true Logger:0x2055b20}
I0327 07:59:32.289587       1 node.go:83] All 1 tables opened in 1ms
I0327 07:59:32.290564       1 node.go:83] Replaying file id: 0 at offset: 3584
I0327 07:59:32.291365       1 node.go:83] Replay took: 779.49µs
I0327 07:59:32.292420       1 server.go:126] Setting Badger table load option: mmap 
I0327 07:59:32.292440       1 server.go:138] Setting Badger value log load option: mmap
I0327 07:59:32.292449       1 server.go:180] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:2 CompactL0OnClose:true managedTxns:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true Logger:0x2055b20}
I0327 07:59:32.361725       1 node.go:83] All 1 tables opened in 1ms
I0327 07:59:32.362366       1 node.go:83] Replaying file id: 0 at offset: 1092
I0327 07:59:32.363187       1 node.go:83] Replay took: 798.304µs
2019/03/27 07:59:32 dgraph-alpha-2.dgraph-alpha.default.svc.cluster.local.:7080 is not valid address
github.com/dgraph-io/dgraph/x.AssertTruef
	/ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:88
github.com/dgraph-io/dgraph/worker.StartRaftNodes
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/groups.go:80
github.com/dgraph-io/dgraph/dgraph/cmd/alpha.run.func5
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/alpha/run.go:570
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1333 

I followed your same steps and the cluster is running fine on my end.

Is there a specific environment you’re deploying Dgraph in? The error you shared shows that the Alpha address being set from $(hostname -f) is the FQDN with a trailing period. The address shouldn’t include the “.” at the end.
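As a sketch of a possible workaround (an assumption on my part, not an official fix), the trailing dot can be stripped before building the --my address:

```shell
# Possible workaround (assumption, not an official fix): some cluster DNS
# setups make `hostname -f` return the fully-qualified name with a
# trailing ".", which this build rejects as "not a valid address".
# ${fqdn%.} removes one trailing "." if present and otherwise leaves the
# name unchanged.
fqdn="dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local."  # example value
my_addr="${fqdn%.}:7080"
echo "$my_addr"   # dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080
```

In the StatefulSet command that would mean replacing `--my=$(hostname -f):7080` with something like `--my=$(hostname -f | sed 's/\.$//'):7080`.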
