Now, restart the cluster by restarting docker with sudo service docker restart
Dgraph fails to start with errors (pods are running though):
Alpha:
E1023 15:47:59.946666 19 groups.go:1137] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E1023 15:48:00.946904 19 groups.go:1137] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
I1023 15:48:01.144411 19 admin.go:593] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
E1023 15:48:01.947163 19 groups.go:1137] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E1023 15:48:02.948124 19 groups.go:1137] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E1023 15:48:03.948409 19 groups.go:1137] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E1023 15:48:04.948747 19 groups.go:1137] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E1023 15:48:05.948927 19 groups.go:1137] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
I1023 15:48:06.144927 19 admin.go:593] Error reading GraphQL schema: Please retry again, server is not ready to accept requests.
E1023 15:48:06.949090 19 groups.go:1137] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E1023 15:48:07.949459 19 groups.go:1137] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E1023 15:48:08.949704 19 groups.go:1137] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
E1023 15:48:09.949929 19 groups.go:1137] Error during SubscribeForUpdates for prefix "\x00\x00\x15dgraph.graphql.schema\x00": Unable to find any servers for group: 1. closer err: <nil>
Zero:
I1023 15:35:19.507573 18 run.go:127] Setting up grpc listener at: 0.0.0.0:5080
I1023 15:35:19.508051 18 run.go:127] Setting up http listener at: 0.0.0.0:6080
I1023 15:35:19.508353 18 run.go:298] Opening zero BadgerDB with options: {Dir:zw ValueDir:zw SyncWrites:false TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:1 ReadOnly:false Truncate:true Logger:0xc00007bfa0 Compression:2 InMemory:false MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1048576 NumMemtables:5 BlockSize:4096 BloomFalsePositive:0.01 KeepL0InMemory:false BlockCacheSize:0 IndexCacheSize:0 LoadBloomsOnOpen:false NumLevelZeroTables:5 NumLevelZeroTablesStall:15 LevelOneSize:268435456 ValueLogFileSize:67108864 ValueLogMaxEntries:1000000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 ZSTDCompressionLevel:3 VerifyValueChecksum:false EncryptionKey:[] EncryptionKeyRotationDuration:240h0m0s BypassLockGuard:false ChecksumVerificationMode:0 DetectConflicts:true managedTxns:false maxBatchCount:0 maxBatchSize:0}
badger 2020/10/23 15:35:19 INFO: All 1 tables opened in 74ms
badger 2020/10/23 15:35:20 INFO: Replaying file id: 0 at offset: 21571
badger 2020/10/23 15:35:20 INFO: Replay took: 12.285µs
I1023 15:35:20.190879 18 node.go:149] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:20 HeartbeatTick:1 Storage:0xc0003c4440 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x2be83e8 DisableProposalForwarding:false}
I1023 15:35:20.191281 18 node.go:318] Found hardstate: {Term:2 Vote:1 Commit:132 XXX_unrecognized:[]}
I1023 15:35:20.191555 18 node.go:327] Group 0 found 132 entries
I1023 15:35:20.191575 18 raft.go:448] Restarting node for dgraphzero
I1023 15:35:20.191835 18 log.go:34] 1 became follower at term 2
I1023 15:35:20.192309 18 log.go:34] newRaft 1 [peers: [], term: 2, commit: 132, applied: 0, lastindex: 132, lastterm: 2]
I1023 15:35:20.192522 18 run.go:374] Running Dgraph Zero...
I1023 15:35:20.505187 18 node.go:186] Setting conf state to nodes:1
I1023 15:35:20.505312 18 raft.go:708] Done applying conf change at 0x1
I1023 15:35:20.505627 18 pool.go:160] CONNECTING to dgraph-dgraph-alpha-0.dgraph-dgraph-alpha-headless.db.svc.cluster.local:7080
W1023 15:35:20.505659 18 pool.go:191] Shutting down extra connection to dgraph-dgraph-alpha-0.dgraph-dgraph-alpha-headless.db.svc.cluster.local:7080
W1023 15:35:20.509770 18 raft.go:739] Raft.Ready took too long to process: Timer Total: 316ms. Breakdown: [{sync 225ms} {disk 86ms} {proposals 5ms} {advance 0s}]. Num entries: 0. MustSync: true
I1023 15:35:21.193543 18 log.go:34] 1 no leader at term 2; dropping index reading msg
W1023 15:35:23.193596 18 node.go:675] [0x1] Read index context timed out
I1023 15:35:23.193654 18 log.go:34] 1 no leader at term 2; dropping index reading msg
I1023 15:35:23.893708 18 log.go:34] 1 is starting a new election at term 2
I1023 15:35:23.893752 18 log.go:34] 1 became pre-candidate at term 2
I1023 15:35:23.893761 18 log.go:34] 1 received MsgPreVoteResp from 1 at term 2
I1023 15:35:23.893978 18 log.go:34] 1 became candidate at term 3
I1023 15:35:23.893997 18 log.go:34] 1 received MsgVoteResp from 1 at term 3
I1023 15:35:23.894327 18 log.go:34] 1 became leader at term 3
I1023 15:35:23.894358 18 log.go:34] raft.node: 1 elected leader 1 at term 3
I1023 15:35:23.894424 18 raft.go:673] I've become the leader, updating leases.
I1023 15:35:23.894444 18 assign.go:42] Updated Lease id: 10001. Txn Ts: 10001
W1023 15:35:25.193693 18 node.go:675] [0x1] Read index context timed out
W1023 15:35:25.864210 18 pool.go:254] Connection lost with dgraph-dgraph-alpha-0.dgraph-dgraph-alpha-headless.db.svc.cluster.local:7080. Error: rpc error: code = Canceled desc = context canceled
W1023 15:35:25.865755 18 pool.go:254] Connection lost with dgraph-dgraph-alpha-0.dgraph-dgraph-alpha-headless.db.svc.cluster.local:7080. Error: rpc error: code = Unknown desc = No node has been set up yet
More info:
Name: dgraph-dgraph-alpha
Namespace: db
CreationTimestamp: Fri, 23 Oct 2020 16:07:34 +0000
Selector: app=dgraph,chart=dgraph-0.0.11,component=alpha,release=dgraph
Labels: app=dgraph
app.kubernetes.io/managed-by=Helm
chart=dgraph-0.0.11
component=alpha
heritage=Helm
release=dgraph
Annotations: meta.helm.sh/release-name: dgraph
meta.helm.sh/release-namespace: db
Replicas: 1 desired | 0 total
Update Strategy: RollingUpdate
Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=dgraph
chart=dgraph-0.0.11
component=alpha
release=dgraph
Annotations: prometheus.io/path: /debug/prometheus_metrics
prometheus.io/port: 8080
prometheus.io/scrape: true
Containers:
dgraph-dgraph-alpha:
Image: docker.io/dgraph/dgraph:v20.07.1
Ports: 7080/TCP, 8080/TCP, 9080/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Command:
bash
-c
set -ex
dgraph alpha --my=$(hostname -f):7080 --zero dgraph-dgraph-zero-0.dgraph-dgraph-zero-headless.${POD_NAMESPACE}.svc.cluster.local:5080
Requests:
memory: 100Mi
Environment:
POD_NAMESPACE: (v1:metadata.namespace)
DGRAPH_ALPHA_CONFIG: /dgraph/config/config.toml
Mounts:
/dgraph from datadir (rw)
/dgraph/config from config-volume (rw)
Volumes:
datadir:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: datadir
ReadOnly: false
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: dgraph-dgraph-alpha-config
Optional: false
Volume Claims:
Name: datadir
StorageClass: standard
Labels: <none>
Annotations: volume.alpha.kubernetes.io/storage-class=anything
Capacity: 200Mi
Access Modes: [ReadWriteOnce]
Events: <none>
It only works again after I destroy the entire cluster and recreate it.
Just to validate that a normal pod restart works, I ran kubectl -n db rollout restart statefulset dgraph-dgraph-alpha and everything was fine.
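For completeness, roughly the commands used for that check (a sketch: restart the Alpha StatefulSet in place, then wait for the rollout to settle):

kubectl -n db rollout restart statefulset dgraph-dgraph-alpha
kubectl -n db rollout status statefulset dgraph-dgraph-alpha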
Expected behaviour and actual result.
Dgraph should keep working even after restarts. This is important because I am running kind inside a pre-emptible VM in GCP, which resets every 24 hours. Since I am not sure whether this is an issue with Dgraph or with Kind, I have also filed an issue in the Kind repo here: dgraph errors on restart · Issue #1909 · kubernetes-sigs/kind · GitHub
Looking at your situation (a preemptible GCE instance running a Dgraph cluster in KinD), here are some thoughts:
Restarting docker seems to simulate a destructive event equivalent to restarting all GKE VM instances (GCE) at once. I haven't tested this scenario yet.
The big question is whether the data is saved into a volume so that it persists across destructive events.
In the preemptible scenario, where Google reclaims the GCE resource, the data would be lost unless a Docker volume is allocated on an external disk that GCE mounts at startup, Docker uses that same volume when starting up KinD, and the Dgraph pods re-attach to it (three layers: PVC/PV → Docker volume → GCE external disk).
Why is the persistent volume so important? Both Zero and Alpha keep the cluster and membership state on disk, so if that data is lost the state cannot be recovered and the cluster ends up in a broken state. The graph data itself is also stored on disk by the Alphas.
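As a quick sanity check (a sketch, assuming the pod names that the Helm release above would create), these are the directories that hold that state and therefore have to survive a restart:

kubectl -n db exec dgraph-dgraph-zero-0 -- ls -la /dgraph/zw    # Zero's Raft state (the "zw" dir in the log above)
kubectl -n db exec dgraph-dgraph-alpha-0 -- ls -la /dgraph/w    # Alpha's Raft write-ahead log
kubectl -n db exec dgraph-dgraph-alpha-0 -- ls -la /dgraph/p    # Alpha's posting lists (the graph data)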
Recently, in my testing, I came across an issue with dgraph v20.07.1 (the version used by the helm chart) where dgraph alpha may not terminate and so it gets stuck. When this happens, I had to use kubectl delete --force to forcibly remove the pods. I don't think you are running into this, as the rollout restart seemed to work fine for you.
I haven't tested KinD or the rancher local-path-provisioner yet, so this is something I can try out to see what I can discover.
I have my storage SSDs set to persist (both the boot disk and the attached disk), so the data stays across destructive events, as you rightly said.
I came across something similar where I was not able to delete the PVC provisioned by Dgraph and it got stuck. Force deleting didn't work either. Then I removed the finalizers from the PVCs and was able to delete them. Anyway, I don't think that is related to this.
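(For reference, removing the finalizers amounted to roughly the following; the PVC name is a placeholder based on the chart's datadir claim template:)

kubectl -n db patch pvc datadir-dgraph-dgraph-alpha-0 \
  --type=merge -p '{"metadata":{"finalizers":null}}'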
If there is anything you would like me to try, do let me know. Thanks.
@joaquin Just got a reply from Ben from Kind here: dgraph errors on restart · Issue #1909 · kubernetes-sigs/kind · GitHub. After seeing that, I also did a test on my end to dig into this further, and from the looks of it the data is still available in the PVC even after Docker/VM restarts, but something else is still causing the issue.
If you notice the dates, those folders from the PVC mounted into the Dgraph pod were created 1 day ago (before the pre-emptible VM restart). So the data does exist in the PVC, but Dgraph still fails.
My guess is that Dgraph might be maintaining some kind of state or info in memory which gets lost across restarts (strange that this does not happen when doing a rollout restart though), maybe because all the nodes are not down at once in that case, as you said.
The disk storage is necessary to maintain state (under the hood, the Raft protocol), similar to how systems like ZooKeeper, etcd, and Consul maintain their state and elect leaders. So if the data is lost, the state of the cluster is lost, and you essentially start from scratch.
One thing to look at is whether the same claim (PVC) is still bound to the correct volume (PV), or whether new claims are being made. The reclaim policy for the storage should be set to Retain rather than Delete, so that it can pick up the existing storage after an outage event. Otherwise, if the pods are cycled out, they may not be able to grab the same volume.
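For example, something along these lines can be used to check the binding and flip the reclaim policy (the PV name below is a placeholder; take the real one from the kubectl get pv output):

# See which PV each claim is bound to, and the current reclaim policy.
kubectl -n db get pvc
kubectl get pv

# Flip a PV's reclaim policy from Delete to Retain (substitute the real PV name).
kubectl patch pv pvc-0123abcd -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'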
@joaquin Hi. While the reclaim policy of my PVs is set to Delete, I am pretty sure that is not the problem, because even after restarts the same PV is used by the PVC and the PVC is still bound to the pod properly. Still, to validate your assumption, I gave it a test.
The problem here is that it's not just about the pre-emptible VM. Even if I provision a normal VM with a Kind cluster inside it, and shut the VM down occasionally to save on cost and start it again, Kind is going to restart anyway, which leads to the same state, since this happens whenever docker/the cluster restarts.
Actually, I am fine with destroying the cluster and creating it afresh every time, since it doesn't take much time, but I am not sure how I can automatically restore the data, since the PVCs get destroyed as well and Dgraph export also writes its data to a PVC.
Alternatively, if there is a way to gracefully run some sort of shutdown script for Dgraph, I can do that as well, triggered before the pre-emptible VM shuts down (it won't help on a docker restart though).
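Something like this is what I had in mind as the shutdown hook (a rough sketch only, assuming it can be wired into the VM's shutdown-script metadata and that the StatefulSet names match the Helm release above):

#!/usr/bin/env bash
# Scale the Dgraph StatefulSets to zero so Kubernetes sends SIGTERM and the
# processes get a chance to exit cleanly before the preemptible VM goes away.
set -euo pipefail

kubectl -n db scale statefulset dgraph-dgraph-alpha --replicas=0
kubectl -n db scale statefulset dgraph-dgraph-zero --replicas=0

# Best-effort wait for the pods to terminate within the preemption window.
kubectl -n db wait --for=delete pod -l release=dgraph --timeout=90s || true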
For this, could you try adding extra mounts from the host onto the Kind worker container? That way you could use a hostPath of that path for your Dgraph cluster. The config for this would look like this: config.yaml · GitHub (you'd need to create the directories locally, say mkdir -p ~/dgraph-host-mount/alpha ~/dgraph-host-mount/zero, and reflect that path in config.yaml). The data survives a docker restart.
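A minimal sketch of that Kind config (the paths and file names here are placeholders, not the exact gist):

mkdir -p ~/dgraph-host-mount/alpha ~/dgraph-host-mount/zero

cat <<EOF > kind-extra-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: ${HOME}/dgraph-host-mount   # directory created above, expanded when the file is written
    containerPath: /dgraph-host-mount     # visible inside the Kind node for hostPath volumes
EOF

kind create cluster --config kind-extra-config.yaml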
Ensure Dgraph shuts down cleanly when the GCE preemption happens.
@slotlocker2 Thanks a lot for the tips. Will try out everything you mentioned and get back to you tomorrow with the results. I would also have to modify the existing Dgraph Helm chart to make the relevant changes from your yaml. Will try hostpath-provisioner too (the last commit was 1 year ago though, so not sure if it still works).
Regarding the preemptible killer, is it really the only way to go about safe termination?
(just realized the Kind config is named kind-extra-config as opposed to say a config.yaml in the gist above)
Regarding the host path provisioner, you could use the one from Rancher as long as you point it at the path mounted onto the Kind worker container from the underlying GCE persistent disk. Maybe you wouldn't need to make the hostPath changes in Helm at all if you can get the local provisioner to work in that configuration.
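Roughly how to check where Kind's bundled local-path provisioner writes its data (the namespace and ConfigMap names below are what Kind ships by default, so worth verifying on your cluster):

kubectl -n local-path-storage get configmap local-path-config -o yaml
# The nodePathMap paths in that config need to point at a directory that is
# itself an extraMount backed by the persistent disk; otherwise the provisioned
# volumes live only inside the Kind node's filesystem and vanish with it.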
Dgraph should terminate safely quickly enough, unless the underlying node shuts down abruptly, SIGKILLing the process and not allowing it to do its cleanup. A similar project for AWS does just that: it sends SIGTERM to the pods so they can exit cleanly as soon as it receives the 2-minute spot shutdown notification (the grace period would ideally be under 120 seconds to work on spot in that case).
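For reference, the grace period can be checked on the StatefulSet itself and raised with a patch if needed (the patch below is a sketch of how you might set it, not something the chart does by default):

# Current grace period (Kubernetes defaults this to 30s).
kubectl -n db get statefulset dgraph-dgraph-alpha \
  -o jsonpath='{.spec.template.spec.terminationGracePeriodSeconds}'

# Raise it to 60s, which still fits comfortably inside a spot/preemption notice.
kubectl -n db patch statefulset dgraph-dgraph-alpha --type=merge \
  -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":60}}}}'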
Using hostPath without any provisioner (same gist as shared) and restarting - Worked well
Mounting /var/local-path-provisioner from the Kind node onto the host using extraMounts and restarting the cluster - Did not work
Modifying the same gist shared by @slotlocker2 to use the standard storage class instead of hostPath, and restarting - Worked well
Using the Dgraph helm chart with standard storage class and restarting - Did not work
I will go ahead with what is working now (will use the helm chart for prod and yaml for dev for now) but to actually help you reproduce the problem I am facing, I have added a gist here which shares the config:
So, it looks like it's not an issue with Kind or the local-path provisioner, but something to do with the Helm chart.
The significant difference I see is the command used to start Zero.
The Helm chart has:
- bash
- "-c"
- |
  set -ex
  [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
  ordinal=${BASH_REMATCH[1]}
  idx=$(($ordinal + 1))
  if [[ $ordinal -eq 0 ]]; then
    exec dgraph zero --my=$(hostname -f):5080 --idx $idx --replicas 5
  else
    exec dgraph zero --my=$(hostname -f):5080 --peer dgraph-dgraph-zero-0.dgraph-dgraph-zero-headless.${POD_NAMESPACE}.svc.cluster.local:5080 --idx $idx --replicas 5
  fi
and the yaml has:
- bash
- "-c"
- |
  set -ex
  dgraph zero --my=$(hostname -f):5080
There is also the fact that the Helm chart splits the deployment into multiple pods, while this yaml uses multiple containers within a single pod.
Just to see the difference before and after a restart in the pods deployed with the Helm chart, I diffed the environment, and the only thing that changes across restarts is the PPID.
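(Roughly how that comparison can be taken; the file names are arbitrary:)

kubectl -n db exec dgraph-dgraph-alpha-0 -- env | sort > env-before.txt
# ... restart docker / the VM ...
kubectl -n db exec dgraph-dgraph-alpha-0 -- env | sort > env-after.txt
diff env-before.txt env-after.txt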
UPDATE: GCE just killed my VM and the dgraph cluster restarted without any issues now. So, we might have to see how to fix the helm chart to match this: Dgraph-Kind restart repro · GitHub