then removed all the PVCs and PVs related to those Alphas, and now when I’m scaling the Alphas back up I get:
...
I1119 10:26:06.414946 16 draft.go:1505] Calling IsPeer
E1119 10:26:06.415704 16 draft.go:1538] Error while calling hasPeer: error while joining cluster: rpc error: code = Unknown desc = No node has been set up yet. Retrying...
...
And again, using Dgraph v20.07.2 and the same operations (scale down, remove the PVCs, scale up), I then get:
[pod/proj-graph-engine-0/proj-graph-engine] I1119 11:55:31.892497 16 draft.go:1584] Calling IsPeer
[pod/proj-graph-engine-0/proj-graph-engine] E1119 11:55:31.894925 16 draft.go:1617] Error while calling hasPeer: error while joining cluster: rpc error: code = Unknown desc = No node has been set up yet. Retrying...
[pod/proj-graph-engine-1/proj-graph-engine] I1119 11:55:31.990508 17 draft.go:1584] Calling IsPeer
[pod/proj-graph-engine-1/proj-graph-engine] E1119 11:55:31.996177 17 draft.go:1617] Error while calling hasPeer: error while joining cluster: rpc error: code = Unknown desc = No node has been set up yet. Retrying...
[pod/proj-graph-engine-2/proj-graph-engine] I1119 11:55:32.865837 32 draft.go:1584] Calling IsPeer
[pod/proj-graph-engine-2/proj-graph-engine] E1119 11:55:32.879186 32 draft.go:1617] Error while calling hasPeer: error while joining cluster: rpc error: code = Unknown desc = No node has been set up yet. Retrying...
[pod/proj-graph-engine-0/proj-graph-engine] I1119 11:55:32.896428 16 draft.go:1584] Calling IsPeer
[pod/proj-graph-engine-0/proj-graph-engine] E1119 11:55:32.904441 16 draft.go:1617] Error while calling hasPeer: error while joining cluster: rpc error: code = Unknown desc = No node has been set up yet. Retrying...
[pod/proj-graph-engine-1/proj-graph-engine] I1119 11:55:32.996402 17 draft.go:1584] Calling IsPeer
[pod/proj-graph-engine-1/proj-graph-engine] E1119 11:55:33.000037 17 draft.go:1617] Error while calling hasPeer: error while joining cluster: rpc error: code = Unknown desc = No node has been set up yet. Retrying...
[pod/proj-graph-engine-2/proj-graph-engine] I1119 11:55:33.880791 32 draft.go:1584] Calling IsPeer
[pod/proj-graph-engine-2/proj-graph-engine] E1119 11:55:33.886389 32 draft.go:1617] Error while calling hasPeer: error while joining cluster: rpc error: code = Unknown desc = No node has been set up yet. Retrying...
[pod/proj-graph-engine-0/proj-graph-engine] I1119 11:55:33.904971 16 draft.go:1584] Calling IsPeer
[pod/proj-graph-engine-0/proj-graph-engine] E1119 11:55:33.912895 16 draft.go:1617] Error while calling hasPeer: error while joining cluster: rpc error: code = Unknown desc = No node has been set up yet. Retrying...
[pod/proj-graph-engine-1/proj-graph-engine] I1119 11:55:34.004921 17 draft.go:1584] Calling IsPeer
[pod/proj-graph-engine-1/proj-graph-engine] E1119 11:55:34.010401 17 draft.go:1617] Error while calling hasPeer: error while joining cluster: rpc error: code = Unknown desc = No node has been set up yet. Retrying...
@lukaszlenart Did you remove all the Alphas but keep the Zeros? If you’re looking to restart the cluster from scratch you’ll want to start from a clean slate (i.e., new data directories) for all Zeros and Alphas.
@lukaszlenart For this explicit process, using the dgraph helm chart, you could do the following:
helm install pge --set image.tag=v20.03.6 dgraph/dgraph
## Scale Down Cluster and Delete Data + State
kubectl scale statefulset pge-dgraph-alpha --replicas=0
kubectl scale statefulset pge-dgraph-zero --replicas=0
kubectl delete pvc --selector release=pge
## Scale Up Cluster Starting with Zeros
kubectl scale statefulset pge-dgraph-zero --replicas=3
## Wait until there are 3 healthy Zero nodes
kubectl scale statefulset pge-dgraph-alpha --replicas=3
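The “wait until healthy” step above can be scripted rather than done by eye. A sketch, assuming the same `pge` release name as above (and that `curl` is available in the Zero container):

```shell
# Block until all Zero replicas of the StatefulSet report Ready
kubectl rollout status statefulset/pge-dgraph-zero --timeout=300s

# Optionally confirm cluster membership via a Zero's HTTP /state endpoint
kubectl exec pge-dgraph-zero-0 -- curl -s localhost:6080/state
```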
For the bulk loader, on an empty cluster, you would want to use an init container to stage the bulk loader output.
Generally, for immutable-infrastructure patterns, it may be easier to just delete the StatefulSets and recreate them from scratch. With the helm chart used above, that process would be:
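A sketch of that delete-and-recreate flow, assuming Helm 3 and the same `pge` release name used earlier:

```shell
# Remove the release (StatefulSets, Services, etc.); PVCs are left behind
helm uninstall pge

# Delete the data volumes so the cluster starts from a clean slate
kubectl delete pvc --selector release=pge

# Recreate everything from scratch
helm install pge --set image.tag=v20.07.2 dgraph/dgraph
```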
@dmai Yes, at first I removed just the Alphas, but then I removed both the Alphas and the Zeros and the problem persisted. The main issue is that, when you copy the bulk loader output to all the Alphas and then shut them down, on restart they start complaining in the logs that files already exist, and the cluster is broken.
@joaquin What do you mean by initContainers? Copying the data in?
@lukaszlenart Correct. On each of the Alphas, you’d have an initContainer that spin-loops until you finish the bulk load and move the created directory to p.
command:
- bash
- "-c"
- |
trap "exit" SIGINT SIGTERM
echo "Write to /dgraph/doneinit when ready."
until [ -f /dgraph/doneinit ]; do sleep 2; done
Then kubectl cp the file(s) into the initContainer on the alpha-0 pod (or, from within the initContainer, curl them down), do the bulk load, and touch /dgraph/doneinit. Repeat the same process for alpha-1, then alpha-2.
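Roughly, that per-Alpha sequence might look like the following. The pod name follows the release above, but the `-c init-alpha` container name is an assumption; adjust it to whatever your initContainer is called:

```shell
# Copy the bulk loader output for this shard into the waiting initContainer
kubectl cp out/0/p pge-dgraph-alpha-0:/dgraph/p -c init-alpha

# Signal the initContainer's wait loop so the Alpha container can start
kubectl exec pge-dgraph-alpha-0 -c init-alpha -- touch /dgraph/doneinit
```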
@lukaszlenart As an example, I added initContainer automation in the current master of dgraph helm chart. If you wanted to use this, you could do the following.
Thanks a lot @joaquin! Just one question: can I run the bulk loader on the Alphas? I thought I needed to run it against the Zeros’ leader and then copy the “0” output to all the Alphas.
I didn’t announce the initContainer feature yet, as the interface will change from alpha.initContainers.generic.enabled to alpha.initContainers.init.enabled. I will also add further automation for specialized initContainers, such as offline restore and bulk loader, but I’m not sure whether these two will make it into the next chart release, 0.0.13.
On this question: the bulk loader can run anywhere, but it does need to connect to one of the Dgraph Zero nodes for timestamp generation. The Zero leader specifically is not needed, as members are equal partners and the leader is elected (which member is elected depends on availability). This is part of the Raft consensus algorithm: https://raft.github.io/.
The output (./out) that contains the p directory (for a 1-shard cluster) will need to be copied to each Dgraph Alpha node before it starts. Thus it should be possible to do the load on one system and copy the same p directory to each of the Dgraph Alpha nodes before they start.
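For reference, a minimal bulk load invocation might look like this. The file names and the Zero address are placeholders; point --zero at any reachable Zero in your cluster:

```shell
# Run the bulk loader against one of the Zeros; output goes to ./out
dgraph bulk -f data.rdf.gz -s data.schema \
  --zero=pge-dgraph-zero-0.pge-dgraph-zero:5080

# For a 1-shard cluster, the resulting ./out/0/p directory is what each
# Alpha needs as /dgraph/p before it starts.
```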
I haven’t tried that exact process yet, as I was following the pattern of how it would be automated within Kubernetes (à la immutable-infra style).
Thanks for the clarification. Does that mean I cannot run the bulk loader on each Alpha, and must instead copy the p folder created during the first import to the rest of the Alphas?