Hello,
We had an incident today where an alpha pod started throwing an error similar to the one reported in this topic: LOG Compact FAILED with error: MANIFEST removes non-existing table 15777621.
After restarting the cluster, most of our replicas began to stabilize. However, the alpha pod that was originally impacted is restarting constantly and throwing the following error:
12:27:33.499 2023/01/05 11:27:33 file does not exist for table 17230931
12:27:33.499 Error while creating badger KV posting store
My assumption is that we need to remove this pod and its PVC, following these steps: https://dgraph.io/docs/deploy/kubernetes/#removing-a-dgraph-pod
However, after reviewing the Zero endpoints: https://dgraph.io/docs/v21.03/deploy/dgraph-zero/#endpoints
It’s mentioned that you cannot use the same idx on the restarted alpha pod. Does this mean we simply cannot restart the alpha pod (in this case dgraph-alpha-1) after removing the PVC?
Unfortunately, it will be rather difficult for us to change the idx value of that pod. We are also unable to delete the PVC without first removing the pod entirely.
We had the idea to scale our dgraph cluster down to 1 alpha and 1 zero, remove the PVC, and remove the alpha from Zero via the endpoint, then scale back up to 3 alphas and 3 zeros. Are there any issues with doing it this way?
Currently our leaders are on alpha-2 and zero-2. If we scale down to 1 of each, will the leaders be re-elected accordingly?
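To make the Zero step of the plan above concrete, here is a rough sketch of how the call could be prepared. It assumes Zero's HTTP port (6080 by default) is reachable, for example through a port-forward, and that the /state response lists Alphas under groups -> members with an addr field. The sample state, the addresses in it, and the helper names are made up for illustration and are not taken from our cluster.

```python
import json
import urllib.request


def find_member(state, pod_name):
    """Return (group_id, raft_id) for the Alpha whose address starts with pod_name.

    Assumes the /state layout: {"groups": {gid: {"members": {rid: {"addr": ...}}}}}.
    """
    for group_id, group in state.get("groups", {}).items():
        for raft_id, member in group.get("members", {}).items():
            # Match "dgraph-alpha-1." so that dgraph-alpha-10 is not picked up.
            if member.get("addr", "").startswith(pod_name + "."):
                return int(group_id), int(raft_id)
    return None


def remove_node_url(zero_base, group_id, raft_id):
    """Build the URL for Zero's /removeNode endpoint."""
    return f"{zero_base}/removeNode?id={raft_id}&group={group_id}"


def fetch_state(zero_base):
    """Fetch the cluster state from Zero (not called in this sketch)."""
    with urllib.request.urlopen(f"{zero_base}/state") as resp:
        return json.load(resp)


# Hypothetical sample of what /state might contain; real output has more fields.
sample_state = {
    "groups": {
        "1": {
            "members": {
                "1": {"addr": "dgraph-alpha-0.dgraph-alpha.default.svc.cluster.local:7080"},
                "2": {"addr": "dgraph-alpha-1.dgraph-alpha.default.svc.cluster.local:7080"},
                "3": {"addr": "dgraph-alpha-2.dgraph-alpha.default.svc.cluster.local:7080"},
            }
        }
    }
}

group_id, raft_id = find_member(sample_state, "dgraph-alpha-1")
print(remove_node_url("http://localhost:6080", group_id, raft_id))
```

The sketch only prints the URL. The actual request would be sent only once the pod is down and its PVC is gone, since the docs say a removed node must not come back.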
Please let me know your thoughts.