Experience Report for Feature Request
The current design for removing a Zero node from a group (docs) puts a significant burden on users, because the operator or external automation has to manage state outside of Dgraph, such as which system (container or pod) is paired with which Zero index (idx).
What you wanted to do
Dgraph Zero would manage the idx state itself, so that users are not burdened with manual remediation by an operator or with building complex external automation.
What you actually did
Manual Remediation (actually happened)
In a scenario where zero1 had to be removed and replaced, the user would have to do the following to remediate:
- In Kubernetes in particular, edit the live deployed StatefulSet so that if idx==2, idx is set to 4 (this actually had to be done for one customer).
- On platforms other than Kubernetes, something similar would have to be done.
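To make the Kubernetes edit concrete, here is a rough Python sketch of the logic a Zero pod's startup ends up encoding once an override is hand-patched into the StatefulSet. It assumes, hypothetically, that each pod derives its idx from its StatefulSet ordinal as ordinal + 1; the pod names and the `zero_idx` helper are illustrative and not taken from any Dgraph chart.

```python
import re

# Hypothetical convention: a StatefulSet pod named "<name>-<ordinal>"
# runs Zero with idx = ordinal + 1 (so "dgraph-zero-0" gets idx 1).
POD_ORDINAL = re.compile(r"^(?P<set>.+)-(?P<ordinal>\d+)$")


def zero_idx(pod_name: str, overrides: dict[int, int]) -> int:
    """Return the idx a Zero pod should start with.

    `overrides` holds hand-edited exceptions, such as the
    idx==2 -> idx=4 edit. Every entry is state the operator
    has to remember and carry forward by hand.
    """
    match = POD_ORDINAL.match(pod_name)
    if match is None:
        raise ValueError(f"pod name {pod_name!r} has no ordinal suffix")
    idx = int(match.group("ordinal")) + 1
    # Fall back to the derived idx when no override exists.
    return overrides.get(idx, idx)


overrides = {2: 4}  # zero1 (idx 2) was removed and replaced
for pod in ["dgraph-zero-0", "dgraph-zero-1", "dgraph-zero-2"]:
    print(pod, "->", zero_idx(pod, overrides))
```

Each further replacement adds another entry to `overrides`, and that map lives only in the edited manifest, which is the out-of-band state this request is about.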
Required Automation by Operator for Remediation (not yet built)
For an automated solution, the user would have to build an external mechanism, outside of both Dgraph and the orchestration platform, to maintain this state (e.g. Consul, etcd), such that, given the initial state:
and afterward, upon the removal and replacement of zero1, the state would update to:
Why that wasn’t great, with examples
Users should not need to build complex distributed state management solutions to supplement Dgraph, nor should they have to apply custom hacks to infrastructure already deployed on their orchestration platform just to help Zero.
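As an example of the external state management operators would otherwise have to own, the following Python sketch models the member-to-idx bookkeeping from the automation scenario. The in-memory `ZeroIdxRegistry` class is a hypothetical stand-in for a Consul or etcd key space, and the member names are illustrative. It assumes, consistent with the idx==2 to idx=4 edit, that a removed idx is never reused, so a replacement always receives a fresh, higher idx.

```python
from dataclasses import dataclass, field


@dataclass
class ZeroIdxRegistry:
    """Hypothetical in-memory stand-in for an external store
    (Consul, etcd, ...) mapping each Zero member to its idx.

    Assumption: an idx removed from the group is never handed out again.
    """

    assignments: dict[str, int] = field(default_factory=dict)
    retired: set[int] = field(default_factory=set)

    def _next_idx(self) -> int:
        # Next idx is one past every idx ever used, live or retired.
        used = set(self.assignments.values()) | self.retired
        return max(used, default=0) + 1

    def add(self, member: str) -> int:
        if member in self.assignments:
            raise ValueError(f"{member} already has idx {self.assignments[member]}")
        idx = self._next_idx()
        self.assignments[member] = idx
        return idx

    def replace(self, member: str) -> int:
        """Retire the member's current idx and assign a fresh one."""
        old = self.assignments.pop(member)
        self.retired.add(old)
        return self.add(member)


registry = ZeroIdxRegistry()
for member in ["zero0", "zero1", "zero2"]:
    registry.add(member)
print(registry.assignments)  # initial state
registry.replace("zero1")
print(registry.assignments)  # after zero1 is removed and replaced
```

A real implementation would also need atomic compare-and-swap writes in the backing store so that two concurrent replacements cannot claim the same idx, which is more of the distributed-systems work this request argues Zero should absorb.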
Any external references to support your case
The customer was directly affected by this.