How can I bypass the mutation stop while re-indexing?

Though Dgraph allows you to change the index type of a predicate, do it only if it’s necessary. When the indices are changed, the data needs to be re-indexed, and this takes some computing, so it could take a bit of time. While the re-indexing operation is running, all mutations will be put on hold.

Imagine you have a running application and you need to change an index. Re-indexing can take many minutes. If it is a critical application, such as a warehouse system, a hospital system, or an online shop, pausing mutations for maintenance is not acceptable. That’s really a no-go: the system has to be up 24/7.

So, is it possible to bypass that? Otherwise I can’t see myself using Dgraph in a production app, because changing an index later would be a massive problem.

IMO all systems should be allowed maintenance windows. Even in the hospital environment where I work, we have weekly maintenance windows where some or even ALL systems may be offline for a set amount of time. Communication and preparation are key.

During this week’s maintenance window we will be doing <chore>. This will affect <users> by <use cases>. During this time please <alternative solutions>. Thank you for your understanding and cooperation in helping make <organization> even better.

Some alternative solutions are simply to wait until maintenance is complete, while others require temporary adjustments to workflows and scheduling. An Emergency Department cannot close down just for IT maintenance, so we work around this. Example: when the VPN is going down, we ask the Radiologist on call to be on site during the maintenance window in case he needs to read images immediately, instead of being woken up at home to log in remotely.

If you truly need no-stop production, then the only solution would be to roll a hot-swap, and even then you bear the risk of very short delays (sometimes just ms). An alternative would be to implement the hot-swap with a queue handler: a layer that queues transactions while a system may be unreachable.
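To make the queue handler idea concrete, here is a minimal single-threaded sketch. It is not Dgraph-specific: `QueuedWriter` and the `apply` callable are hypothetical names, and `apply` stands in for whatever function your application uses to send one mutation to the database.

```python
from collections import deque
from typing import Any, Callable, Deque


class QueuedWriter:
    """Buffers writes while the backend is paused and replays them in order."""

    def __init__(self, apply: Callable[[Any], None]) -> None:
        # `apply` is a hypothetical callable that sends one mutation to the database.
        self._apply = apply
        self._pending: Deque[Any] = deque()
        self._paused = False

    def pause(self) -> None:
        """Call before maintenance starts (for example, before changing an index)."""
        self._paused = True

    def write(self, mutation: Any) -> bool:
        """Return True if the mutation was applied now, False if it was queued."""
        if self._paused:
            self._pending.append(mutation)
            return False
        self._apply(mutation)
        return True

    def resume(self) -> int:
        """Replay queued mutations in arrival order and return how many were sent."""
        sent = 0
        while self._pending:
            # Peek first, so a failing mutation stays at the head of the queue.
            self._apply(self._pending[0])
            self._pending.popleft()
            sent += 1
        # Only accept direct writes again once the backlog is drained.
        self._paused = False
        return sent


if __name__ == "__main__":
    applied = []
    writer = QueuedWriter(applied.append)
    writer.write("a")   # applied immediately
    writer.pause()      # maintenance window begins
    writer.write("b")   # queued
    writer.write("c")   # queued
    writer.resume()     # maintenance window ends, backlog is replayed
    print(applied)
```

Keep in mind that this sketch holds the queue in memory, so queued writes are lost if the process restarts, and callers get control back before a queued write has actually reached the database. A production version would likely need a durable queue and some thought about reads that expect to see the queued data.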

There is another topic on rolling upgrades across an HA instance, which is a closely related conversation.
