Is a stream backup on demand possible? Kind of an idea for a feature

It would be interesting to have a backup on demand. This idea could extend to a cloud service for enterprise. It is very similar to “dgraph live”, but kind of in reverse.

The idea itself is an appeal for automation. A manual backup process that forces you to stop everything you are doing and then start moving data is problematic, so such a service built into Dgraph would come in handy.


"Local/user" instance:

dgraph zero --port_offset -2000 --set-stream-bk=wolverine:7050 --my=zero:5080

dgraph server --memory_mb 2048 --zero=zero:5080

dgraph-wolverine --my=wolverine:7050 --memory_mb=2048 --zero=zero:5080 --consolidate=30m||wait-idle --intensity=1 --set-ngrok=optionstunnels.yml

Note: the wolverine (also known as the glutton) is from the same family as the ratel, the Mustelidae. Just an idea for the name, keeping with the same family of animals xD

The idea is that the Wolverine instance would listen for Dgraph Zero connections, with mixed options for this action. Above, I imagined a “consolidate” flag with two parameters: Wolverine would either consolidate the backup every 30 minutes or wait for the server to be idle. --intensity is self-explanatory.

Options for monitoring via Prometheus would be interesting as well.

The --set-ngrok flag would point to a file with several options or keys to use in the enterprise service API.
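A hypothetical `optionstunnels.yml` could look like the sketch below; every key here is invented for illustration and is not a real Dgraph or ngrok option:

```yaml
# Hypothetical options file passed via --set-ngrok (illustrative only).
tunnel:
  provider: ngrok        # or a reverse proxy such as traefik
  authtoken: <ngrok-api-key>
  proto: grpc            # tunnel the gRPC stream
enterprise:
  api_key: <dgraph-cloud-key>
  tls: true
```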

Cloud Service instance:

The local Wolverine instance would connect to a Dgraph Cloud server via the API and then generate a tunnel that only Dgraph Cloud knows (random), with its particular settings and keys. It could be any tunnel service; I'm considering ngrok because it is the best known. You could also use a reverse proxy like Traefik (which supports gRPC); I don't know what would be best.

The tunnel address could be random to avoid problems (plus HTTPS and API keys). On the other side of the tunnel, in this case a Dgraph Cloud service, there would be a Zero instance and a Wolverine waiting for a connection via the API, to back up whatever comes in on demand.

Basically Wolverine will dig a tunnel xD


I would like to clarify that this is not true. You can take backups while the cluster is up and serving requests.

I think what you are talking about is an automatic incremental backup. We have it in the pipeline.

Well, maybe I did not express myself correctly.

In the context of the proposal, the difference between a “manual backup” and a streaming backup shows up when you want to move the entire database to a new Dgraph version or another environment. You need to stop the input of new data (meaning, put the website under maintenance) so you do not have to deal with it later.

The point is, if you have thousands of writes per second, you will have an issue when moving/migrating the database, because that information needs to be in the new instance as quickly as possible. You need to stop: “under maintenance”.

So, although you do not need to stop, you have to stop. I wrote this in the context of the “stream” idea. The idea of the stream is that you do not have to stop anything: you can keep both instances up while migrating (just like Zero does with clustering), “charging” the new one gradually, without haste and with availability, and in the end kill the old instance.

PS. The idea is more like a “Live migration”.
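The live-migration idea described above, where both instances keep serving while data streams across and writes arriving mid-migration go to both, could be sketched roughly like this (a toy in-memory model in Go; the `Store` type and `migrate` function are stand-ins, not Dgraph APIs):

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a toy stand-in for one Dgraph instance's key-value state.
type Store struct {
	mu   sync.Mutex
	data map[string]string
}

func NewStore() *Store { return &Store{data: map[string]string{}} }

func (s *Store) Put(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[k] = v
}

// Snapshot returns a copy of the current state.
func (s *Store) Snapshot() map[string]string {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make(map[string]string, len(s.data))
	for k, v := range s.data {
		out[k] = v
	}
	return out
}

// migrate bulk-copies src's snapshot into dst while live writes keep
// flowing; each live write is applied to BOTH stores (dual write), so
// nothing is lost and no maintenance window is needed. Once the
// channel of live writes is drained, dst is complete and src can be
// killed.
func migrate(src, dst *Store, liveWrites <-chan [2]string) {
	done := make(chan struct{})
	go func() { // forward live traffic to both instances
		for w := range liveWrites {
			src.Put(w[0], w[1])
			dst.Put(w[0], w[1])
		}
		close(done)
	}()
	for k, v := range src.Snapshot() { // bulk copy existing data
		dst.Put(k, v)
	}
	<-done
}

func main() {
	src, dst := NewStore(), NewStore()
	src.Put("a", "1") // pre-existing data
	writes := make(chan [2]string, 1)
	writes <- [2]string{"b", "2"} // a write arriving mid-migration
	close(writes)
	migrate(src, dst, writes)
	fmt.Println(dst.Snapshot()["a"], dst.Snapshot()["b"]) // prints: 1 2
}
```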

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.