Backups: Large Backups Hold the Session Open for a Long Time without Feedback

Experience Report for Feature Request

This was suggested by a customer:

Is there an option to have the command return a 200 - “Backup has started, job-id-####” - and then poll periodically for completion of that backup job? I’m not sure keeping an HTTP connection alive for more than a few seconds, without any data transfer, makes much sense here.

What you wanted to do

Large backups of 50+ GB can hold a connection open for a long time, until the backup is finished. During this time there is no feedback: you don’t know what is happening or whether something went wrong.

Could there be an interface where the backup endpoint immediately returns 200 with a message that the backup has started, along with a job ID? The customer could then periodically poll for status using that job_id, which would show how far along the backup is and whether it has completed (see the sketch below).
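To make the request concrete, here is a minimal client-side sketch of the proposed start-then-poll flow. Everything in it is hypothetical: the `/admin/backup/start` and `/admin/backup/status` endpoints, the JSON shape, and the status values do not exist in Dgraph today and are only meant to illustrate the pattern.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// backupJob mirrors the hypothetical response shape: a job ID plus a
// coarse status such as "running", "completed", or "failed".
type backupJob struct {
	JobID  string `json:"job_id"`
	Status string `json:"status"`
}

func main() {
	// Hypothetical: the server answers 200 immediately with a job ID
	// instead of holding the connection open for the whole backup.
	resp, err := http.Post("http://localhost:8080/admin/backup/start", "application/json", nil)
	if err != nil {
		panic(err)
	}
	var job backupJob
	if err := json.NewDecoder(resp.Body).Decode(&job); err != nil {
		panic(err)
	}
	resp.Body.Close()
	fmt.Println("backup started, job id:", job.JobID)

	// Poll periodically until the job finishes, so no long-lived
	// HTTP connection sits idle with zero data transfer.
	for job.Status == "" || job.Status == "running" {
		time.Sleep(30 * time.Second)
		resp, err := http.Get("http://localhost:8080/admin/backup/status?id=" + job.JobID)
		if err != nil {
			panic(err)
		}
		if err := json.NewDecoder(resp.Body).Decode(&job); err != nil {
			panic(err)
		}
		resp.Body.Close()
		fmt.Println("backup status:", job.Status)
	}
}
```

Each poll is a short, stateless request, so a dropped connection or a proxy timeout costs nothing: the client just asks again with the same job ID.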

What you actually did

The customer cannot do anything at this point except wait and pray.

Why that wasn’t great, with examples

As above, the customer has to wait and hope that the backup succeeds and that the connection does not get dropped.

Any external references to support your case

(references upon request)

I also found this issue. When backing up a small amount of data, it works fine. But when the database is large, around 50 GB, the backup appears stuck forever and there is no way to track the problem. I suspect there is a timeout problem.

Restore, on the other hand, works brilliantly. Restore is asynchronous: once you call the API, it returns a response quickly with a restore_id, and you can track the restore status using that ID. If there were a way to track backups the same way, Dgraph would be perfect! (A sketch of that contrast is below.)
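For contrast with the blocking backup call, here is a rough sketch of the existing asynchronous restore flow. It assumes Dgraph’s admin GraphQL endpoint and its `restore` / `restoreStatus` operations; the exact field names and status values are from memory and may differ across versions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// post sends a GraphQL document to the /admin endpoint and decodes the reply.
func post(query string, out interface{}) error {
	body, _ := json.Marshal(map[string]string{"query": query})
	resp, err := http.Post("http://localhost:8080/admin", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return json.NewDecoder(resp.Body).Decode(out)
}

func main() {
	// The restore mutation returns almost immediately with a restoreId,
	// even though the actual restore keeps running in the background.
	var started struct {
		Data struct {
			Restore struct {
				Code      string `json:"code"`
				Message   string `json:"message"`
				RestoreId int    `json:"restoreId"`
			} `json:"restore"`
		} `json:"data"`
	}
	mutation := `mutation { restore(input: {location: "/backups"}) { code message restoreId } }`
	if err := post(mutation, &started); err != nil {
		panic(err)
	}
	id := started.Data.Restore.RestoreId
	fmt.Println("restore started, id:", id)

	// The restoreId can then be polled until the operation completes.
	// The feature request is simply the ability to track backups this way.
	for {
		time.Sleep(10 * time.Second)
		var st struct {
			Data struct {
				RestoreStatus struct {
					Status string   `json:"status"`
					Errors []string `json:"errors"`
				} `json:"restoreStatus"`
			} `json:"data"`
		}
		q := fmt.Sprintf(`query { restoreStatus(restoreId: %d) { status errors } }`, id)
		if err := post(q, &st); err != nil {
			panic(err)
		}
		fmt.Println("restore status:", st.Data.RestoreStatus.Status)
		if st.Data.RestoreStatus.Status != "IN_PROGRESS" { // status value assumed
			break
		}
	}
}
```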
