Motivation
Dgraph’s current backup/export API makes the user wait until the whole operation is complete. While this synchronous behaviour works well for smaller databases, the request times out if the operation takes longer than 10 minutes; after that, the only way to know its status is to check the logs. As @dmai pointed out, this starts occurring when the p directory is ~50 GB (see here).
It’s generally not considered good API design to make the user wait for long-running operations. Not only is it an inconvenience for a user who doesn’t want to wait around for the operation to complete, it also ties up an HTTP connection for no good reason. In fact, the timeout is enforced server-side precisely to prevent slow clients from leaking HTTP connections.
Architecture
Long-running tasks such as backups and exports will be added to a task queue when created. The API immediately returns a task ID to the user, who can then use this ID to check the task’s current status. Since task creation no longer blocks on the operation itself, the problems described above do not occur.
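As a rough sketch of the intended flow, a backup request would return immediately with the ID of the queued task. The mutation and field names below are illustrative; in particular, the taskId response field is an assumption of this proposal rather than the current /admin API:

```graphql
# Illustrative sketch: the request is queued and returns right away.
mutation {
  backup(input: { destination: "s3://bucket/folder" }) {
    taskId # hypothetical field: ID of the queued BackupTask
  }
}
```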
When querying a Task from the queue, the schema would be as follows:
```graphql
interface Task @withSubscription {
  id: ID!
  status: Status!
}

enum Status {
  Queued
  Running
  Error
  Success
}

type BackupTask implements Task {
  # etc.
}

type ExportTask implements Task {
  # etc.
}
```
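Once a task ID is known, its status can be polled, or, because Task carries the @withSubscription directive, watched via a subscription instead of polling. The getTask query name below is an assumption about how the generated API might expose the interface:

```graphql
# Hypothetical query name; the proposal only specifies the Task interface.
query {
  getTask(id: "0x123") {
    id
    status
  }
}

# Because Task is annotated with @withSubscription, clients could also
# subscribe and be notified of status changes as they happen.
subscription {
  getTask(id: "0x123") {
    id
    status
  }
}
```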
Completed tasks will be automatically deleted two weeks after they finish. If a task has failed, it will not be deleted unless the user deletes it manually.
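Manually removing a failed task could look roughly like the sketch below; the deleteTask mutation is purely illustrative and not something the proposal defines:

```graphql
# Illustrative only: clean up a failed task by ID.
mutation {
  deleteTask(filter: { id: ["0x123"] }) {
    msg
  }
}
```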
User Impact
To keep the change user-friendly:
- The old synchronous API will remain as-is, since it works well for smaller databases.
- A window can be added to Ratel to monitor the progress of currently running tasks, and inspect failed tasks.