[RFC] GraphQL API for long-running tasks

ajeet · September 28, 2020, 11:27am

Motivation

Dgraph’s current backup/export API makes the user wait until the whole operation is complete. While this synchronous behaviour is great for smaller databases, the request times out if the operation takes longer than 10 minutes; after that, the only way to learn its status is to check the logs. As @dmai pointed out, this starts occurring when the p directory is ~50 GB (see here).

It’s generally not considered good API design to make the user wait for long-running operations. Not only is it an inconvenience for a user who doesn’t want to wait around for the operation to complete, it also ties up an HTTP connection for no good reason. In fact, the timeout is enforced server-side, to prevent slow clients from leaking HTTP connections.

Architecture

Long-running tasks such as backups and exports will be added to a task queue when created. The creation request immediately returns an ID to the user, who can then use this ID to get the current status of the task. Since neither creating a task nor querying its status blocks on the operation itself, these requests will not experience the problems mentioned above.
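To make the flow concrete, here is a minimal in-memory sketch of such a queue in Python. It is only an illustration of the idea, not Dgraph's implementation: the `TaskQueue` class, its `enqueue`, `status` and `wait_idle` methods, and the single worker thread are all assumptions made for the example. The status names mirror the `Status` enum proposed in this RFC.

```python
import queue
import threading
import uuid
from enum import Enum


class Status(Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    ERROR = "Error"
    SUCCESS = "Success"


class TaskQueue:
    """Hypothetical in-memory task queue (illustration only)."""

    def __init__(self):
        self._pending = queue.Queue()
        self._status = {}
        self._lock = threading.Lock()
        # A single background worker runs jobs one at a time.
        threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, job):
        """Register the job and return its ID without waiting for it to run."""
        task_id = uuid.uuid4().hex
        with self._lock:
            self._status[task_id] = Status.QUEUED
        self._pending.put((task_id, job))
        return task_id

    def status(self, task_id):
        """Return the current status of a task; never blocks on the job."""
        with self._lock:
            return self._status[task_id]

    def wait_idle(self):
        """Block until every queued job has finished (for demos and tests)."""
        self._pending.join()

    def _set(self, task_id, status):
        with self._lock:
            self._status[task_id] = status

    def _run(self):
        while True:
            task_id, job = self._pending.get()
            self._set(task_id, Status.RUNNING)
            try:
                job()
                self._set(task_id, Status.SUCCESS)
            except Exception:
                self._set(task_id, Status.ERROR)
            finally:
                self._pending.task_done()
```

A caller would do something like `task_id = tasks.enqueue(run_backup)` (where `run_backup` is any callable) and later poll `tasks.status(task_id)`. Neither call waits for the backup itself, so no request stays open for the duration of the operation.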

When querying a Task from the queue, the schema would be as follows:

interface Task @withSubscription {
    id: ID!
    status: Status!
}

enum Status {
    Queued
    Running
    Error
    Success
}

type BackupTask implements Task {
    # etc.
}

type ExportTask implements Task {
    # etc.
}

Completed tasks will be automatically deleted after two weeks. A failed task will not be deleted unless the user deletes it manually.
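The retention rule could be sketched as follows. This is a hedged illustration rather than a proposed implementation: `TaskRecord`, its `finished_at` field, and `purge_expired` are hypothetical names, and the sketch assumes that "completed" means a task whose status is `Success`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RETENTION = timedelta(weeks=2)


@dataclass
class TaskRecord:
    id: str
    status: str  # one of "Queued", "Running", "Error", "Success"
    finished_at: Optional[datetime] = None  # hypothetical field, unset until the task ends


def purge_expired(tasks, now):
    """Return the tasks to keep, dropping only successful ones older than RETENTION."""
    kept = []
    for task in tasks:
        expired = (
            task.status == "Success"
            and task.finished_at is not None
            and now - task.finished_at >= RETENTION
        )
        if not expired:
            kept.append(task)
    return kept
```

Failed tasks are never matched by the `expired` check, so they stay available for inspection until the user removes them explicitly.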

User Impact

In order to keep it user-friendly:

The old synchronous API will remain as is - it works very well for smaller databases.
A window can be added to Ratel to monitor the progress of currently running tasks, and inspect failed tasks.
