Have you tried reproducing the issue with the latest release?
Yes
What is the hardware spec (RAM, OS)?
40 GB RAM, Ubuntu
Steps to reproduce the issue (command/config used to run Dgraph).
Have a lot of data in Dgraph and then request an export or a backup.
When running via curl, the request exits with the message “Empty reply from server”. According to the Alpha logs, the export/backup is still being created.
Expected behaviour and actual result.
I expect the request to stay alive while the export/backup is being taken. It shouldn’t exit while the export/backup is being created. I’d expect to see this response when, say, an export completes:
When the request connection exits early, there’s no way for the user to know when the request has been completed other than monitoring the Alpha logs.
A 50 GB p directory should be large enough to show this behavior.
It’s not a regression in this particular patch release. I suspect this is an issue for Alphas with large data sizes.
As an example, other systems set up keep-alive messages for requests. e.g., Twitter’s streaming API (see “Keep-alive signals” section) has a keep-alive heartbeat every 10 seconds so that the connection doesn’t terminate.
The culprit is actually in Dgraph’s serveHTTP function, which times out response writes at 10 minutes. Of course, we can’t remove the timeout without risking leaking connections, so the only course of action here is to increase the timeout if desired.
This article outlines an approach for executing long-running tasks in a REST API. The idea is to add the task to a queue and immediately return its ID. The client can then query using this ID to find the status of the task.
Yes, the same issue exists on both. It doesn’t affect functionality, though: the timeout you see is just the HTTP request timing out. The backup/export is still running in Dgraph.
Any idea on the timeline of this RFC?
Work on making the API asynchronous has started, and should be available in Dgraph 21.07.