Empty reply from server for exports and backups


What version of Dgraph are you using?

v20.03.4

Have you tried reproducing the issue with the latest release?

Yes

What is the hardware spec (RAM, OS)?

40 GB RAM, Ubuntu

Steps to reproduce the issue (command/config used to run Dgraph).

Have a lot of data in Dgraph and then request an export or a backup.

When running via curl, the request exits with the message Empty reply from server. According to the Alpha logs, the export/backup is still being created.

Expected behaviour and actual result.

I expect the request to stay alive while the export/backup is being taken; it shouldn’t exit while the export/backup is being created. I’d expect to see this response when, say, an export completes:

{"code": "Success", "message": "Export completed."}

When the request connection exits early, there’s no way for the user to know when the request has completed other than monitoring the Alpha logs.


I am assuming it is a regression from 20.03.3? Did we miss it during our release testing, or is it only happening for instances with large amounts of data?

cc @Neeraj @Rahul
Why did you guys not run into this in your testing?

Hey @dmai, how big is the data? We tested with a 1 million dataset and it worked fine.


Yes, the export is created before we get the “Success” response when we tested with the 1 million data set.

A 50 GB p directory should be sufficient to show this behavior.

It’s not a regression in this particular patch release. I suspect this is an issue for Alphas with large data sizes.

Other systems set up keep-alive messages for long-running requests. For example, Twitter’s streaming API (see the “Keep-alive signals” section) sends a heartbeat every 10 seconds so that the connection doesn’t terminate.


After some experimentation, I found that the timeout is actually occurring on the server side; in fact, most operations in curl have no timeout by default.

The culprit is Dgraph’s serveHTTP function, which times out response writes at 10 minutes. Of course, we can’t remove the timeout without risking leaked connections, so the only course of action here is to increase the timeout if desired.

This article outlines an approach for executing long-running tasks in a REST API. The idea is to add the task to a queue and immediately return its ID. The client can then poll with this ID to check the status of the task.

We’re also experiencing the export issue described by the OP.

Does the enterprise version have this same issue? I would assume yes because a similar mutation to /admin is needed, but could you confirm?

Any idea on the timeline of this RFC? [RFC] GraphQL API for long-running tasks - #6 by anand

Thank you

Does the enterprise version have this same issue?

Yes, the same issue exists on both. It doesn’t affect functionality, though - the timeout you see is just the HTTP request timing out. The backup/export is still running in Dgraph.

Any idea on the timeline of this RFC?

Work on making the API asynchronous has started and should be available in Dgraph 21.07.