Importing a large .rdf file with curl

I have generated an import.rdf file based on our data. The file is 416 MB and is formatted similarly to:

{
  set {
    _:user_1 <dgraph.type> "User" .
    [6,821,357 more lines]
  }
}

I tried to process the file with:

curl -H "Content-Type: application/rdf" "localhost:8080/mutate?commitNow=true" -XPOST --data-binary '@path/to/directory/import.rdf'

I then received the error message:

{"errors":[{"message":"read tcp 172.17.0.2:8080-\u003e172.17.0.1:34272: i/o timeout","extensions":{"code":"ErrorInvalidRequest"}}]}

I am not sure where 172.17.0.1:34272 is coming from, as that is not my private IP address. I am guessing it may be a network set up by Docker on my PC?

I am running dgraph/standalone:v20.03.1 through Docker on Windows 10, using PowerShell 5.1.19041.1.

Yes, 172.17.0.1 is from the Docker network.

Are you using whitelisting?

Also, “ErrorInvalidRequest” is a fairly generic error. It might mean that the params are wrong or that the file doesn’t exist.

Let me simplify it to a smaller mutation in the same directory and try to get that working first.

Also, Windows uses backslashes in file paths.
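
So, if the path is the problem, try it with a Windows-style path, something like this (the C:\ path here is just a placeholder for wherever your file actually is):

curl -H "Content-Type: application/rdf" "localhost:8080/mutate?commitNow=true" -XPOST --data-binary '@C:\path\to\import.rdf'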

I created a simple six-line test.rdf file:

{
  set {
    _:test_1 <dgraph.type> "Contact" .
    _:test_1 <Contact.firstName> "Test" .
  }
}

and ran:

curl -H "Content-Type: application/rdf" "localhost:8080/mutate?commitNow=true" -XPOST --data-binary '@.notes/test.rdf'

and it worked successfully:

{"data":{"code":"Success","message":"Done","queries":null,"uids":{"test_1":"0x2713"}},"extensions":{"server_latency":{"parsing_ns":18100,"processing_ns":223770300,"assign_timestamp_ns":927500,"total_ns":224831800},"txn":{"start_ts":10163,"commit_ts":10164,"preds":["1-Contact.firstName","1-dgraph.type"]}}}

I am still getting the same error as before, though, when trying to process import.rdf from the same directory.

Hmm, okay. So maybe there is a parser error, but it isn’t being surfaced correctly to the user.

When you send the request, does your cluster report anything bad in the logs, like a crash or “closing connection” messages?

416 MB is very big; you should use the live loader or the bulk loader. With these tools you can also get more information about what went wrong. But before using them, remove the “set” and the curly braces “{}”.
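
In other words, the file should contain only plain triples, one per line, with no set block and no braces around them. Based on your snippet it would look roughly like this (the second line is just an illustration of what other predicates might look like, not something from your data):

_:user_1 <dgraph.type> "User" .
_:user_1 <User.name> "Some Name" .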

Okay, that gives me some direction. I will go look into the bulk loader.

Like @MichelDiz said, you should look into the bulk and live loaders. As a general rule, I don’t think users should try to send huge mutations in a single request. Not only are there issues around sending the requests themselves, but the chances of a conflict increase as the size of the mutation increases. The live loader, for example, handles these cases by splitting the data into smaller mutations and retrying failed ones.
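
For example, with the standalone image, something roughly like this should work from PowerShell (the container name my-dgraph and the path inside the container are placeholders; adjust them to your setup):

docker cp .\import.rdf my-dgraph:/import.rdf
docker exec -it my-dgraph dgraph live -f /import.rdf --alpha localhost:9080 --zero localhost:5080

The --alpha and --zero addresses shown are the defaults, so inside the standalone container they can usually be omitted.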

Maybe I’m ignorant, but does anybody have links to the bulk loader or live loader? 🙂

Here are a couple of links to the docs, @zicklag 🙂
https://dgraph.io/docs/deploy/#live-loader
https://dgraph.io/docs/deploy/#bulk-loader
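
If you end up using the bulk loader, keep in mind it is meant for loading data into a new, empty cluster. A rough sketch of the command (the schema file name and paths are placeholders; the docs above cover the details):

dgraph bulk -f import.rdf -s schema.txt --zero localhost:5080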
