Importing a large .rdf file with curl

I have generated an import.rdf file based on our data. The file is 416 MB and is formatted similarly to:

{
  set {
    _:user_1 <dgraph.type> "User" .
    [6,821,357 more lines]
  }
}

I tried to process the file with:

curl -H "Content-Type: application/rdf" "localhost:8080/mutate?commitNow=true" -XPOST --data-binary '@path/to/directory/import.rdf'

I then received the error message:

{"errors":[{"message":"read tcp 172.17.0.2:8080-\u003e172.17.0.1:34272: i/o timeout","extensions":{"code":"ErrorInvalidRequest"}}]}

I am not sure where 172.17.0.1:34272 is coming from, as that is not my private IP address. I am guessing it may be a network set up by Docker on my PC?

I am running dgraph/standalone:v20.03.1 through Docker on Windows 10, using PowerShell 5.1.19041.1.

Yes, 172.17.0.1 is from the Docker network.

Are you using whitelisting?

Also, “ErrorInvalidRequest” is a fairly generic error. It might mean that the params are wrong or that the file doesn’t exist.

Let me simplify it to a smaller mutation in the same directory and try to get that working first.

Also, Windows uses backslashes in file paths.
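
So, if the path is the problem, try it with a Windows-style path, something like this (the C:\ path here is just a placeholder for wherever your file actually is):

curl -H "Content-Type: application/rdf" "localhost:8080/mutate?commitNow=true" -XPOST --data-binary '@C:\path\to\import.rdf'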

I created a simple six-line test.rdf file:

{
  set {
    _:test_1 <dgraph.type> "Contact" .
    _:test_1 <Contact.firstName> "Test" .
  }
}

and ran:

curl -H "Content-Type: application/rdf" "localhost:8080/mutate?commitNow=true" -XPOST --data-binary '@.notes/test.rdf'

and it worked successfully:

{"data":{"code":"Success","message":"Done","queries":null,"uids":{"test_1":"0x2713"}},"extensions":{"server_latency":{"parsing_ns":18100,"processing_ns":223770300,"assign_timestamp_ns":927500,"total_ns":224831800},"txn":{"start_ts":10163,"commit_ts":10164,"preds":["1-Contact.firstName","1-dgraph.type"]}}}

I am still getting the same error as before, though, when trying to process import.rdf from the same directory.

Hmm, okay. So maybe there is a parser error, but it isn’t being surfaced correctly to the user.

When you send the request, does your cluster report anything bad in the logs, like a crash or “closing connection” messages?

416 MB is very big; you should use the live loader or the bulk loader. With these tools you can also get more information about what went wrong. But before using them, remove the “set” and the curly braces “{}”.
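
In other words, the file should contain only plain triples, one per line, with no set block and no braces around them. Based on your snippet it would look roughly like this (the second line is just an illustration of what other predicates might look like, not something from your data):

_:user_1 <dgraph.type> "User" .
_:user_1 <User.name> "Some Name" .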

Okay, that gives me some direction. I will go look into the bulk loader.

Like @MichelDiz said, you should look into the bulk and live loaders. As a general rule, I don’t think users should try to send huge mutations in a single request. Not only are there issues around sending the requests themselves, but the chances of a conflict increase as the size of the mutation increases. The live loader, for example, handles these cases by splitting the data into smaller mutations and retrying failed ones.
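
For example, with the standalone image, something roughly like this should work from PowerShell (the container name my-dgraph and the path inside the container are placeholders; adjust them to your setup):

docker cp .\import.rdf my-dgraph:/import.rdf
docker exec -it my-dgraph dgraph live -f /import.rdf --alpha localhost:9080 --zero localhost:5080

The --alpha and --zero addresses shown are the defaults, so inside the standalone container they can usually be omitted.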

Maybe I’m ignorant, but does anybody have links to the bulk loader or live loader? 🙂

Here are a couple of links to the docs, @zicklag 🙂
https://dgraph.io/docs/deploy/#live-loader
https://dgraph.io/docs/deploy/#bulk-loader
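
If you end up using the bulk loader, keep in mind it is meant for loading data into a new, empty cluster. A rough sketch of the command (the schema file name and paths are placeholders; the docs above cover the details):

dgraph bulk -f import.rdf -s schema.txt --zero localhost:5080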
