Error while trying to load bulk rdf data [gzip: invalid header]


(sai ram) #1

Hello all,
I am trying to bulk load data into Dgraph using both the bulk and live loaders, but I always receive this error.

My commands:

1. dgraph bulk -f 1million.rdf.gz -s 1million.schema --map_shards=4 --reduce_shards=2 --http localhost:8000 --zero=localhost:5080

2. dgraph live -f 1million.rdf.gz

I received the same error in both cases:

Running transaction with dgraph endpoint: 127.0.0.1:9080

Found 1 data file(s) to process

Processing data file "1million.rdf.gz"

2019/09/30 20:09:45 gzip: invalid header

github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/dgo/x.Check

/tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/dgo/x/error.go:28

github.com/dgraph-io/dgraph/chunker.FileReader

/tmp/go/src/github.com/dgraph-io/dgraph/chunker/chunk.go:339

github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).processFile

/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:165

github.com/dgraph-io/dgraph/dgraph/cmd/live.run.func2

/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:330

runtime.goexit

/usr/local/go/src/runtime/asm_amd64.s:1337

Version details of my Dgraph:

[Decoder]: Using assembly version of decoder

Dgraph version   : v1.1.0
Dgraph SHA-256   : 98db2956f6dd8b7b9b88e02962d2036845b057fe5fe953190eaafac0a83dfcce
Commit SHA-1     : ef7cdb28
Commit timestamp : 2019-09-04 00:12:51 -0700
Branch           : HEAD
Go version       : go1.12.7

Machine details:

MacBook Pro (2017), macOS 10.14.6


(Michel Conrado) #2

Can you share a sample of your dataset?
The .gz file could be malformed or corrupted.
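One quick way to test the "corrupted gz" guess is to check the gzip magic bytes and run gzip's built-in integrity test. This is a sketch using a throwaway stand-in file, since the real 1million.rdf.gz isn't attached to the thread:

```shell
# Stand-in file (replace with the real 1million.rdf.gz when checking yours).
printf '<_:a> <name> "Alice" .\n' > sample.rdf
gzip -kf sample.rdf                     # produces a valid sample.rdf.gz

# A real gzip file starts with the magic bytes 1f 8b.
head -c 2 sample.rdf.gz | od -An -tx1

# gzip -t verifies integrity without extracting anything.
gzip -t sample.rdf.gz && echo "gzip file OK"
```

If `gzip -t 1million.rdf.gz` fails, the downloaded file itself is bad; if it passes, the loaders should accept it.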


(sai ram) #3

This is where I got the dataset. (1million.rdf.gz, 1million.schema)


(Michel Conrado) #4

Well, why are you loading the same dataset twice?

I'll check this later and see if I can reproduce it.


(sai ram) #5

I am not trying to load it twice; since one method didn't work, I tried loading it a different way.


(Michel Conrado) #6

Sorry Sairam, but I can’t reproduce it in any way.

PS. Tested it using iMac Pro (2017)

1 - Download the 1mi rdf from https://github.com/dgraph-io/benchmarks/blob/master/data/1million.rdf.gz
2 - Copy this schema https://github.com/dgraph-io/benchmarks/blob/master/data/1million.schema
3 - Download the v1.1.0 binary from releases https://github.com/dgraph-io/dgraph/releases
4 - Create a simple cluster.
5 - Start live loader

result = Works

6 - Delete all files related to that test and start only Zero
7 - Start Bulk loader

result = Works


(sai ram) #7

I don’t know what the problem is, but .gz seems to be the issue here.

dgraph live -f ./1million.rdf.gz -s ./1million.schema -a localhost:9080

Didn’t work (it was throwing the same error listed above). But

dgraph live -f ./1million.rdf -s ./1million.schema -a localhost:9080

Works as intended.
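Given that the extracted 1million.rdf loads fine, one workaround (a sketch with tiny stand-in data, not the real dataset) is to re-compress the extracted file yourself, which produces a fresh, valid .gz that either loader invocation should accept:

```shell
# Stand-in for the already-extracted RDF file.
printf '<_:a> <name> "Alice" .\n' > 1million.rdf
gzip -kf 1million.rdf        # keeps 1million.rdf and writes a fresh 1million.rdf.gz

# Then either invocation should work, e.g.:
#   dgraph live -f ./1million.rdf.gz -s ./1million.schema -a localhost:9080
gzip -t 1million.rdf.gz && echo "fresh .gz is valid"
```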


(Michel Conrado) #8

Without being able to reproduce it, I can't do much to help. What OS are you using? Can you gunzip it normally? There is a gz compression option; are you using it?


(sai ram) #9

When I try to extract it normally, it says the contents of 1million.rdf.gz cannot be extracted (with both Archive Utility and The Unarchiver).

But when I downloaded the file directly, macOS extracted it for me by default. That extracted file loaded properly, as I mentioned above.
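One possible explanation (an assumption, not confirmed in this thread) is that the download was transparently decompressed somewhere along the way while keeping the .gz name. A plain-text file with a .gz extension produces exactly this "invalid header" error, which is easy to demonstrate:

```shell
# Plain text masquerading as gzip: just rename, don't compress.
printf '<_:a> <name> "Alice" .\n' > fake.rdf.gz

file fake.rdf.gz     # reports ASCII text, not "gzip compressed data"

# gzip rejects it, the same way the loaders do.
gzip -t fake.rdf.gz 2>/dev/null || echo "not a real gzip file"
```

Running `file 1million.rdf.gz` on the problematic download would show whether this is what happened.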


(Shekar Mantha) #10

You can try this command on Mac OS:

gzip -d 1million.rdf.gz

and that should give you the uncompressed file.

I just tried it and it worked for me.

Thanks

Shekar

