Error while trying to load bulk rdf data [gzip: invalid header]


(sai ram) #1

Hello all,
I am trying to bulk load data into Dgraph using both the bulk and live loaders, but I always receive this error.

My commands:

1. dgraph bulk -f 1million.rdf.gz -s 1million.schema --map_shards=4 --reduce_shards=2 --http localhost:8000 --zero=localhost:5080

2. dgraph live -f 1million.rdf.gz

I received the same error in both cases:

Running transaction with dgraph endpoint: 127.0.0.1:9080

Found 1 data file(s) to process

Processing data file "1million.rdf.gz"

2019/09/30 20:09:45 gzip: invalid header

github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/dgo/x.Check

/tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/dgo/x/error.go:28

github.com/dgraph-io/dgraph/chunker.FileReader

/tmp/go/src/github.com/dgraph-io/dgraph/chunker/chunk.go:339

github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).processFile

/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:165

github.com/dgraph-io/dgraph/dgraph/cmd/live.run.func2

/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:330

runtime.goexit

/usr/local/go/src/runtime/asm_amd64.s:1337

Version details of my Dgraph:

[Decoder]: Using assembly version of decoder

Dgraph version   : v1.1.0
Dgraph SHA-256   : 98db2956f6dd8b7b9b88e02962d2036845b057fe5fe953190eaafac0a83dfcce
Commit SHA-1     : ef7cdb28
Commit timestamp : 2019-09-04 00:12:51 -0700
Branch           : HEAD
Go version       : go1.12.7

Machine details:

MacBook Pro (2017), macOS 10.14.6


(Michel Conrado) #2

Can you share a sample of your dataset?
The .gz file could be malformed or corrupted.
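One quick way to test the "corrupted gz" guess is to check the gzip magic bytes and run gzip's built-in integrity test. This is a sketch using a throwaway stand-in file, since the real 1million.rdf.gz isn't attached to the thread:

```shell
# Stand-in file (replace with the real 1million.rdf.gz when checking yours).
printf '<_:a> <name> "Alice" .\n' > sample.rdf
gzip -kf sample.rdf                     # produces a valid sample.rdf.gz

# A real gzip file starts with the magic bytes 1f 8b.
head -c 2 sample.rdf.gz | od -An -tx1

# gzip -t verifies integrity without extracting anything.
gzip -t sample.rdf.gz && echo "gzip file OK"
```

If `gzip -t 1million.rdf.gz` fails, the downloaded file itself is bad; if it passes, the loaders should accept it.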


(sai ram) #3

This is where I got the dataset. (1million.rdf.gz, 1million.schema)


(Michel Conrado) #4

Well, why are you loading the same dataset twice?

I'll check this later and see if I can reproduce it.


(sai ram) #5

I am not trying to load it twice; since one method didn't work, I tried loading it a different way.


(Michel Conrado) #6

Sorry Sairam, but I can’t reproduce it in any way.

PS. Tested it using iMac Pro (2017)

1 - Download the 1mi rdf from https://github.com/dgraph-io/benchmarks/blob/master/data/1million.rdf.gz
2 - Copy this schema https://github.com/dgraph-io/benchmarks/blob/master/data/1million.schema
3 - Download the v1.1.0 binary from releases https://github.com/dgraph-io/dgraph/releases
4 - Create a simple cluster.
5 - Start live loader

result = Works

6 - Delete all files related to that test and start only Zero
7 - Start Bulk loader

result = Works


(sai ram) #7

I don’t know what the problem is, but .gz seems to be the issue here.

dgraph live -f ./1million.rdf.gz -s ./1million.schema -a localhost:9080

Didn’t work (it was throwing the same error listed above). But

dgraph live -f ./1million.rdf -s ./1million.schema -a localhost:9080

Works as intended.
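Given that the extracted 1million.rdf loads fine, one workaround (a sketch with tiny stand-in data, not the real dataset) is to re-compress the extracted file yourself, which produces a fresh, valid .gz that either loader invocation should accept:

```shell
# Stand-in for the already-extracted RDF file.
printf '<_:a> <name> "Alice" .\n' > 1million.rdf
gzip -kf 1million.rdf        # keeps 1million.rdf and writes a fresh 1million.rdf.gz

# Then either invocation should work, e.g.:
#   dgraph live -f ./1million.rdf.gz -s ./1million.schema -a localhost:9080
gzip -t 1million.rdf.gz && echo "fresh .gz is valid"
```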


(Michel Conrado) #8

Without being able to reproduce it, I can't do much to help. What OS are you using? Can you gunzip it normally? There is a gz compression option; are you using it?


(sai ram) #9

When I try to extract it normally, it says the contents of 1million.rdf.gz cannot be extracted (with both Archive Utility and The Unarchiver).

But when I downloaded the file directly, macOS extracted it for me by default. That extracted file loaded properly, as I mentioned above.
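One possible explanation (an assumption, not confirmed in this thread) is that the download was transparently decompressed somewhere along the way while keeping the .gz name. A plain-text file with a .gz extension produces exactly this "invalid header" error, which is easy to demonstrate:

```shell
# Plain text masquerading as gzip: just rename, don't compress.
printf '<_:a> <name> "Alice" .\n' > fake.rdf.gz

file fake.rdf.gz     # reports ASCII text, not "gzip compressed data"

# gzip rejects it, the same way the loaders do.
gzip -t fake.rdf.gz 2>/dev/null || echo "not a real gzip file"
```

Running `file 1million.rdf.gz` on the problematic download would show whether this is what happened.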


(Shekar Mantha) #10

You can try this command on Mac OS:

gzip -d 1million.rdf.gz

and that should give you the uncompressed file.

I just tried it and it worked for me.

Thanks

Shekar

