sairam
(sai ram)
October 1, 2019, 6:11am
Hello all,
I am trying to load data into Dgraph using both the bulk and live loaders, but I always receive this error.
My commands:
1. dgraph bulk -f 1million.rdf.gz -s 1million.schema --map_shards=4 --reduce_shards=2 --http localhost:8000 --zero=localhost:5080
2. dgraph live -f 1million.rdf.gz
I received the same error in both cases:
Running transaction with dgraph endpoint: 127.0.0.1:9080
Found 1 data file(s) to process
Processing data file "1million.rdf.gz"
2019/09/30 20:09:45 gzip: invalid header
github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/dgo/x.Check
/tmp/go/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/dgo/x/error.go:28
github.com/dgraph-io/dgraph/chunker.FileReader
/tmp/go/src/github.com/dgraph-io/dgraph/chunker/chunk.go:339
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).processFile
/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:165
github.com/dgraph-io/dgraph/dgraph/cmd/live.run.func2
/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:330
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1337
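The "invalid header" in the stack trace means Go's gzip reader did not find the gzip magic bytes at the start of the file, which usually indicates the file is not actually gzip-compressed despite its name. A quick way to check, sketched for a POSIX shell (the file name matches the commands above):

```shell
# A valid gzip archive starts with the magic bytes 1f 8b.
head -c 2 1million.rdf.gz 2>/dev/null | od -An -tx1

# `file` reports "gzip compressed data" for a well-formed archive.
file 1million.rdf.gz 2>/dev/null || echo "file not found"

# Test integrity without extracting anything.
gzip -t 1million.rdf.gz 2>/dev/null && echo "valid gzip" || echo "not a valid gzip file"
```

If the magic bytes are missing, the download was most likely decompressed in transit while keeping the .gz extension.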
Version details of my Dgraph:
[Decoder]: Using assembly version of decoder
Dgraph version : v1.1.0
Dgraph SHA-256 : 98db2956f6dd8b7b9b88e02962d2036845b057fe5fe953190eaafac0a83dfcce
Commit SHA-1 : ef7cdb28
Commit timestamp : 2019-09-04 00:12:51 -0700
Branch : HEAD
Go version : go1.12.7
Machine details:
MacBook Pro (2017), macOS 10.14.6
MichelDiz
(Michel Diz)
October 1, 2019, 4:14pm
Can you share a sample of your dataset?
This could be a gzip mistake, or the file could be corrupted.
sairam
(sai ram)
October 2, 2019, 7:00am
This is where I got the dataset (1million.rdf.gz, 1million.schema).
MichelDiz
(Michel Diz)
October 2, 2019, 3:23pm
Well, why are you loading the same dataset twice?
I’ll check this later to see if I can reproduce it.
sairam
(sai ram)
October 3, 2019, 8:49am
I am not trying to load it twice; since one method didn’t work, I tried to load it a different way.
MichelDiz
(Michel Diz)
October 3, 2019, 4:14pm
Sorry Sairam, but I can’t reproduce it in any way.
PS: Tested it on an iMac Pro (2017).
1 - Download the 1million RDF from benchmarks/data/1million.rdf.gz at master · dgraph-io/benchmarks · GitHub
2 - Copy this schema benchmarks/data/1million.schema at master · dgraph-io/benchmarks · GitHub
3 - Download the v1.1.0 binary from releases https://github.com/dgraph-io/dgraph/releases
4 - Create a simple cluster.
5 - Start live loader
result = Works
6 - Delete all files related to that test and start only Zero
7 - Start Bulk loader
result = Works
sairam
(sai ram)
October 8, 2019, 9:42am
I don’t know what the problem is, but the .gz file seems to be the issue here.
dgraph live -f ./1million.rdf.gz -s ./1million.schema -a localhost:9080
Didn’t work (it threw the same error listed above). But …
dgraph live -f ./1million.rdf -s ./1million.schema -a localhost:9080
Works as intended.
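Since the uncompressed file loads fine, one workaround is to recompress it locally so the loader gets a well-formed archive. A hedged sketch, assuming `gzip` is installed and 1million.rdf sits in the current directory:

```shell
# -k keeps the original .rdf; -f overwrites any stale, broken .gz.
gzip -kf 1million.rdf 2>/dev/null || echo "1million.rdf not found"

# Verify the fresh archive before handing it to the loader.
gzip -t 1million.rdf.gz 2>/dev/null && echo "gzip OK" || echo "no valid archive"
```

After that, the original `dgraph live -f ./1million.rdf.gz -s ./1million.schema -a localhost:9080` invocation should behave the same as the uncompressed run.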
MichelDiz
(Michel Diz)
October 8, 2019, 3:50pm
Without being able to reproduce it, I can’t do anything to help. What OS are you using? Can you ungzip it normally? There is a gz compression option; are you using it?
sairam
(sai ram)
October 9, 2019, 12:21pm
When I try to unzip it normally, it says the contents of 1million.rdf.gz cannot be extracted (with both Archive Utility and The Unarchiver).
But when I downloaded the file directly, macOS extracted it for me by default. That extracted file loaded properly, as I mentioned above.
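That behavior matches a file that is already plain RDF but still carries the .gz name (the browser or macOS decompressed it in transit). A hedged check-and-rename sketch, assuming a POSIX shell:

```shell
# If `file` does not report gzip data, the archive was already decompressed;
# drop the misleading extension and load it uncompressed instead.
if [ -f 1million.rdf.gz ] && ! file 1million.rdf.gz | grep -q 'gzip compressed'; then
  mv 1million.rdf.gz 1million.rdf
fi
```

The live loader consumes plain .rdf just as well, as the working command above shows.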
Shekar
(Shekar Mantha)
October 9, 2019, 4:09pm
You can try this command on macOS:
gzip -d 1million.rdf.gz
and that should give you the uncompressed file.
I just tried it and it worked for me.
Thanks
Shekar
system
(system)
Closed
November 8, 2019, 4:11pm
This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.