Moved from GitHub dgraph/3375
Posted by vipulmathur:
Filing this using the bug template, but this can be considered an enhancement as well.
Bulk load crashes on encountering an error (unexpected EOF) in an input file. A much better behavior would be to log the error and continue with the other input files (there were about 4000 input `.rdf.gz` files in this case). The files that hit errors could then be live-loaded later. Instead, bulk load stops (in this case after running for 12+ hours) without any usable output.
Hoping the current behavior of bulk load on input-file errors can be changed from "abort bulk load" to "note error, continue with other inputs". This behavior could even be made configurable via a flag to the bulk loader if needed.

Also note that in the snippet below, the name of the errant input file is not printed. It should definitely be printed along with the error, since that would save time in identifying which of the (4000+ in this case) input files is corrupted.
```
MAP 12h00m11s nquad_count:4.700G err_count:0.000 nquad_speed:108.8k/sec edge_count:38.13G edge_speed:882.4k/sec
2019/05/04 05:10:18 unexpected EOF
github.com/dgraph-io/dgraph/x.Wrap
        /ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:91
github.com/dgraph-io/dgraph/x.Check
        /ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:41
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).mapStage.func2
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:242
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1333
```
- What version of Dgraph are you using?

```
$ dgraph version

Dgraph version   : v1.0.14
Commit SHA-1     : 26cb2f94
Commit timestamp : 2019-04-12 13:21:56 -0700
Branch           : HEAD
Go version       : go1.11.5
```
- Have you tried reproducing the issue with latest release?
  - Yes. Using the latest release, v1.0.14.
- What is the hardware spec (RAM, OS)?
  - AWS EC2 m5.metal instance
  - CPU: Intel® Xeon® Platinum 8175M, 2 sockets, 24 cores per socket, 2 threads per core (48 cores / 96 threads total)
  - RAM: 384 GB
  - OS: Ubuntu 18.04.2 LTS
- Steps to reproduce the issue (command/config used to run Dgraph).
  - Start the bulk loader with multiple input `.rdf.gz` files in a directory, where one of the input files is a corrupted (truncated) gzip.
- Expected behaviour and actual result.
  - Expected: the bulk loader completes with all the valid input files, and logs an error message for the corrupted input file (identifying the specific file). At the very least, it should be possible to resume the bulk load (after the corrupted input file has been fixed) from the point where it aborted.
  - Actual: the bulk loader aborts the load in the map phase itself, wasting the time and the output data created up to the point where the corrupted input file was encountered.