Bulk loader should indicate an incomplete import explicitly in logs

Moved from GitHub dgraph/4395

Posted by mattysan:

Observed:
When importing some RDF data sets given to us by a third-party we have encountered a few schema issues whereby the bulk loader (which we are running in a Docker container) reports the (legit) error and fails. It was not immediately obvious simply by looking at the log whether or not the process stopped there or continued; we tend to let these imports run overnight and the first thing we do is view the log in the morning.

Expected result:
The bulk loader should explicitly state in the log whether or not it completed processing the data set; especially where in the data set it stopped. Some kind of “Exiting” message would be the least optimal solution but still better than nothing.

ashish-goswami commented :

Hey @mattysan, can you provide us some instances when bulk loader is not very clear about completion. At every second it prints the progress of data insertion like stage(map or reduce) and
how much progress it has made.

mattysan commented :

@ashish-goswami Before I do that, can you tell me what it prints out upon completion of a successful import? And what does it print out upon completion of an unsuccessful import? If you can’t answer those two questions, then you have a bit of work to do, rather than asking me to give you examples to delay fixing an obvious issue. The issue at hand is simple: If I want to monitor a log to tell when an import is done or when it has failed, then what do I look for? It failing is just as important to know as it succeeding, so that I could make some deployment automation decisions (example: roll back upon unsuccessful import).

ashish-goswami commented :

Hey @mattysan, I get your use case now. Bulkloader prints its progress every second. for example look at its output on 21M dataset run. It prints that phase is reduce and %age of completion in every line. 100% in last line.

[14:24:52+0530] REDUCE 02m02s 27.64% edge_count:28.37M edge_speed:1.418M/sec plist_count:4.138M plist_speed:206.8k/sec. Num Encoding: 11
[14:24:53+0530] REDUCE 02m03s 31.75% edge_count:32.59M edge_speed:1.551M/sec plist_count:4.149M plist_speed:197.5k/sec. Num Encoding: 1
[14:24:54+0530] REDUCE 02m04s 34.77% edge_count:35.69M edge_speed:1.621M/sec plist_count:4.169M plist_speed:189.4k/sec. Num Encoding: 1
[14:24:55+0530] REDUCE 02m05s 41.11% edge_count:42.20M edge_speed:1.834M/sec plist_count:4.174M plist_speed:181.4k/sec. Num Encoding: 7
[14:24:56+0530] REDUCE 02m06s 48.79% edge_count:50.08M edge_speed:2.086M/sec plist_count:4.181M plist_speed:174.1k/sec. Num Encoding: 11
[14:24:57+0530] REDUCE 02m07s 56.64% edge_count:58.14M edge_speed:2.325M/sec plist_count:4.189M plist_speed:167.5k/sec. Num Encoding: 13
[14:24:58+0530] REDUCE 02m08s 62.85% edge_count:64.51M edge_speed:2.480M/sec plist_count:4.218M plist_speed:162.2k/sec. Num Encoding: 13
[14:24:59+0530] REDUCE 02m09s 70.08% edge_count:71.93M edge_speed:2.662M/sec plist_count:4.265M plist_speed:157.9k/sec. Num Encoding: 9
[14:25:00+0530] REDUCE 02m10s 73.11% edge_count:75.04M edge_speed:2.677M/sec plist_count:4.601M plist_speed:164.1k/sec. Num Encoding: 3
[14:25:01+0530] REDUCE 02m11s 73.47% edge_count:75.41M edge_speed:2.597M/sec plist_count:4.803M plist_speed:165.4k/sec. Num Encoding: 2
[14:25:02+0530] REDUCE 02m12s 76.80% edge_count:78.83M edge_speed:2.625M/sec plist_count:5.749M plist_speed:191.4k/sec. Num Encoding: 2
[14:25:03+0530] REDUCE 02m13s 78.42% edge_count:80.49M edge_speed:2.594M/sec plist_count:6.728M plist_speed:216.8k/sec. Num Encoding: 0
[14:25:04+0530] REDUCE 02m14s 78.84% edge_count:80.92M edge_speed:2.526M/sec plist_count:6.875M plist_speed:214.6k/sec. Num Encoding: 5
[14:25:05+0530] REDUCE 02m15s 79.83% edge_count:81.94M edge_speed:2.479M/sec plist_count:7.446M plist_speed:225.2k/sec. Num Encoding: 4
[14:25:06+0530] REDUCE 02m16s 83.71% edge_count:85.92M edge_speed:2.523M/sec plist_count:9.166M plist_speed:269.1k/sec. Num Encoding: 3
[14:25:07+0530] REDUCE 02m17s 85.28% edge_count:87.53M edge_speed:2.497M/sec plist_count:9.870M plist_speed:281.5k/sec. Num Encoding: 8
[14:25:08+0530] REDUCE 02m18s 87.21% edge_count:89.52M edge_speed:2.482M/sec plist_count:10.71M plist_speed:297.0k/sec. Num Encoding: 4
[14:25:09+0530] REDUCE 02m19s 90.85% edge_count:93.26M edge_speed:2.516M/sec plist_count:11.68M plist_speed:315.1k/sec. Num Encoding: 3
[14:25:10+0530] REDUCE 02m20s 91.54% edge_count:93.96M edge_speed:2.469M/sec plist_count:12.35M plist_speed:324.4k/sec. Num Encoding: 0
[14:25:11+0530] REDUCE 02m21s 92.42% edge_count:94.86M edge_speed:2.429M/sec plist_count:13.16M plist_speed:336.9k/sec. Num Encoding: 15
[14:25:12+0530] REDUCE 02m22s 95.30% edge_count:97.82M edge_speed:2.442M/sec plist_count:15.32M plist_speed:382.5k/sec. Num Encoding: 4
[14:25:13+0530] REDUCE 02m23s 96.83% edge_count:99.39M edge_speed:2.421M/sec plist_count:16.56M plist_speed:403.3k/sec. Num Encoding: 3
[14:25:14+0530] REDUCE 02m24s 98.02% edge_count:100.6M edge_speed:2.392M/sec plist_count:17.44M plist_speed:414.5k/sec. Num Encoding: 2
[14:25:15+0530] REDUCE 02m25s 98.84% edge_count:101.5M edge_speed:2.356M/sec plist_count:18.23M plist_speed:423.3k/sec. Num Encoding: 1
[14:25:16+0530] REDUCE 02m26s 100.00% edge_count:102.6M edge_speed:2.330M/sec plist_count:18.74M plist_speed:425.2k/sec. Num Encoding: 0
[14:25:17+0530] REDUCE 02m27s 100.00% edge_count:102.6M edge_speed:2.278M/sec plist_count:18.74M plist_speed:415.8k/sec. Num Encoding: 0
[14:25:17+0530] REDUCE 02m27s 100.00% edge_count:102.6M edge_speed:2.258M/sec plist_count:18.74M plist_speed:412.1k/sec. Num Encoding: 0
Total: 02m27s

Similarly in case of error, it prints stacktrace with error. For example in case of incorrect json it prints following:

2020/03/07 02:59:41 Malformed JSON
github.com/dgraph-io/dgraph/chunker.(*jsonChunker).consumeMap
	/tmp/go/src/github.com/dgraph-io/dgraph/chunker/chunk.go:256
github.com/dgraph-io/dgraph/chunker.(*jsonChunker).Chunk
	/tmp/go/src/github.com/dgraph-io/dgraph/chunker/chunk.go:194
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).mapStage.func2
	/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:200
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1357
github.com/dgraph-io/dgraph/x.Check
	/tmp/go/src/github.com/dgraph-io/dgraph/x/error.go:42
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).mapStage.func2
	/tmp/go/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:207
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1357

However I agree with you, we can make it more clear to indicate success/failure. This can be helpful for writing any wrapper over bulkloader.