I left my machine untouched while it ran all the profiling. I did four runs in total: (mem/cpu) x (bulk load or not).
It seems to me that the differences are within noise levels, and I don’t think we should try to conclude anything from them, except that RocksDB’s bulk load has little impact on dgraphloader. In fact, runtime.cgocall seems to take less time without bulk load than with it…
Previously, I also did some runs with both the assigner and the loader, and there bulk load actually seemed to slow everything down significantly. I’m not sure why that didn’t show up when profiling the loader alone.
Unless anyone objects, @minions, my take is that we should not use bulk load, so there is no need to submit or review the PR.
Quick update: No significant difference in running time when I tried max_background_compact=20, 10, and 1 (default).
There is also no significant difference between jemalloc and stdmalloc. Note that jemalloc is the default malloc for the embedded build. To disable it, build with go build -tags="embed stdmalloc". I ran the two resulting loader binaries (their file sizes differ) and saw no significant difference in running time.
Note: On my machine, running time can fluctuate quite a bit. It varies between roughly 7.5 and 8 minutes for loading only rdf-films.gz, so it is hard to draw any conclusion when the improvement is small. Perhaps I should run some proper benchmarks instead.