OOM during bulk loading

Dgraph version: v21.03.1
I am trying to load 10_000_000 records (1.1G) in RDF format. These records i took from Wikidata.

Records example:

<https://www.wikidata.org/wiki/Special:EntityData/L25> <http://schema.org/about> <http://www.wikidata.org/entity/L25> .
<https://www.wikidata.org/wiki/Special:EntityData/L25> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Dataset> .
<http://www.wikidata.org/entity/L314-F1> <http://wikiba.se/ontology#grammaticalFeature> <http://www.wikidata.org/entity/Q146786> .
<http://www.wikidata.org/entity/L314-F1> <http://www.w3.org/ns/lemon/ontolex#representation> "pis"@ca .
<http://www.wikidata.org/entity/L314-F1> <http://www.w3.org/2000/01/rdf-schema#label> "pis"@ca .

Schema example:

<http://www.w3.org/ns/lemon/ontolex#representation>: string @lang .
<http://www.w3.org/2000/01/rdf-schema#label>: string @lang .
<http://www.wikidata.org/prop/direct/P5831>: string @lang .
<http://wikiba.se/ontology#lemma>: string @lang .
<http://www.wikidata.org/prop/statement/P5831>: string @lang .
<http://www.w3.org/2004/02/skos/core#definition>: string @lang .
<http://www.wikidata.org/prop/direct/P5187>: string @lang .
<http://www.wikidata.org/prop/statement/P5187>: string @lang .

Cluster setup is:

Machine is configured with 750G of RAM which should be enough to load 1.1g.

I executed command:

dgraph bulk -s schema.txt -f records.nq --map_shards=1 --zero=localhost:5000 --format=rdf --num_go_routines=2

Then bulk load failed with oom in reduce stage:

[10:01:42Z] REDUCE 40s 31.24% edge_count:3.118M edge_speed:623.6k/sec plist_count:206.7k plist_speed:41.34k/sec. Num Encoding MBs: 1053. jemalloc: 3.0 GiB
Finishing stream id: 1
Finishing stream id: 3
Finishing stream id: 4
[10:01:43Z] REDUCE 41s 42.43% edge_count:4.235M edge_speed:705.8k/sec plist_count:732.4k plist_speed:122.1k/sec. Num Encoding MBs: 796. jemalloc: 3.7 GiB
badger 2021/10/05 10:01:44 INFO: Table created: 3 at level: 6 for stream: 3. Size: 998 KiB
badger 2021/10/05 10:01:44 INFO: Table created: 4 at level: 6 for stream: 4. Size: 1.1 MiB
badger 2021/10/05 10:01:44 INFO: Table created: 2 at level: 6 for stream: 1. Size: 9.2 MiB
badger 2021/10/05 10:01:44 INFO: Table created: 1 at level: 6 for stream: 1. Size: 60 MiB
[10:01:44Z] REDUCE 42s 49.09% edge_count:4.900M edge_speed:700.0k/sec plist_count:1.277M plist_speed:182.4k/sec. Num Encoding MBs: 796. jemalloc: 3.4 GiB
[10:01:45Z] REDUCE 43s 58.88% edge_count:5.877M edge_speed:734.5k/sec plist_count:1.644M plist_speed:205.5k/sec. Num Encoding MBs: 539. jemalloc: 3.0 GiB
[10:01:46Z] REDUCE 44s 71.96% edge_count:7.182M edge_speed:798.1k/sec plist_count:1.682M plist_speed:186.9k/sec. Num Encoding MBs: 539. jemalloc: 209 GiB
[10:01:47Z] REDUCE 45s 81.21% edge_count:8.105M edge_speed:810.6k/sec plist_count:1.907M plist_speed:190.7k/sec. Num Encoding MBs: 283. jemalloc: 478 GiB
fatal error: out of memory

Full logs:

root@b9516565fe70:/dgraph2/nq_data# dgraph bulk -s /dgraph2/nq_schema/schema_not_full -f /dgraph2/nq_data/rdf.nq1_10000000 --map_shards=1 --reduce_shards=1 --http localhost:8000 --zero=localhost:5080 --format=rdf --ignore_errors --num_go_routines=2
I1005 10:01:02.691884     124 init.go:110]

Dgraph version   : v21.03.1
Dgraph codename  : rocket-1
Dgraph SHA-256   : a00b73d583a720aa787171e43b4cb4dbbf75b38e522f66c9943ab2f0263007fe
Commit SHA-1     : ea1cb5f35
Commit timestamp : 2021-06-17 20:38:11 +0530
Branch           : HEAD
Go version       : go1.16.2
jemalloc enabled : true

For Dgraph official documentation, visit https://urldefense.com/v3/__https://dgraph.io/docs__;!!GF_29dbcQIUBPA!hCN-bHaL-JkBMOzORniYpTBJ_71VIUUBRR3fFYWlufILryFQeG6emc1pUg_L3OdVYNCTtA$ [dgraph[.]io].
For discussions about Dgraph     , visit https://urldefense.com/v3/__https://discuss.dgraph.io__;!!GF_29dbcQIUBPA!hCN-bHaL-JkBMOzORniYpTBJ_71VIUUBRR3fFYWlufILryFQeG6emc1pUg_L3OcrPZKB4w$ [discuss[.]dgraph[.]io].
For fully-managed Dgraph Cloud   , visit https://urldefense.com/v3/__https://dgraph.io/cloud__;!!GF_29dbcQIUBPA!hCN-bHaL-JkBMOzORniYpTBJ_71VIUUBRR3fFYWlufILryFQeG6emc1pUg_L3OcgzTAfBw$ [dgraph[.]io].

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.


Encrypted input: false; Encrypted output: false
{
        "DataFiles": "/dgraph2/nq_data/rdf.nq1_10000000",
        "DataFormat": "rdf",
        "SchemaFile": "/dgraph2/nq_schema/schema_not_full",
        "GqlSchemaFile": "",
        "OutDir": "./out",
        "ReplaceOutDir": false,
        "TmpDir": "tmp",
        "NumGoroutines": 2,
        "MapBufSize": 2147483648,
        "PartitionBufSize": 4194304,
        "SkipMapPhase": false,
        "CleanupTmp": true,
        "NumReducers": 1,
        "Version": false,
        "StoreXids": false,
        "ZeroAddr": "localhost:5080",
        "HttpAddr": "localhost:8000",
        "IgnoreErrors": true,
        "CustomTokenizers": "",
        "NewUids": false,
        "ClientDir": "",
        "Encrypted": false,
        "EncryptedOut": false,
        "MapShards": 1,
        "ReduceShards": 1,
        "Namespace": 18446744073709551615,
        "EncryptionKey": null,
        "Badger": {
                "Dir": "",
                "ValueDir": "",
                "SyncWrites": false,
                "NumVersionsToKeep": 1,
                "ReadOnly": false,
                "Logger": {},
                "Compression": 1,
                "InMemory": false,
                "MetricsEnabled": true,
                "NumGoroutines": 8,
                "MemTableSize": 67108864,
                "BaseTableSize": 2097152,
                "BaseLevelSize": 10485760,
                "LevelSizeMultiplier": 10,
                "TableSizeMultiplier": 2,
                "MaxLevels": 7,
                "VLogPercentile": 0,
                "ValueThreshold": 1048576,
                "NumMemtables": 5,
                "BlockSize": 4096,
                "BloomFalsePositive": 0.01,
                "BlockCacheSize": 20132659,
                "IndexCacheSize": 46976204,
                "NumLevelZeroTables": 5,
                "NumLevelZeroTablesStall": 15,
                "ValueLogFileSize": 1073741823,
                "ValueLogMaxEntries": 1000000,
                "NumCompactors": 4,
                "CompactL0OnClose": false,
                "LmaxCompaction": false,
                "ZSTDCompressionLevel": 0,
                "VerifyValueChecksum": false,
                "EncryptionKey": "",
                "EncryptionKeyRotationDuration": 864000000000000,
                "BypassLockGuard": false,
                "ChecksumVerificationMode": 0,
                "DetectConflicts": true,
                "NamespaceOffset": -1
        }
}
Connecting to zero at localhost:5080
___ Begin jemalloc statistics ___
Version: "5.2.1-0-gea6b3e973b477b8061e0076bb257dbd7f3faa756"
Build-time option settings
  config.cache_oblivious: true
  config.debug: false
  config.fill: true
  config.lazy_lock: false
  config.malloc_conf: "background_thread:true,metadata_thp:auto"
  config.opt_safety_checks: false
  config.prof: true
  config.prof_libgcc: true
  config.prof_libunwind: false
  config.stats: true
  config.utrace: false
  config.xmalloc: false
Run-time option settings
  opt.abort: false
  opt.abort_conf: false
  opt.confirm_conf: false
  opt.retain: true
  opt.dss: "secondary"
  opt.narenas: 320
  opt.percpu_arena: "disabled"
  opt.oversize_threshold: 8388608
  opt.metadata_thp: "auto"
  opt.background_thread: true (background_thread: true)
  opt.dirty_decay_ms: 10000 (arenas.dirty_decay_ms: 10000)
  opt.muzzy_decay_ms: 0 (arenas.muzzy_decay_ms: 0)
  opt.lg_extent_max_active_fit: 6
  opt.junk: "false"
  opt.zero: false
  opt.tcache: true
  opt.lg_tcache_max: 15
  opt.thp: "default"
  opt.prof: false
  opt.prof_prefix: "jeprof"
  opt.prof_active: true (prof.active: false)
  opt.prof_thread_active_init: true (prof.thread_active_init: false)
  opt.lg_prof_sample: 19 (prof.lg_sample: 0)
  opt.prof_accum: false
  opt.lg_prof_interval: -1
  opt.prof_gdump: false
  opt.prof_final: false
  opt.prof_leak: false
  opt.stats_print: false
  opt.stats_print_opts: ""
Profiling settings
  prof.thread_active_init: false
  prof.active: false
  prof.gdump: false
  prof.interval: 0
  prof.lg_sample: 0
Arenas: 321
Quantum size: 16
Page size: 4096
Maximum thread-cached size class: 32768
Number of bin size classes: 36
Number of thread-cache bin size classes: 41
Number of large size classes: 196
Allocated: 4255528, active: 4284416, metadata: 15500856 (n_thp 0), resident: 19734528, mapped: 27353088, retained: 7774208
Background threads: 2, num_runs: 4, run_interval: 147330000 ns
--- End jemalloc statistics ---
Processing file (1 out of 1): /dgraph2/nq_data/rdf.nq1_10000000
[10:01:03Z] MAP 01s nquad_count:504.6k err_count:0.000 nquad_speed:503.0k/sec edge_count:504.6k edge_speed:502.9k/sec jemalloc: 96 MiB
[10:01:04Z] MAP 02s nquad_count:1.060M err_count:0.000 nquad_speed:529.2k/sec edge_count:1.060M edge_speed:529.2k/sec jemalloc: 96 MiB
[10:01:05Z] MAP 03s nquad_count:1.616M err_count:0.000 nquad_speed:538.1k/sec edge_count:1.616M edge_speed:538.1k/sec jemalloc: 96 MiB
[10:01:06Z] MAP 04s nquad_count:2.184M err_count:0.000 nquad_speed:545.7k/sec edge_count:2.184M edge_speed:545.7k/sec jemalloc: 96 MiB
[10:01:07Z] MAP 05s nquad_count:2.743M err_count:0.000 nquad_speed:548.2k/sec edge_count:2.743M edge_speed:548.2k/sec jemalloc: 96 MiB
[10:01:08Z] MAP 06s nquad_count:3.323M err_count:0.000 nquad_speed:553.5k/sec edge_count:3.323M edge_speed:553.5k/sec jemalloc: 96 MiB
[10:01:09Z] MAP 07s nquad_count:3.850M err_count:0.000 nquad_speed:549.7k/sec edge_count:3.850M edge_speed:549.7k/sec jemalloc: 96 MiB
[10:01:10Z] MAP 08s nquad_count:4.421M err_count:0.000 nquad_speed:552.4k/sec edge_count:4.421M edge_speed:552.4k/sec jemalloc: 96 MiB
[10:01:11Z] MAP 09s nquad_count:4.922M err_count:0.000 nquad_speed:546.7k/sec edge_count:4.922M edge_speed:546.7k/sec jemalloc: 96 MiB
[10:01:12Z] MAP 10s nquad_count:5.459M err_count:0.000 nquad_speed:545.7k/sec edge_count:5.459M edge_speed:545.7k/sec jemalloc: 96 MiB
[10:01:13Z] MAP 11s nquad_count:6.006M err_count:0.000 nquad_speed:545.8k/sec edge_count:6.006M edge_speed:545.8k/sec jemalloc: 96 MiB
[10:01:14Z] MAP 12s nquad_count:6.579M err_count:0.000 nquad_speed:548.1k/sec edge_count:6.579M edge_speed:548.1k/sec jemalloc: 96 MiB
[10:01:15Z] MAP 13s nquad_count:7.147M err_count:0.000 nquad_speed:549.6k/sec edge_count:7.147M edge_speed:549.6k/sec jemalloc: 96 MiB
[10:01:16Z] MAP 14s nquad_count:7.728M err_count:0.000 nquad_speed:551.9k/sec edge_count:7.728M edge_speed:551.9k/sec jemalloc: 96 MiB
[10:01:17Z] MAP 15s nquad_count:8.311M err_count:0.000 nquad_speed:553.9k/sec edge_count:8.311M edge_speed:553.9k/sec jemalloc: 96 MiB
[10:01:18Z] MAP 16s nquad_count:8.863M err_count:0.000 nquad_speed:553.8k/sec edge_count:8.863M edge_speed:553.8k/sec jemalloc: 96 MiB
[10:01:19Z] MAP 17s nquad_count:9.448M err_count:0.000 nquad_speed:555.6k/sec edge_count:9.448M edge_speed:555.6k/sec jemalloc: 96 MiB
[10:01:20Z] MAP 18s nquad_count:9.981M err_count:0.000 nquad_speed:554.4k/sec edge_count:9.981M edge_speed:554.4k/sec jemalloc: 96 MiB
[10:01:21Z] MAP 19s nquad_count:9.981M err_count:0.000 nquad_speed:525.2k/sec edge_count:9.981M edge_speed:525.2k/sec jemalloc: 675 MiB
[10:01:22Z] MAP 20s nquad_count:9.981M err_count:0.000 nquad_speed:499.0k/sec edge_count:9.981M edge_speed:499.0k/sec jemalloc: 675 MiB
[10:01:23Z] MAP 21s nquad_count:9.981M err_count:0.000 nquad_speed:475.2k/sec edge_count:9.981M edge_speed:475.2k/sec jemalloc: 675 MiB
[10:01:24Z] MAP 22s nquad_count:9.981M err_count:0.000 nquad_speed:453.6k/sec edge_count:9.981M edge_speed:453.6k/sec jemalloc: 675 MiB
[10:01:25Z] MAP 23s nquad_count:9.981M err_count:0.000 nquad_speed:433.9k/sec edge_count:9.981M edge_speed:433.9k/sec jemalloc: 675 MiB
[10:01:26Z] MAP 24s nquad_count:9.981M err_count:0.000 nquad_speed:415.8k/sec edge_count:9.981M edge_speed:415.8k/sec jemalloc: 675 MiB
[10:01:27Z] MAP 25s nquad_count:9.981M err_count:0.000 nquad_speed:399.2k/sec edge_count:9.981M edge_speed:399.2k/sec jemalloc: 96 MiB
[10:01:28Z] MAP 26s nquad_count:9.981M err_count:0.000 nquad_speed:383.8k/sec edge_count:9.981M edge_speed:383.8k/sec jemalloc: 96 MiB
[10:01:29Z] MAP 27s nquad_count:9.981M err_count:0.000 nquad_speed:369.6k/sec edge_count:9.981M edge_speed:369.6k/sec jemalloc: 96 MiB
[10:01:30Z] MAP 28s nquad_count:9.981M err_count:0.000 nquad_speed:356.4k/sec edge_count:9.981M edge_speed:356.4k/sec jemalloc: 96 MiB
[10:01:31Z] MAP 29s nquad_count:9.981M err_count:0.000 nquad_speed:344.1k/sec edge_count:9.981M edge_speed:344.1k/sec jemalloc: 96 MiB
[10:01:32Z] MAP 30s nquad_count:9.981M err_count:0.000 nquad_speed:332.7k/sec edge_count:9.981M edge_speed:332.7k/sec jemalloc: 96 MiB
[10:01:33Z] MAP 31s nquad_count:9.981M err_count:0.000 nquad_speed:321.9k/sec edge_count:9.981M edge_speed:321.9k/sec jemalloc: 96 MiB
[10:01:34Z] MAP 32s nquad_count:9.981M err_count:0.000 nquad_speed:311.9k/sec edge_count:9.981M edge_speed:311.9k/sec jemalloc: 96 MiB
[10:01:35Z] MAP 33s nquad_count:9.981M err_count:0.000 nquad_speed:302.4k/sec edge_count:9.981M edge_speed:302.4k/sec jemalloc: 96 MiB
[10:01:36Z] MAP 34s nquad_count:9.981M err_count:0.000 nquad_speed:293.5k/sec edge_count:9.981M edge_speed:293.5k/sec jemalloc: 96 MiB
Shard tmp/map_output/000 -> Reduce tmp/shards/shard_0/000
[10:01:37Z] REDUCE 35s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding MBs: 0. jemalloc: 0 B
badger 2021/10/05 10:01:37 INFO: All 0 tables opened in 0s
badger 2021/10/05 10:01:37 INFO: Discard stats nextEmptySlot: 0
badger 2021/10/05 10:01:37 INFO: Set nextTxnTs to 0
badger 2021/10/05 10:01:37 INFO: All 0 tables opened in 0s
badger 2021/10/05 10:01:37 INFO: Discard stats nextEmptySlot: 0
badger 2021/10/05 10:01:37 INFO: Set nextTxnTs to 0
badger 2021/10/05 10:01:37 INFO: DropAll called. Blocking writes...
badger 2021/10/05 10:01:37 INFO: Writes flushed. Stopping compactions now...
badger 2021/10/05 10:01:37 INFO: Deleted 0 SSTables. Now deleting value logs...
badger 2021/10/05 10:01:37 INFO: Value logs deleted. Creating value log file: 1
badger 2021/10/05 10:01:37 INFO: Deleted 1 value log files. DropAll done.
Num Encoders: 2
[10:01:38Z] REDUCE 36s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding MBs: 0. jemalloc: 320 MiB
[10:01:39Z] REDUCE 37s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding MBs: 256. jemalloc: 1.5 GiB
[10:01:40Z] REDUCE 38s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding MBs: 513. jemalloc: 2.2 GiB
[10:01:41Z] REDUCE 39s 9.64% edge_count:962.3k edge_speed:240.6k/sec plist_count:27.93k plist_speed:6.981k/sec. Num Encoding MBs: 769. jemalloc: 2.9 GiB
Final Histogram of buffer sizes:
 -- Histogram:
Min value: 29229874
Max value: 269292012
Count: 5
50p: 536870912.00
75p: 536870912.00
90p: 536870912.00
[16777216, 33554432) 1 20.00% 20.00%
[268435456, 536870912) 4 80.00% 100.00%
 --

[10:01:42Z] REDUCE 40s 31.24% edge_count:3.118M edge_speed:623.6k/sec plist_count:206.7k plist_speed:41.34k/sec. Num Encoding MBs: 1053. jemalloc: 3.0 GiB
Finishing stream id: 1
Finishing stream id: 3
Finishing stream id: 4
[10:01:43Z] REDUCE 41s 42.43% edge_count:4.235M edge_speed:705.8k/sec plist_count:732.4k plist_speed:122.1k/sec. Num Encoding MBs: 796. jemalloc: 3.7 GiB
badger 2021/10/05 10:01:44 INFO: Table created: 3 at level: 6 for stream: 3. Size: 998 KiB
badger 2021/10/05 10:01:44 INFO: Table created: 4 at level: 6 for stream: 4. Size: 1.1 MiB
badger 2021/10/05 10:01:44 INFO: Table created: 2 at level: 6 for stream: 1. Size: 9.2 MiB
badger 2021/10/05 10:01:44 INFO: Table created: 1 at level: 6 for stream: 1. Size: 60 MiB
[10:01:44Z] REDUCE 42s 49.09% edge_count:4.900M edge_speed:700.0k/sec plist_count:1.277M plist_speed:182.4k/sec. Num Encoding MBs: 796. jemalloc: 3.4 GiB
[10:01:45Z] REDUCE 43s 58.88% edge_count:5.877M edge_speed:734.5k/sec plist_count:1.644M plist_speed:205.5k/sec. Num Encoding MBs: 539. jemalloc: 3.0 GiB
[10:01:46Z] REDUCE 44s 71.96% edge_count:7.182M edge_speed:798.1k/sec plist_count:1.682M plist_speed:186.9k/sec. Num Encoding MBs: 539. jemalloc: 209 GiB
[10:01:47Z] REDUCE 45s 81.21% edge_count:8.105M edge_speed:810.6k/sec plist_count:1.907M plist_speed:190.7k/sec. Num Encoding MBs: 283. jemalloc: 478 GiB
fatal error: out of memory

goroutine 222 [running]:
runtime.throw(0x20e9612, 0xd)
        /usr/local/go/src/runtime/panic.go:1117 +0x72 fp=0xc001080918 sp=0xc0010808e8 pc=0xa6fa32
github.com/dgraph-io/ristretto/z.Calloc(0x8000000, 0x20e79e7, 0xc, 0xc0bd0de000, 0x7fab39578658, 0x58)
        /go/pkg/mod/github.com/dgraph-io/ristretto@v0.0.4-0.20210504190834-0bf2acd73aa3/z/calloc_jemalloc.go:78 +0x1c7 fp=0xc001080958 sp=0xc001080918 pc=0xec7ae7
github.com/dgraph-io/ristretto/z.NewAllocator(0x8000000, 0x20e79e7, 0xc, 0x1)
        /go/pkg/mod/github.com/dgraph-io/ristretto@v0.0.4-0.20210504190834-0bf2acd73aa3/z/allocator.go:85 +0x18d fp=0xc0010809c8 sp=0xc001080958 pc=0xeb76ad
github.com/dgraph-io/ristretto/z.(*AllocatorPool).Get(0xc0017f8228, 0x8000000, 0x20e79e7, 0xc, 0x0)
        /go/pkg/mod/github.com/dgraph-io/ristretto@v0.0.4-0.20210504190834-0bf2acd73aa3/z/allocator.go:339 +0xcc fp=0xc001080a00 sp=0xc0010809c8 pc=0xeb950c
github.com/dgraph-io/badger/v3/table.NewTableBuilder(0x100, 0x4000000, 0x0, 0x0, 0x3f847ae147ae147b, 0x1000, 0x0, 0x1, 0xc0004a3880, 0xc0004a3900, ...)
        /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/table/builder.go:131 +0xb6 fp=0xc001080a70 sp=0xc001080a00 pc=0xfb0196
github.com/dgraph-io/badger/v3.(*StreamWriter).newWriter(0xc0010daed0, 0xc0000000c0, 0xc0000000c0, 0x33e5b00, 0x0)
        /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/stream_writer.go:303 +0xd8 fp=0xc001080b68 sp=0xc001080a70 pc=0x1003db8
github.com/dgraph-io/badger/v3.(*StreamWriter).Write(0xc0010daed0, 0xc001074000, 0x0, 0x0)
        /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/stream_writer.go:162 +0x48a fp=0xc001080e40 sp=0xc001080b68 pc=0x1002d8a
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).startWriting.func2(0xc002c94030)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:341 +0x8f fp=0xc001080f30 sp=0xc001080e40 pc=0x1b4cb8f
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).startWriting(0xc0010dbec0, 0xc00018f9e0, 0xc00018faa0, 0xc00031ec60)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:373 +0x145 fp=0xc001080fc0 sp=0xc001080f30 pc=0x1b437e5
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc001080fc8 sp=0xc001080fc0 pc=0xaaa9a1
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).reduce
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:455 +0x250

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc00052608c)
        /usr/local/go/src/runtime/sema.go:56 +0x45
sync.(*WaitGroup).Wait(0xc00052608c)
        /usr/local/go/src/sync/waitgroup.go:130 +0x65
github.com/dgraph-io/badger/v3/y.(*Throttle).Finish.func1()
        /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/y/y.go:251 +0x39
sync.(*Once).doSlow(0xc000526080, 0xc000202ea8)
        /usr/local/go/src/sync/once.go:68 +0xec
sync.(*Once).Do(...)
        /usr/local/go/src/sync/once.go:59
github.com/dgraph-io/badger/v3/y.(*Throttle).Finish(0xc000526080, 0x2182428, 0xc000526080)
        /go/pkg/mod/github.com/dgraph-io/badger/v3@v3.0.0-20210405181011-d918b9904b2a/y/y.go:250 +0x74
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*reducer).run(0xc0010dbec0, 0xc0010dbec0, 0x0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/reduce.go:124 +0x2f7
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).reduceStage(0xc0004fc450)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:419 +0x8e
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.run()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:352 +0x15d3
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.init.0.func1(0xc0002d9680, 0xc000040840, 0x0, 0xc)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:58 +0x65
github.com/spf13/cobra.(*Command).execute(0xc0002d9680, 0xc000040780, 0xc, 0xc, 0xc0002d9680, 0xc000040780)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830 +0x2c2
github.com/spf13/cobra.(*Command).ExecuteC(0x2f7fdc0, 0x0, 0x0, 0x0)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x30b
github.com/spf13/cobra.(*Command).Execute(...)
        /go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:78 +0x85
main.main()
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:99 +0xf0

goroutine 6 [chan receive]:
github.com/golang/glog.(*loggingT).flushDaemon(0x318d480)
        /go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:882 +0x8b
created by github.com/golang/glog.init.0
        /go/pkg/mod/github.com/golang/glog@v0.0.0-20160126235308-23def4e6c14b/glog.go:410 +0x274

goroutine 34 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc0001be000)
        /go/pkg/mod/go.opencensus.io@v0.22.5/stats/view/worker.go:276 +0xcd
created by go.opencensus.io/stats/view.init.0
        /go/pkg/mod/go.opencensus.io@v0.22.5/stats/view/worker.go:34 +0x68

goroutine 35 [chan receive]:
github.com/dgraph-io/dgraph/x.init.0.func1(0x23ed4c0, 0xc0000420a0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/x/metrics.go:421 +0xe5
created by github.com/dgraph-io/dgraph/x.init.0
        /ext-go/1/src/github.com/dgraph-io/dgraph/x/metrics.go:417 +0x93

goroutine 96 [chan receive]:
main.main.func2(0xc0004ae190, 0x2185e10)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:57 +0xdc
created by main.main
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:47 +0xeb

goroutine 97 [IO wait]:
internal/poll.runtime_pollWait(0x7fd249120f18, 0x72, 0x0)
        /usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc0004a2218, 0x72, 0x0, 0x0, 0x20e02f8)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Accept(0xc0004a2200, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:401 +0x212
net.(*netFD).accept(0xc0004a2200, 0x30, 0x30, 0x7fd2725bf5b8)
        /usr/local/go/src/net/fd_unix.go:172 +0x45
net.(*TCPListener).accept(0xc000490030, 0xc00012fdd8, 0xa455d8, 0x30)
        /usr/local/go/src/net/tcpsock_posix.go:139 +0x32
net.(*TCPListener).Accept(0xc000490030, 0x1fe5800, 0xc00049dad0, 0x1eb0880, 0x2f5b660)
        /usr/local/go/src/net/tcpsock.go:261 +0x65
net/http.(*Server).Serve(0xc0001c81c0, 0x23e65c8, 0xc000490030, 0x0, 0x0)
        /usr/local/go/src/net/http/server.go:2981 +0x285
net/http.(*Server).ListenAndServe(0xc0001c81c0, 0xc0001c81c0, 0x0)
        /usr/local/go/src/net/http/server.go:2910 +0xba
net/http.ListenAndServe(...)
        /usr/local/go/src/net/http/server.go:3164
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.run.func1(0xc0005782c0)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:269 +0x7d
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.run
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/run.go:268 +0xfd1

goroutine 47 [select]:
google.golang.org/grpc.(*ccBalancerWrapper).watcher(0xc00056c580)
        /go/pkg/mod/google.golang.org/grpc@v1.27.0/balancer_conn_wrappers.go:69 +0xac
created by google.golang.org/grpc.newCCBalancerWrapper
        /go/pkg/mod/google.golang.org/grpc@v1.27.0/balancer_conn_wrappers.go:60 +0x172

goroutine 48 [chan receive]:
google.golang.org/grpc.(*addrConn).resetTransport(0xc0002ccb00)
        /go/pkg/mod/google.golang.org/grpc@v1.27.0/clientconn.go:1179 +0x465
created by google.golang.org/grpc.(*addrConn).connect
        /go/pkg/mod/google.golang.org/grpc@v1.27.0/clientconn.go:815 +0x12a

goroutine 132 [IO wait]:
internal/poll.runtime_pollWait(0x7fd249120e30, 0x72, 0xffffffffffffffff)
        /usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc0004a2498, 0x72, 0x8000, 0x8000, 0xffffffffffffffff)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0004a2480, 0xc000408000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:166 +0x1d5
net.(*netFD).Read(0xc0004a2480, 0xc000408000, 0x8000, 0x8000, 0x116611e, 0x800010601, 0x0)
        /usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00049a0b0, 0xc000408000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:183 +0x91
bufio.(*Reader).Read(0xc0005aad20, 0xc0001c82d8, 0x9, 0x9, 0x14, 0x7fd2725c1fe8, 0x18)
        /usr/local/go/src/bufio/bufio.go:227 +0x222
io.ReadAtLeast(0x23c2c40, 0xc0005aad20, 0xc0001c82d8, 0x9, 0x9, 0x9, 0xc0002e5dd0, 0xa92a13, 0x0)
        /usr/local/go/src/io/io.go:328 +0x87
io.ReadFull(...)
        /usr/local/go/src/io/io.go:347
golang.org/x/net/http2.readFrameHeader(0xc0001c82d8, 0x9, 0x9, 0x23c2c40, 0xc0005aad20, 0x0, 0x0, 0xc04f2677af4f8719, 0x289a8c8b)
        /go/pkg/mod/golang.org/x/net@v0.0.0-20201021035429-f5854403a974/http2/frame.go:237 +0x89
golang.org/x/net/http2.(*Framer).ReadFrame(0xc0001c82a0, 0xc004702180, 0xc004702180, 0x0, 0x0)
        /go/pkg/mod/golang.org/x/net@v0.0.0-20201021035429-f5854403a974/http2/frame.go:492 +0xa5
google.golang.org/grpc/internal/transport.(*http2Client).reader(0xc0003b81c0)
        /go/pkg/mod/google.golang.org/grpc@v1.27.0/internal/transport/http2_client.go:1265 +0x185
created by google.golang.org/grpc/internal/transport.newHTTP2Client
        /go/pkg/mod/google.golang.org/grpc@v1.27.0/internal/transport/http2_client.go:300 +0xd51

goroutine 133 [select]:
google.golang.org/grpc/internal/transport.(*controlBuffer).get(0xc0005a8dc0, 0x1, 0x0, 0x0, 0x0, 0x0)
        /go/pkg/mod/google.golang.org/grpc@v1.27.0/internal/transport/controlbuf.go:395 +0xff
google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc0005aade0, 0x0, 0x0)
        /go/pkg/mod/google.golang.org/grpc@v1.27.0/internal/transport/controlbuf.go:513 +0x1dd
google.golang.org/grpc/internal/transport.newHTTP2Client.func3(0xc0003b81c0)
        /go/pkg/mod/google.golang.org/grpc@v1.27.0/internal/transport/http2_client.go:346 +0x7b
created by google.golang.org/grpc/internal/transport.newHTTP2Client
        /go/pkg/mod/google.golang.org/grpc@v1.27.0/internal/transport/http2_client.go:344 +0xf27

goroutine 123 [select]:
github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*progress).report(0xc000494c40)
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/progress.go:74 +0xdb
created by github.com/dgraph-io/dgraph/dgraph/cmd/bulk.newLoader
        /ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/bulk/loader.go:150 +0x70d

goroutine 124 [chan send]:
github.com/dgraph-io/dgraph/xidmap.New.func2(0xc0004a2380)
        /ext-go/1/src/github.com/dgraph-io/dgraph/xidmap/xidmap.go:165 +0x73
created by github.com/dgraph-io/dgraph/xidmap.New
        /ext-go/1/src/github.com/dgraph-io/dgraph/xidmap/xidmap.go:152 +0x31f

As i can see, memory is aggressively consumed in REDUCE stage.
I tried to load 1_000_000 records and lot of memory is consumed too but is enough to process data.

My assumption is that smth is wrong with wikidata. I also tried to load data using live loader and successfully load 1_200_000_000 records but faced with another issue when tried to load more. I will create a separate topic about it.

I tried to make a memory profile but didn’t understood how to get it after execution. Where is it saved?

Can someone please help to figure out what’s going on?
Thanks

Hey @keks51 , bulk loader exposes an http port to debug. You can take heap profile with a command that should look like:

go tool pprof localhost:8080/debug/pprof/heap

It will print out the destination:

Fetching profile over HTTP from http://localhost:8080/debug/pprof/heap
Saved profile in /home/dg/pprof/pprof.dgraph.alloc_objects.alloc_space.inuse_objects.inuse_space.003.pb.gz
File: dgraph
Build ID: a05419818fad7662cc49992bb9b39ae4426199ad
Type: inuse_space
Time: Oct 6, 2021 at 5:39am (PDT)
Entering interactive mode (type "help" for commands, "o" for options)

Heap profiles will help us debug the issue.
Also, please collect the jemalloc profile as well.

dg@machine:~$ curl localhost:8080/jemalloc
Num Allocated Bytes: 6.7 GiB [7235175716]
Allocators:
Tag: Reducer.ToList Num: 1 Size: 4.0 GiB . Tag: Builder Num: 1 Size: 384 MiB . 
Allocations:
4.0 GiB at file: Reducer.ToList
260 MiB at file: Reducer.Buffer.ToList
2.1 GiB at file: Reducer.GetBuf
128 MiB at file: TableBuilder
256 MiB at file: Builder

Hey @Naman
Thanks for your help.

jemaloc_curl (19.4 KB)
pprof.dgraph.alloc_objects.alloc_space.inuse_objects.inuse_space.006.pb.gz (11.9 KB)

I have looked at the profile. And as visible from the jemalloc profile, TableBuilder is taking too much memory.
There might be something wrong with data. There is a possibility of a memory bug in dgraph. Will you be able to share the data with me over naman@dgraph.io (upload to some cloud storage/gdrive and sharing the link)?

Hi @Naman.
I tested bulk loading with 1gb data locally on my hp laptop (windows os) and the process successfully finished. Then i disabled memory swap and the process failed. So i decided that the issue is due too swap configuration. I enabled memory swap on cluster but it din’t help, bulk still fails with OOM when memory swap is enabled. Also i sent you an email with data link.

Hi @Naman. Did you have a chance to look over the data?
Thanks