Fatal error: runtime: out of memory when running the Bulk Loader

Hi,
I am reporting a data import error with the Dgraph Bulk Loader.

I deployed Dgraph with Docker Swarm, following the documentation.

The import command is as follows

nohup dgraph bulk -f . -s entities.schema --zero=zero1:5080  --reduce_shards=1 --mapoutput_mb=1 --num_go_routines=1 &

The log output is as follows:

[08:51:39Z] MAP 45m24s nquad_count:121.3M err_count:0.000 nquad_speed:44.53k/sec edge_count:1.286G edge_speed:471.9k/sec
[08:51:40Z] MAP 45m25s nquad_count:121.3M err_count:0.000 nquad_speed:44.52k/sec edge_count:1.286G edge_speed:471.9k/sec
[08:51:41Z] MAP 45m26s nquad_count:121.4M err_count:0.000 nquad_speed:44.52k/sec edge_count:1.286G edge_speed:471.8k/sec
[08:51:42Z] MAP 45m27s nquad_count:121.4M err_count:0.000 nquad_speed:44.51k/sec edge_count:1.287G edge_speed:471.8k/sec
[08:51:43Z] MAP 45m28s nquad_count:121.4M err_count:0.000 nquad_speed:44.51k/sec edge_count:1.287G edge_speed:471.8k/sec
[08:51:44Z] MAP 45m29s nquad_count:121.5M err_count:0.000 nquad_speed:44.51k/sec edge_count:1.287G edge_speed:471.7k/sec
[08:51:45Z] MAP 45m30s nquad_count:121.5M err_count:0.000 nquad_speed:44.49k/sec edge_count:1.287G edge_speed:471.5k/sec
[08:51:47Z] MAP 45m31s nquad_count:121.5M err_count:0.000 nquad_speed:44.48k/sec edge_count:1.287G edge_speed:471.3k/sec
[08:51:48Z] MAP 45m32s nquad_count:121.5M err_count:0.000 nquad_speed:44.46k/sec edge_count:1.288G edge_speed:471.2k/sec
[08:51:49Z] MAP 45m33s nquad_count:121.5M err_count:0.000 nquad_speed:44.46k/sec edge_count:1.288G edge_speed:471.1k/sec
[08:51:50Z] MAP 45m34s nquad_count:121.6M err_count:0.000 nquad_speed:44.45k/sec edge_count:1.288G edge_speed:471.1k/sec
[08:51:51Z] MAP 45m35s nquad_count:121.6M err_count:0.000 nquad_speed:44.45k/sec edge_count:1.289G edge_speed:471.1k/sec
[08:51:52Z] MAP 45m36s nquad_count:121.6M err_count:0.000 nquad_speed:44.45k/sec edge_count:1.289G edge_speed:471.0k/sec
[08:51:53Z] MAP 45m37s nquad_count:121.6M err_count:0.000 nquad_speed:44.43k/sec edge_count:1.289G edge_speed:470.9k/sec
Shard tmp/map_output/000 -> Reduce tmp/shards/shard_0/000
[08:51:54Z] REDUCE 45m38s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:51:55Z] REDUCE 45m39s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:51:56Z] REDUCE 45m40s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:51:57Z] REDUCE 45m41s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:51:58Z] REDUCE 45m42s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
Num CPUs: 4
[08:51:59Z] REDUCE 45m43s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:00Z] REDUCE 45m44s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:01Z] REDUCE 45m46s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:03Z] REDUCE 45m47s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:04Z] REDUCE 45m48s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:05Z] REDUCE 45m49s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:06Z] REDUCE 45m50s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:07Z] REDUCE 45m51s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:08Z] REDUCE 45m52s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:09Z] REDUCE 45m53s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:10Z] REDUCE 45m54s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:11Z] REDUCE 45m55s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:12Z] REDUCE 45m57s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:13Z] REDUCE 45m58s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x195a44b, 0x16)
	/usr/local/go/src/runtime/panic.go:1114 +0x72
runtime.sysMap(0xc398000000, 0x4000000, 0x2834d78)
	/usr/local/go/src/runtime/mem_linux.go:169 +0xc5
runtime.(*mheap).sysAlloc(0x281f840, 0x400000, 0x281f848, 0x21)
	/usr/local/go/src/runtime/malloc.go:715 +0x1cd
runtime.(*mheap).grow(0x281f840, 0x21, 0x0)
	/usr/local/go/src/runtime/mheap.go:1286 +0x11c
runtime.(*mheap).allocSpan(0x281f840, 0x21, 0x7fa1c2ff0000, 0x2834d88, 0x100)
	/usr/local/go/src/runtime/mheap.go:1124 +0x6a0
runtime.(*mheap).alloc.func1()
	/usr/local/go/src/runtime/mheap.go:871 +0x64
runtime.(*mheap).alloc(0x281f840, 0x21, 0x9a0100, 0x281f840)
	/usr/local/go/src/runtime/mheap.go:865 +0x81
runtime.largeAlloc(0x40740, 0x1, 0x281f840)
	/usr/local/go/src/runtime/malloc.go:1152 +0x92
runtime.mallocgc.func1()
	/usr/local/go/src/runtime/malloc.go:1047 +0x46
runtime.systemstack(0x0)
	/usr/local/go/src/runtime/asm_amd64.s:370 +0x66

Hardware configuration: 3 machines, each with 4 cores and 16 GB of RAM

Alpha launch commands:

dgraph alpha --my=alpha1:7080 --lru_mb=8192 --zero=zero1:5080 --badger.vlog=disk
dgraph alpha --my=alpha2:7081 --lru_mb=8192 --zero=zero1:5080 -o 1 --badger.vlog=disk
dgraph alpha --my=alpha3:7082 --lru_mb=8192 --zero=zero1:5080 -o 2 --badger.vlog=disk

It appears that the Bulk Loader completed the Map phase successfully, but the error occurred during the Reduce phase.

I think this was the root of my problem as well when I hit OOM while importing.

  1. How much memory is on the physical machine?
  2. If using Docker, what is the memory configuration?

hi @amaster507

Thank you for your reply.
The setup is three machines, each with 4 cores and 16 GB of RAM.
I used Docker Swarm to deploy Dgraph, and no memory limits were set on the Docker containers.

I am still very green at this myself, and this reaches the end of my knowledge of Dgraph so far. Do you know how much memory was being consumed when the error occurred?

If you used this https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/docker/docker-compose.yml, then check that the 2048 MB limit it sets by default was not left in place.
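One quick way to check what actually got deployed (a sketch only, assuming shell access to a Swarm node and that the stack file is named docker-compose.yml):

grep -n -E "lru_mb|memory" docker-compose.yml
docker stats --no-stream

The first command shows any --lru_mb value or container memory limit declared in the file; the second shows each running container's current memory usage and its effective limit.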


hi @amaster507

This restriction does exist in my Alpha launch command:

dgraph alpha --my=alpha1:7080 --lru_mb=2048 --zero=zero1:5080

Should I remove this option, as shown below?

dgraph alpha --my=alpha1:7080 --zero=zero1:5080

My machine’s memory is 16 GB. I have changed --lru_mb to 8192, but the error still occurs.

How big is your data?

hi @Anurag
The dataset is about 10 million records.

The format is like this:

Give it more resources. The Bulk Loader's resource usage, especially RAM, grows steeply with dataset size. In general, for small bulk loads (around 21 million), we set 20 GB for safety, and much more for bigger datasets. I think the dev team has some improvements coming to the Bulk Loader, but I'm not sure when they will be available.

When the bulk load is finished, you can return the machine to the default resource configuration you were using.

Changing lru_mb does nothing here; LRU is a cache for queries, and it is disabled until we add the Ristretto cache to Dgraph.


Here is a blog article that describes why and how the Bulk Loader imports your data into Dgraph. It is from 2017 and some things may have changed, but it will give you an idea of the internals.


hi @Paras @MichelDiz

Can I import the data with my existing hardware? I can accept a slower import, as long as it is still faster than the Live Loader.

I cannot upgrade the hardware for the time being.

Thank you

It is something they are discussing…


Hi, can you try increasing the number of map and reduce shards? Maybe try 10?
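For reference, a sketch of what that might look like applied to the original command. --map_shards and --reduce_shards are existing dgraph bulk flags, but the value 10 is only the suggestion above, not a verified recommendation; --reduce_shards is normally kept equal to the number of Alpha groups, which is 1 in this setup:

nohup dgraph bulk -f . -s entities.schema --zero=zero1:5080 --map_shards=10 --reduce_shards=1 --mapoutput_mb=1 --num_go_routines=1 &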

Same problem here; dgraph bulk is very hard to use.
The map phase is memory-hungry; after mapping, the reduce phase reads the map files and also consumes a lot of memory, and storing the xidmap consumes even more.
They built a UID system that leaves user-defined IDs in a very awkward position, there is no merge feature, and imports easily produce duplicate nodes.