Fatal error: runtime: out of memory when running the Bulk Loader

Hi,
I am reporting a data import error with the Dgraph Bulk Loader.

I deployed Dgraph with Docker Swarm, following the documentation.

The import command is as follows

nohup dgraph bulk -f . -s entities.schema --zero=zero1:5080  --reduce_shards=1 --mapoutput_mb=1 --num_go_routines=1 &

The log output is as follows:

[08:51:39Z] MAP 45m24s nquad_count:121.3M err_count:0.000 nquad_speed:44.53k/sec edge_count:1.286G edge_speed:471.9k/sec
[08:51:40Z] MAP 45m25s nquad_count:121.3M err_count:0.000 nquad_speed:44.52k/sec edge_count:1.286G edge_speed:471.9k/sec
[08:51:41Z] MAP 45m26s nquad_count:121.4M err_count:0.000 nquad_speed:44.52k/sec edge_count:1.286G edge_speed:471.8k/sec
[08:51:42Z] MAP 45m27s nquad_count:121.4M err_count:0.000 nquad_speed:44.51k/sec edge_count:1.287G edge_speed:471.8k/sec
[08:51:43Z] MAP 45m28s nquad_count:121.4M err_count:0.000 nquad_speed:44.51k/sec edge_count:1.287G edge_speed:471.8k/sec
[08:51:44Z] MAP 45m29s nquad_count:121.5M err_count:0.000 nquad_speed:44.51k/sec edge_count:1.287G edge_speed:471.7k/sec
[08:51:45Z] MAP 45m30s nquad_count:121.5M err_count:0.000 nquad_speed:44.49k/sec edge_count:1.287G edge_speed:471.5k/sec
[08:51:47Z] MAP 45m31s nquad_count:121.5M err_count:0.000 nquad_speed:44.48k/sec edge_count:1.287G edge_speed:471.3k/sec
[08:51:48Z] MAP 45m32s nquad_count:121.5M err_count:0.000 nquad_speed:44.46k/sec edge_count:1.288G edge_speed:471.2k/sec
[08:51:49Z] MAP 45m33s nquad_count:121.5M err_count:0.000 nquad_speed:44.46k/sec edge_count:1.288G edge_speed:471.1k/sec
[08:51:50Z] MAP 45m34s nquad_count:121.6M err_count:0.000 nquad_speed:44.45k/sec edge_count:1.288G edge_speed:471.1k/sec
[08:51:51Z] MAP 45m35s nquad_count:121.6M err_count:0.000 nquad_speed:44.45k/sec edge_count:1.289G edge_speed:471.1k/sec
[08:51:52Z] MAP 45m36s nquad_count:121.6M err_count:0.000 nquad_speed:44.45k/sec edge_count:1.289G edge_speed:471.0k/sec
[08:51:53Z] MAP 45m37s nquad_count:121.6M err_count:0.000 nquad_speed:44.43k/sec edge_count:1.289G edge_speed:470.9k/sec
Shard tmp/map_output/000 -> Reduce tmp/shards/shard_0/000
[08:51:54Z] REDUCE 45m38s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:51:55Z] REDUCE 45m39s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:51:56Z] REDUCE 45m40s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:51:57Z] REDUCE 45m41s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:51:58Z] REDUCE 45m42s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
Num CPUs: 4
[08:51:59Z] REDUCE 45m43s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:00Z] REDUCE 45m44s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:01Z] REDUCE 45m46s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:03Z] REDUCE 45m47s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:04Z] REDUCE 45m48s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:05Z] REDUCE 45m49s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:06Z] REDUCE 45m50s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:07Z] REDUCE 45m51s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:08Z] REDUCE 45m52s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:09Z] REDUCE 45m53s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:10Z] REDUCE 45m54s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:11Z] REDUCE 45m55s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:12Z] REDUCE 45m57s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[08:52:13Z] REDUCE 45m58s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x195a44b, 0x16)
	/usr/local/go/src/runtime/panic.go:1114 +0x72
runtime.sysMap(0xc398000000, 0x4000000, 0x2834d78)
	/usr/local/go/src/runtime/mem_linux.go:169 +0xc5
runtime.(*mheap).sysAlloc(0x281f840, 0x400000, 0x281f848, 0x21)
	/usr/local/go/src/runtime/malloc.go:715 +0x1cd
runtime.(*mheap).grow(0x281f840, 0x21, 0x0)
	/usr/local/go/src/runtime/mheap.go:1286 +0x11c
runtime.(*mheap).allocSpan(0x281f840, 0x21, 0x7fa1c2ff0000, 0x2834d88, 0x100)
	/usr/local/go/src/runtime/mheap.go:1124 +0x6a0
runtime.(*mheap).alloc.func1()
	/usr/local/go/src/runtime/mheap.go:871 +0x64
runtime.(*mheap).alloc(0x281f840, 0x21, 0x9a0100, 0x281f840)
	/usr/local/go/src/runtime/mheap.go:865 +0x81
runtime.largeAlloc(0x40740, 0x1, 0x281f840)
	/usr/local/go/src/runtime/malloc.go:1152 +0x92
runtime.mallocgc.func1()
	/usr/local/go/src/runtime/malloc.go:1047 +0x46
runtime.systemstack(0x0)
	/usr/local/go/src/runtime/asm_amd64.s:370 +0x66

Hardware configuration: 3 machines, each with 4 cores and 16 GB of RAM

Alpha launch commands:

dgraph alpha --my=alpha1:7080 --lru_mb=8192 --zero=zero1:5080 --badger.vlog=disk
dgraph alpha --my=alpha2:7081 --lru_mb=8192 --zero=zero1:5080 -o 1 --badger.vlog=disk
dgraph alpha --my=alpha3:7082 --lru_mb=8192 --zero=zero1:5080 -o 2 --badger.vlog=disk

It appears that the Bulk Loader completed the Map phase successfully, but the error occurred during the Reduce phase.

I think this was the root of my problem as well when I hit OOM while importing.

  1. How much memory is on the physical machine?
  2. If using Docker, what is the memory configuration?

hi @amaster507

Thank you for your reply.
The setup is three machines, each with 4 cores and 16 GB of RAM.
I used Docker Swarm to deploy Dgraph, and no memory limits were set on the Docker containers.

I am still very green at this myself, and this reaches the end of my knowledge of Dgraph so far. Do you know how much memory was being consumed when the error occurred?

If you used this https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/docker/docker-compose.yml, then check that the 2048 MB limit it sets by default was not left in place.
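One quick way to check what actually got deployed (a sketch only, assuming shell access to a Swarm node and that the stack file is named docker-compose.yml):

grep -n -E "lru_mb|memory" docker-compose.yml
docker stats --no-stream

The first command shows any --lru_mb value or container memory limit declared in the file; the second shows each running container's current memory usage and its effective limit.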


hi @amaster507

This restriction does exist in my Alpha launch command:

dgraph alpha --my=alpha1:7080 --lru_mb=2048 --zero=zero1:5080

Should I remove this option, as shown below?

dgraph alpha --my=alpha1:7080 --zero=zero1:5080

My machine’s memory is 16 GB. I have changed --lru_mb to 8192, but the error still occurs.

How big is your data?

hi @Anurag
The dataset is about 10 million records.

The format is like this:

Give it more resources. The Bulk Loader's resource usage, especially RAM, grows steeply with dataset size. In general, for small bulk loads (around 21 million), we set 20 GB for safety, and much more for bigger datasets. I think the dev team has some improvements coming to the Bulk Loader, but I'm not sure when they will be available.

When the bulk load is finished, you can return the machine to the default resource configuration you were using.

Changing lru_mb does nothing here; LRU is a cache for queries, and it is disabled until we add the Ristretto cache to Dgraph.


Here is a blog article that describes why and how the Bulk Loader imports your data into Dgraph. It is from 2017 and some things may have changed, but it will give you an idea of the internals.


hi @Paras @MichelDiz

Can I import the data with my existing hardware? I can accept a slower import, as long as it is still faster than the Live Loader.

I cannot upgrade the hardware for the time being.

Thank you

It is something they are discussing…


Hi, can you try increasing the number of map and reduce shards? Maybe try 10?
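For reference, a sketch of what that might look like applied to the original command. --map_shards and --reduce_shards are existing dgraph bulk flags, but the value 10 is only the suggestion above, not a verified recommendation; --reduce_shards is normally kept equal to the number of Alpha groups, which is 1 in this setup:

nohup dgraph bulk -f . -s entities.schema --zero=zero1:5080 --map_shards=10 --reduce_shards=1 --mapoutput_mb=1 --num_go_routines=1 &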

Same problem here; dgraph bulk is very hard to use.
The map phase is memory-hungry; after mapping, the reduce phase reads the map files and also consumes a lot of memory, and storing the xidmap consumes even more.
They built a UID system that leaves user-defined IDs in a very awkward position, there is no merge feature, and imports easily produce duplicate nodes.