Dgraph bulk load out of memory

hydra-yl · May 25, 2020, 2:06pm

I am trying to load a 97 GB JSON file into Dgraph. After I start dgraph bulk, memory consumption climbs steadily to about 100 GB and the process then crashes with a “run out of memory” error.

Bulk command: dgraph bulk --format json --zero localhost:5082 --http localhost:9900 --reduce_shard 1 --schema s.txt --files d.json.gz

Screenshot 1: bulk output (IMG-3508, hosted on ImgBB)
Screenshot 2: pprof heap output (IMG-6300, hosted on ImgBB)

MichelDiz · May 25, 2020, 4:37pm

Can you share what version you are using?

hydra-yl · May 26, 2020, 3:15am

Dgraph version: v20.03.2
Commit SHA-1: 7553f0dea
Branch: HEAD

mrjn · May 26, 2020, 3:20am

Can you give the actual heap profile (gzipped version), so we can do a bit more analysis?

FWIW, it looks like it is stuck at slurpQuoted, and 65 GB of memory is being used by that function alone. That means the data has a problem: somewhere a double quote is neither escaped nor closed by a matching double quote. In that case, the parser keeps “slurping” text into a bytes.Buffer until it finds the end of the string, which causes your server to OOM.

You should look for an unmatched double quote in your data and fix it.
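One way to hunt for such a quote is a small streaming scan over the input. The sketch below is not part of Dgraph; the function name `find_suspect_strings` and the `max_len` threshold are made up for illustration. It assumes the file is UTF-8 JSON and relies on two facts: valid JSON strings cannot contain a raw newline, and real string values are rarely huge. It only tracks quote and backslash state, so it is a heuristic, not a JSON validator, and a pure-Python loop will be slow on a file of this size.

```python
import io


def find_suspect_strings(stream, max_len=1_000_000, chunk_size=1 << 20):
    """Yield (line_number, reason) for string literals that look unterminated.

    `stream` is any text-mode file object. Memory use stays constant because
    only the quote/escape state is kept, never the string contents.
    """
    in_string = False   # currently between an opening and closing quote
    escaped = False     # previous character was a backslash inside a string
    reported = False    # at most one report per string literal
    line = 1            # current line number (1-based)
    start_line = 0      # line on which the current string was opened
    length = 0          # characters seen inside the current string
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        for ch in chunk:
            if ch == "\n":
                line += 1
            if not in_string:
                if ch == '"':
                    in_string, escaped, reported = True, False, False
                    start_line, length = line, 0
                continue
            length += 1
            if not reported:
                if ch == "\n":
                    reported = True
                    yield start_line, "raw newline inside string"
                elif length > max_len:
                    reported = True
                    yield start_line, "string longer than max_len"
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
    if in_string and not reported:
        yield start_line, "string still open at end of input"


# Small in-memory demo: the first record never closes its "x string.
sample = io.StringIO('{"a": "x}\n{"b": "y"}\n')
for hit in find_suspect_strings(sample):
    print(hit)

# For a gzipped file such as d.json.gz, the stream could be opened with:
#   import gzip
#   with gzip.open("d.json.gz", "rt", encoding="utf-8") as f:
#       for hit in find_suspect_strings(f):
#           print(hit)
```

The first reported line is usually the one to inspect; later reports tend to be knock-on effects, because one stray quote flips the in-string state for everything after it.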

Willem520 · May 27, 2020, 3:40am

Hi, I ran into the same problem. The bulk loader does not work well with large files: I also hit an OOM, and it needs a lot of memory for a large file. Could the bulk loader support a distributed mode?

JimWen · May 27, 2020, 3:46am

@mrjn
It seems this problem has existed for a long time, because the reduce phase loads all of the map output into memory at once. Isn't that a bottleneck, or am I misunderstanding something?

xiangzhao632 · May 27, 2020, 4:09am

You are right. I solved the OOM problem in v1.1.0; see the PR “Reduce memory usage of bulkloader” by xiangzhao632 (Pull Request #4529, dgraph-io/dgraph on GitHub). Since v20.03.1 the bulk loader has changed a lot and consumes more memory than previous versions. Before v20.03.1, the reduce phase loaded into memory all the map entries that produce one Badger KV; since v20.03.1, it loads even more map entries.

hydra-yl · June 10, 2020, 1:44pm

Hi, just to report back: my OOM issue was indeed caused by unescaped double quotes. For example, my data contained malformed strings like “123”,“some_user_name”,“some_other_field”
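If the JSON is produced by a script, letting a JSON library serialize each record avoids this class of bug, since embedded double quotes are escaped automatically. A minimal sketch, assuming the raw field really contains literal double quotes as in the example above; the record layout and the `note` predicate name are hypothetical:

```python
import json

# A raw field value that itself contains double quotes.
raw_value = '"123","some_user_name","some_other_field"'

# Hypothetical record; "note" is an illustrative predicate name.
record = {"uid": "_:n1", "note": raw_value}

# json.dumps escapes the embedded quotes as \" so the string stays closed.
encoded = json.dumps(record)
print(encoded)

# Round trip: parsing the output gives back the original value unchanged.
assert json.loads(encoded)["note"] == raw_value
```

Building the JSON by string concatenation instead would write the inner quotes verbatim, which is the kind of input that leaves a parser looking for a closing quote.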
