Recommended RAM to bulk load 200M RDF entries to avoid OOM

geoyws · May 7, 2020, 8:37am

Edit:
Apologies, after checking the successful load, I realized that I’ve loaded 1B nquads and 1.6B edges.
I thought it was 200M+ RDF entries, but it was actually the equivalent of 200M+ RDBMS rows translated into RDF, which came to roughly 1B RDF entries. That is probably why so much RAM was needed. Everything loaded in 48 minutes without ludicrous mode on a 16 vCPU, 128GB RAM Intel Cascade Lake VM on Huawei Cloud.
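This is a rough illustration of why 200M+ relational rows turned into about 1B RDF entries. When each column of a row becomes its own N-Quad, the count multiplies by the number of columns. The sketch assumes a hypothetical five-column table, a made-up row_to_nquads helper, and an invented <table.column> predicate naming scheme. It is not how the original data was converted.

```python
def escape_literal(value: str) -> str:
    """Escape a string for use as an N-Quads literal (backslash, quote, CR, LF)."""
    return (value.replace("\\", "\\\\")   # backslash first, so later escapes survive
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def row_to_nquads(table: str, row_id: int, columns: dict) -> list:
    """Hypothetical helper: emit one N-Quad per column of a relational row.

    The row becomes a blank node (_:table_id) and each column becomes a
    predicate named <table.column> (an assumed naming scheme).
    """
    subject = f"_:{table}_{row_id}"
    return [
        f'{subject} <{table}.{col}> "{escape_literal(str(val))}" .'
        for col, val in sorted(columns.items())  # sorted for deterministic output
    ]


# Hypothetical five-column row.
row = {"name": "Alice", "city": "Kuala Lumpur", "email": "alice@example.com",
       "phone": "555-0100", "status": "active"}
quads = row_to_nquads("customer", 1, row)
for q in quads:
    print(q)

# Back-of-envelope: 200M rows with ~5 columns each is ~1B N-Quads.
print(200_000_000 * len(quads))  # 1000000000
```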

Original:
I’m running a benchmark for management, and they’ve asked us to do a simple load test on a 2 vCPU, 4GB RAM server. I’m loading 200 .rdf files (each with 1M records), 200M+ entries in total, via the bulk loader (--ludicrous_mode seems to make only a small difference here), and Dgraph consistently runs out of memory at the 14th .rdf file. It takes about 40-50s per file.
Alpha’s lru_mb is at 2048 to be safe.

Postgres and MSSQL managed to load the 200M+ records with the same CPU and RAM constraints in about 2-3 hours.
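For comparison, here is a naive projection from the observed 40-50s per file. It assumes the per-file time stayed constant across all 200 files, which was never tested because the run stopped at file 14. Under that assumption the full load lands at roughly 2.2-2.8 hours, in the same range as Postgres and MSSQL:

```python
FILES = 200
SECONDS_PER_FILE = (40, 50)  # observed range on the 2 vCPU / 4GB RAM VM


def projected_hours(files: int, seconds_per_file: float) -> float:
    """Projected wall-clock hours, assuming every file takes the same time."""
    return files * seconds_per_file / 3600


low, high = (projected_hours(FILES, s) for s in SECONDS_PER_FILE)
print(f"projected bulk load time: {low:.1f}-{high:.1f} h")  # 2.2-2.8 h
```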

geoyws · May 7, 2020, 10:10am

I’ve provisioned a server with 4 vCPUs and 16GB RAM. Memory usage keeps climbing with no end in sight: 9GB of 16GB is already used at only the 33rd .rdf file. It takes about 30-40s per file. Not using ludicrous_mode. If it OOMs again, perhaps I’ll use the Live loader instead.

ashishgoswami · May 7, 2020, 11:16am

Hi @geoyws, please try bulk loader without ludicrous mode as well.

geoyws · May 7, 2020, 1:02pm

@ashishgoswami yup currently running without ludicrous mode

The 4vCPU 16GB RAM VM went OOM after 110 .rdf files (1 million RDF entries each). (Edit: Actually ~5M RDF entries each)

So I’ve gone with an 8 vCPU 32GB RAM VM. If that OOMs, I’ll go with the 64GB RAM VM next.

Update: OOM again.

Going with the 64GB RAM now.

dereksfoster99 · May 7, 2020, 5:33pm

Hi @geoyws Would you like to have a quick call with one of our engineers to try and figure out this issue? Please let me know. We’d be happy to help.

geoyws · May 8, 2020, 3:45am

@dereksfoster99 Sure, when would be a good time?

In short, I repeated the same process with a VM image on a 64GB RAM VM and it still went OOM.

The 128GB RAM VM however managed to do it.

1B nquads and 1.6B edges in 48mins without ludicrous mode, 16 vCPU 128GB RAM Intel Cascade Lake on Huawei Cloud.
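For a sense of scale, the successful run works out to roughly 347k N-Quads and 556k edges per second. This treats the reported 1B, 1.6B, and 48-minute figures as exact, although they are rounded:

```python
NQUADS = 1_000_000_000  # reported as "1B nquads" (rounded)
EDGES = 1_600_000_000   # reported as "1.6B edges" (rounded)
MINUTES = 48

seconds = MINUTES * 60
print(f"{NQUADS / seconds:,.0f} N-Quads/s")  # 347,222 N-Quads/s
print(f"{EDGES / seconds:,.0f} edges/s")     # 555,556 edges/s
```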

dereksfoster99 · May 8, 2020, 4:00am

@geoyws How about Monday afternoon? We’re on Pacific time.

geoyws · May 8, 2020, 4:42am

Appreciate the help. We’re in Kuala Lumpur at UTC+8, so Monday 12pm Pacific Time would be Tuesday 3am in KL.
Could we do it at 9am PT or 3pm PT instead?
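Here is a quick check of the time-zone arithmetic. It assumes "Monday" is May 11, 2020, when Pacific Time is on PDT (UTC-7), and that Kuala Lumpur is UTC+8. Under those assumptions, 12pm PT on Monday is 3am Tuesday in KL, 9am PT is midnight, and 3pm PT is 6am:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for May 2020: Pacific Daylight Time is UTC-7,
# Kuala Lumpur (MYT) is UTC+8 year-round.
PDT = timezone(timedelta(hours=-7), "PDT")
MYT = timezone(timedelta(hours=8), "MYT")


def pt_to_kl(hour: int) -> datetime:
    """Convert a whole hour on Monday, May 11, 2020 (Pacific) to Kuala Lumpur time."""
    return datetime(2020, 5, 11, hour, 0, tzinfo=PDT).astimezone(MYT)


for hour in (9, 12, 15):
    kl = pt_to_kl(hour)
    print(f"Mon {hour:02d}:00 PT -> {kl:%a %H:%M} KL")
```

Fixed offsets keep the example independent of the system's time-zone database. A scheduler would normally use named zones such as America/Los_Angeles and Asia/Kuala_Lumpur instead.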

mrjn · September 7, 2020, 1:35am

This commit should fix the issue: "perf: Various optimizations to the bulk loader (#6412)" (dgraph-io/dgraph@9109186 on GitHub).

If you compile from master, pass the build tag “jemalloc”. This requires jemalloc to have been installed with the je_ prefix. We’ll be updating our Makefile to do this automatically.

amaster507 · September 8, 2020, 3:51am

Will any of these improvements also help reduce OOM in live loader? Or reduce memory consumption of a running machine?

mrjn · September 8, 2020, 3:03pm

Not yet. But, I’ve just asked the team to replicate those live loader OOM issues, so we can fix those. If you have a way to replicate them, do let us know.

amaster507 · September 8, 2020, 5:38pm

Just throw ~6 million quads at a 4GB RAM Dgraph instance. That should make it go OOM. I haven’t tried lately.
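Here is a minimal sketch for producing a test file of roughly that size. It assumes a synthetic graph is representative enough for reproducing the OOM, and the predicate names <name> and <follows> are made up for illustration. The resulting file could then be fed to the live loader against a 4GB instance.

```python
import io


def write_synthetic_nquads(out, nodes: int) -> int:
    """Write synthetic N-Quads for `nodes` nodes to the text stream `out`.

    Each node gets a <name> string literal and, except the first, a
    <follows> edge to the previous node. Returns the number of quads written.
    """
    written = 0
    for i in range(nodes):
        out.write(f'_:n{i} <name> "node {i}" .\n')
        written += 1
        if i > 0:
            out.write(f"_:n{i} <follows> _:n{i - 1} .\n")
            written += 1
    return written


# Small in-memory demo. For ~6M quads, call it with nodes=3_000_000 and a
# real file, e.g. open("synthetic.rdf", "w") (hypothetical filename).
buf = io.StringIO()
total = write_synthetic_nquads(buf, 3)
print(total)  # 5
print(buf.getvalue())
```

Each node contributes two quads except the first, so 3,000,000 nodes yield 5,999,999 quads.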