Bulk loader still OOM during reduce phase

sbv1976 · July 21, 2021, 8:29pm

What I want to do

Import an existing data set that consists of around 350M RDF quads. There are a lot of uids but I don’t have precise statistics on those.

There are also a large number of unique predicates… I don’t have an exact count, but when I tried to create a schema file that referenced them all (including types), it was about 30GiB uncompressed.

What I did

dgraph bulk --schema bulk-atl/out.schema --files bulk-atl --num_go_routines 1 --mapoutput_mb 1024 --zero dgraph-zero-2.dgraph-zero:5080 --tmp tmp --partition_mb 2

and various alternatives of the above with different sizes for the map files (down to 64MB) and the default --partition_mb value.

I have found earlier reports that have been addressed in the latest builds with the addition of jemalloc, but my build has that and I have failed to import none-the-less.

It would be important to note that I running in a kubernetes cluster monopolizing progressively larger nodes to where my most recent tests have been running on vms with 256GiB RAM; however, this doesn’t seem to matter as the process dies with

runtime: out of memory: cannot allocate 4194304-byte block (83409764352 in use)

When usage is at around 100GiB

I understand that this may be systems management issue, but while I pursue that potential problem, I did want to get some comments on the feasibility and resource requirements of importing a data set like mine.

I have not yet tried to increase the number of map or reduce shards as suggested in other discussions since the help docs state that can only increase memory, but I will try that while waiting on a response for completeness.

Thanks for any help.

Dgraph metadata

dgraph bulk identifying logs


graph version   : v21.03.1
Dgraph codename  : rocket-1
Dgraph SHA-256   : a00b73d583a720aa787171e43b4cb4dbbf75b38e522f66c9943ab2f0263007fe
Commit SHA-1     : ea1cb5f35
Commit timestamp : 2021-06-17 20:38:11 +0530
Branch           : HEAD
Go version       : go1.16.2
jemalloc enabled : true
For Dgraph official documentation, visit https://dgraph.io/docs.

For discussions about Dgraph     , visit http://discuss.dgraph.io.

For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.

Copyright 2015-2021 Dgraph Labs, Inc.

___ Begin jemalloc statistics ___ Version: “5.2.1-0-gea6b3e973b477b8061e0076bb257dbd7f3faa756” Build-time option settings config.cache_oblivious: true config.debug: false config.fill: true config.lazy_lock: false config.malloc_conf: “background_thread:true,metadata_thp:auto” config.opt_safety_checks: false config.prof: true config.prof_libgcc: true config.prof_libunwind: false config.stats: true config.utrace: false config.xmalloc: false Run-time option settings opt.abort: false opt.abort_conf: false opt.confirm_conf: false opt.retain: true opt.dss: “secondary” opt.narenas: 128 opt.percpu_arena: “disabled” opt.oversize_threshold: 8388608 opt.metadata_thp: “auto” opt.background_thread: true (background_thread: true) opt.dirty_decay_ms: 10000 (arenas.dirty_decay_ms: 10000) opt.muzzy_decay_ms: 0 (arenas.muzzy_decay_ms: 0) opt.lg_extent_max_active_fit: 6 opt.junk: “false” opt.zero: false opt.tcache: true opt.lg_tcache_max: 15 opt.thp: “default” opt.prof: false opt.prof_prefix: “jeprof” opt.prof_active: true (prof.active: false) opt.prof_thread_active_init: true (prof.thread_active_init: false) opt.lg_prof_sample: 19 (prof.lg_sample: 0) opt.prof_accum: false opt.lg_prof_interval: -1 opt.prof_gdump: false opt.prof_final: false opt.prof_leak: false opt.stats_print: false opt.stats_print_opts: “” Profiling settings prof.thread_active_init: false prof.active: false prof.gdump: false prof.interval: 0 prof.lg_sample: 0 Arenas: 129 Quantum size: 16 Page size: 4096 Maximum thread-cached size class: 32768 Number of bin size classes: 36 Number of thread-cache bin size classes: 41 Number of large size classes: 196 Allocated: 4314440, active: 4362240, metadata: 11125960 (n_thp 0), resident: 15384576, mapped: 23236608, retained: 5599232 Background threads: 4, num_runs: 8, run_interval: 684055750 ns

iluminae · July 21, 2021, 8:58pm

Sorry to hear you are having this trouble - just wanted to give my bulkload experience here:

~3 billion triples
4 input rdf files
- 5-10GiB gzip compressed each
- pulled from minio
4 output groups
16core 32GiB machine the bulk loader was run on, never got above 25GiB memory.
finishes in about 1h20m
v21.03.1
I think default flags other than the output groups

amaster507 · July 21, 2021, 9:15pm

Do you really mean 30Gb just in the schema?? That doesn’t seem right. That would make it seem like every triple is a unique predicate, this just seems REALLY high for the amount of predicates and the size of the schema alone. We work with what I think is a really large schema but it is only 145Kb at 4,298 lines in GraphQL format including the auth rules. I can’t imaging a 30Gb schema file.

sbv1976 · July 21, 2021, 9:33pm

Yes… I mean 30Gb.

For background, I am exporting a dataset from mongo, and for simplicity, rather than chopping up all sub-documents into their own node/dgraph type, we have decided to perform our initial import and analysis by simply flattening the documents into nodes.

i.e. source document like

{
   "this": {"prop": 1, "attribute": 2},
   "is":  {"prop": 1, "attribute": 2},
   "a":  {"prop": 1, "attribute": 2},
   "doc":  {"prop": 1, "attribute": 2},
}

Would be come a node with the predicates:

this.prop
this.attribute
is.prop
is.attribute
a.prop
a.attribute
doc.prop
doc.attribute

If we combine this with the fact that array in documents are also flattened into the node, the number of predicates really explodes (each type has its own set of predicates with only a handful being shared across types)

The reason my schema file doesn’t include all these is because my data structures keeping track of them in my mongo export logic were causing my workstation to start swapping heavily and eventually run out of virtual memory at around 80GiB, so I decided to just punt and add to the schema manually as our evaluation dictates.

Anyway, I now realize that this approach was excessively naive and will definitely need to be reconsidered, but I was hoping the dataset would still be useful for analyzing the perfomance/functionality of graph lookups where they do happen (we do have foreign key style references in the mongo docs and those are what will convert to node references in dgraph)

If I get the feedback dgraph will struggle with disproportionately large numbers of predicates… especially in the bulk load situation, I can work at restructuring the input data to be more usable.

iluminae · July 21, 2021, 11:55pm

Impressive - I read that as 30GB in RDF not schema. For comparison on our production system (with the 3bn triples) we currently have 8422 predicates (and growing), and see no issues in sight on that front.

amaster507 · July 22, 2021, 1:39am

Maybe @MichelDiz has experience with such a large schema? I don’t think there is a hard limit on predicate count or schema size, but just processing a schema that would be over 30Gb is

I still can’t fathom how many predicates that is. Our whole dataset only uses 0.5Gb.

sbv1976 · July 22, 2021, 1:56am

To be clear, and in case it matters, the schema definition I am actually using is not that large.
The one I am using omits most predicates and just includes a small set (10) assigned to about 57 different types and is only 6.9k (562 lines).

However, those predicates omitted from the schema are still being used on nodes/rdf specs.

When I was experimenting manually with a small data set, I was able to populate nodes like this, just the data wasn’t indexed, and I had to explicitly indicate the predicates I wanted to see in queries ( expand(_all_) doesn’t work… which is fine for me)

The reason I mentioned the size of the schema that tried to include everything was just to give a general idea of the number of different predicates.

Also I might mention that there may have been a defect in my export code that wrote predicates (and the types that they were referenced by) twice, but even so, that leaves me with a schema of about 15GB.

amaster507 · July 22, 2021, 2:00am

Right, understood that part. But a schema file is still built containing every predicate with either default or first found type. I assume this is part of the bulk loader process.

sbv1976 · July 22, 2021, 2:07am

OK. Thanks.

MichelDiz · July 22, 2021, 2:12pm

Nope, I don’t and I’m impressed as you guys. I can’t imagine such a big schema lol.

So, you say here that you have 256GiB RAM available for that instance that is running the bulk-load? So it uses less than 50% do the available RAM? and runs OOM.

Can you share via DM the data if reproducible? I may send this to an engineer.

sbv1976 · July 23, 2021, 3:41pm

Pretty much… the machine/instance/VM has 256GiB ram, though my k8s configs only grant 210GiB ram to the container.

Still, the process only gets up to around 100GiB of usage before it reports out of memory and dumps a go stack trace.

When k8s kill the process due to it stepping out of bounds, it more or less just goes away.

I provided sample data via DM… and I’ll mention that I am working on trying to make the number of predicates more efficient by leveraging facets over mangling predicate names.

dmai · July 24, 2021, 4:44am

Thanks for sharing the dataset over DM. Your dataset has about 109 million unique predicate names in the schema. That’s way too much and explains the OOM that’s happening here.

If you can store the “unique” part of the edge as a value (e.g., 0x1> <id> "abc-123-567" .) instead of part of the predicate name then that’d greatly reduce the schema entries here.

Here’s a heap profile of the bulk loader with your data set during the map phase:

File: dgraph
Build ID: 6fee491f94a8999d0e85fe9ea8ccb3321f7ce0d0
Type: inuse_space
Time: Jul 23, 2021 at 2:07pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 44.24GB, 99.88% of 44.30GB total
Dropped 73 nodes (cum <= 0.22GB)
Showing top 10 nodes out of 12
      flat  flat%   sum%        cum   cum%
      19GB 42.88% 42.88%       19GB 42.88%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*schemaStore).validateType
   14.15GB 31.95% 74.83%    14.15GB 31.95%  github.com/dgraph-io/dgraph/x.NamespaceAttr
   10.83GB 24.45% 99.28%    10.83GB 24.45%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*shardMap).shardFor
    0.27GB   0.6% 99.88%     0.27GB   0.6%  bytes.makeSlice
         0     0% 99.88%     0.27GB   0.6%  bytes.(*Buffer).Write
         0     0% 99.88%     0.27GB   0.6%  bytes.(*Buffer).grow
         0     0% 99.88%     0.27GB   0.6%  github.com/dgraph-io/dgraph/chunker.(*rdfChunker).Chunk
         0     0% 99.88%    43.99GB 99.31%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).mapStage.func1
         0     0% 99.88%     0.27GB   0.6%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*loader).mapStage.func2
         0     0% 99.88%       19GB 42.89%  github.com/dgraph-io/dgraph/dgraph/cmd/bulk.(*mapper).createPostings

Most of the memory usage is due to the large number of predicates in the schema.

sbv1976 · July 26, 2021, 5:09pm

Thank you so much for the time you have taken to look in to this @dmai .

I was aware that the number of predicates used was really poorly planned, and I have been looking in to addressing that, but aside from using way too much memory, I was wondering why the program was quitting due to memory reasons when the system had plenty of memory available.

The heap profile shown looks to be sampled during the map phase, but I was able to get past that just by increasing resources, but once the bulk loader progressed to the reduce phase it would panic before available memory was actually consumed (at around 100GiB of usage when 210GiB were available).

My import data has been updated to track variable/high cardinality components as facets, and this was very effective in getting memory usage down during the map phase. Usage hovered at around 1GiB for the entirety of that process.

However, when transitioning to the reduce phase, memory consumption increased fairly quickly and plateaued at around 35GiB… granted it got much further than before (89% vs 45%), but it eventually also panicked with out of memory messages even though the process has 180GiB more ram available.

Could this be due to using facets instead of restructuring the data more extensively with additional nodes and properties? I would like to be able to compare both approaches in other areas of usage.

Interestingly some of the final bits of output read as

Finishing stream id: 70876
Finishing stream id: 70877
Finishing stream id: 70878
[16:47:02Z] REDUCE 01h29m57s 88.42% edge_count:328.5M edge_speed:271.7k/sec plist_count:175.1M plist_speed:144.8k/sec. Num Encoding MBs: 0. jemalloc: 2.6 TiB
Finishing stream id: 70879

Which I find interesting since I don’t know that system could even provide virtual memory in that amount.

If this is interesting to you, I captured some heap profiles from pprof which I can pass along, and I can also provide updated data

Thanks again,
Steven

dmai · July 26, 2021, 9:05pm

@sbv1976 If you can share the updated dataset? It sounds like there’s still a lot of predicates.

Or you can share the heap profile and jemalloc debug info:

heap profile from /debug/pprof/heap
jemalloc info from /debug/jemalloc

sbv1976 · July 27, 2021, 12:25am

Thanks for the response…

I did a more careful analysis of the data set, and I am down to about 202k predicates right now. 134k of those are more high cardinality use cases with unique ids in the predicate.

Let me try to address that before I take any more of your time.

Steven

dmai · July 27, 2021, 1:37am

Yup, you’ll want to reduce the number of predicates in your Dgraph setup down from 202k.

amaster507 · July 27, 2021, 1:52am

What do you think is an acceptable soft limit on predicate unique count? 100K, 50K, 10K, 1K, 500??

dmai · July 27, 2021, 3:51am

Around 100K would be a lot but also OK.

If you have a lot of the same kind of relationship then to take advantage of Dgraph you don’t want to encode unique information in the predicate name itself. As an example, don’t do this:

<0x1> <To.ID.123> <0x2> .
<0x3> <To.ID.456> <0x4> .
<0x5> <To.ID.789> <0x6> .

It’s better to store this unique ID information (ID.123, ID.456, ID.789) as a value of the edge instead of in the edge name itself:

<0x1> <ID> "123" .
<0x1> <To> <0x2> .
<0x3> <ID> "456" .
<0x3> <To> <0x4> .
<0x5> <ID> "789" .
<0x5> <To> <0x6> .

Instead of thousands of <For.ID.XYZ> predicates there are two: <ID> and <To>.

sbv1976 · August 1, 2021, 4:37pm

Yeah, that structure was a result of trying to import a document database (mongo) with one (flattened) document per node.

I managed to get the predicate count down to around 60k but it seems there are quite a few cases where we were using dictionaries so there are a few cases of this left… since it’s getting harder and harder to identify those cases reliably and since this is still an evaluation, I left those as-is.

Unfortunately, I am now running in to new sorts of issues caused by the choice of data transformation.

I won’t detail them as I have pretty much decided to abandon the flattened-document approach and am going to start working on a more node-intensive solution.

However, I was wondering if there was any literature on how to best organize data to maximize the performance of queries that contain many conditions.

Topic		Replies	Views
Bulk loader taking more than 100G virtual memory for 5.4G of data Dgraph mutation	8	1114	March 18, 2019
Out of memory problem in large rdf file bulk load Users	8	715	October 30, 2019
Bulk Loader - Deploy Documentation	0	892	December 16, 2020
Bulk loading 72.1M records from RDBMS with 0 output Dgraph bulkloader	17	1612	July 22, 2020
Bulk Loader REDUCE problem - it's very slow Dgraph dgraph , status:accepted , kind:bug , ticket:created	22	968	March 12, 2021

Bulk loader still OOM during reduce phase

What I want to do

What I did

Dgraph metadata

Related topics