Out of memory problem in large rdf file bulk load

mousewu · September 30, 2019, 7:22am

I am trying to bulk load two large RDF files into Dgraph. The files are 38GB and 58GB, and my server has about 70GB of RAM available. I use the following command:
dgraph bulk -f rdf -s bitcoin.schema --map_shards=1 --reduce_shards=1 --http localhost:8000 --zero=localhost:5080 --map_shards 1 --mapoutput_mb 32 --num_go_routines 2
It runs out of memory and the process gets killed.
I also tried tuning parameters such as mapoutput_mb and num_go_routines, but that did not help.
Any suggestions?

mousewu · September 30, 2019, 7:25am

The version is:
[Decoder]: Using assembly version of decoder
I0930 15:23:32.276858 5336 init.go:98]

Dgraph version : v1.1.0
Dgraph SHA-256 : 7d4294a80f74692695467e2cf17f74648c18087ed7057d798f40e1d3a31d2095
Commit SHA-1 : ef7cdb28
Commit timestamp : 2019-09-04 00:12:51 -0700
Branch : HEAD
Go version : go1.12.7

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph , visit http://discuss.dgraph.io.
To say hi to the community , visit https://dgraph.slack.com.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.

ashishgoswami · September 30, 2019, 8:43am

Hey @mousewu, what is the predicate count in the data?
Also, at what stage does it run out of memory (map or reduce)?

mousewu · September 30, 2019, 8:46am

It is in the reduce phase. The edge count is 1.9G.

mousewu · September 30, 2019, 8:49am

@ashishgoswami These are the last lines of the log.

ashishgoswami · September 30, 2019, 9:16am

@mousewu Can you also tell us the number of unique predicates in the data you are trying to bulk load?
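
For reference, one rough way to get that number is to scan the RDF files and tally the second term of each line. The following is only a sketch: it assumes one N-Quad per line with whitespace-separated subject and predicate, and the helper names (`predicate_counts`, `count_files`) are made up for illustration and are not part of Dgraph.

```python
import gzip
from collections import Counter


def predicate_counts(lines):
    """Tally how many edges use each predicate in an iterable of N-Quad lines.

    Assumption: each non-empty, non-comment line looks like
    `<subject> <predicate> <object or literal> ... .`
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)  # subject, predicate, rest of the line
        if len(parts) < 3:
            continue  # malformed line, ignore it in this sketch
        counts[parts[1]] += 1
    return counts


def count_files(paths):
    """Stream plain or gzip-compressed RDF files and merge their tallies."""
    total = Counter()
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8") as f:
            total.update(predicate_counts(f))
    return total
```

Calling `len(count_files(["data.rdf.gz"]))` (the file name is a placeholder) gives the number of distinct predicates, and the sum of the values gives the edge count. The files are read line by line, so memory use depends on the number of distinct predicates, not on file size.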

mousewu · September 30, 2019, 9:48am

@ashishgoswami It is about 300 million predicates.

mousewu · September 30, 2019, 10:43am

Actually, I previously bulk loaded a 300GB RDF file successfully using an older version of Dgraph. It contained all Bitcoin transactions at that time (a few billion predicates).
This time, I added more edge facets to the predicates, such as transfer value, txhash, and time. Could that be the reason for the out-of-memory problem in the reduce phase?
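
To make the facet question concrete, here is a minimal sketch of what an edge with and without facets looks like in Dgraph's RDF input, where facets go in parentheses before the terminating dot. The predicate name `<sent>`, the facet keys, and the function names are assumptions for illustration; the actual schema is not shown in this thread.

```python
from datetime import datetime


def format_facet_value(value):
    """Render one facet value in the form used inside the parentheses."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%dT%H:%M:%S")
    # Everything else is written as a quoted, escaped string.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def edge_with_facets(subject, predicate, obj, facets):
    """Build one RDF line, appending `(key=value, ...)` when facets are given."""
    line = f"{subject} {predicate} {obj}"
    if facets:
        rendered = ", ".join(
            f"{key}={format_facet_value(val)}" for key, val in facets.items()
        )
        line += f" ({rendered})"
    return line + " ."


plain = edge_with_facets("<0x1>", "<sent>", "<0x2>", {})
rich = edge_with_facets(
    "<0x1>", "<sent>", "<0x2>",
    {"value": 0.5, "txhash": "abc", "time": datetime(2019, 9, 30, 7, 22)},
)
print(len(plain), plain)
print(len(rich), rich)
```

This only shows that each edge carries more input data once facets are attached. Whether that explains the out-of-memory failure in the reduce phase is not established in this thread.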

