Out-of-memory problem in large RDF file bulk load

(mousewu) #1

I am trying to load two large RDF files into Dgraph. The files are 38 GB and 58 GB, and my server has about 70 GB of RAM available. I use the following command:
dgraph bulk -f rdf -s bitcoin.schema --map_shards=1 --reduce_shards=1 --http localhost:8000 --zero=localhost:5080 --map_shards 1 --mapoutput_mb 32 --num_go_routines 2
It runs out of memory and the process gets killed.
I have also tried tuning parameters such as --mapoutput_mb and --num_go_routines, but it does not help.
Any suggestions?
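For anyone hitting the same wall: a common way to retry without redoing hours of work is to rerun only the failing phase. The sketch below reuses the flags from the original post plus --skip_map_phase (which reuses the map output from the previous run, if your Dgraph build supports it); paths and values are illustrative, not a guaranteed fix.

```
# Hypothetical re-run after an OOM in the reduce phase.
# The map phase already completed, so skip it and retry only reduce
# with reduced concurrency to lower peak memory.
dgraph bulk -f rdf -s bitcoin.schema \
  --map_shards=1 --reduce_shards=1 \
  --zero=localhost:5080 \
  --mapoutput_mb 32 \
  --num_go_routines 1 \
  --skip_map_phase
```

If the reduce phase still exceeds available RAM, the remaining levers are usually more swap, more physical memory, or splitting the input and loading incrementally.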

(mousewu) #2

The version is:
[Decoder]: Using assembly version of decoder
I0930 15:23:32.276858 5336 init.go:98]

Dgraph version : v1.1.0
Dgraph SHA-256 : 7d4294a80f74692695467e2cf17f74648c18087ed7057d798f40e1d3a31d2095
Commit SHA-1 : ef7cdb28
Commit timestamp : 2019-09-04 00:12:51 -0700
Branch : HEAD
Go version : go1.12.7

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph, visit https://discuss.dgraph.io.
To say hi to the community, visit https://dgraph.slack.com.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.

(Ashish) #3

Hey @mousewu, what is the predicate count in the data?
Also, at what stage does it run out of memory (map or reduce)?

(mousewu) #4

It is in the reduce phase. The edge count is 1.9 billion.

(mousewu) #5

@ashishgoswami These are the last lines of the log.

(Ashish) #6

@mousewu Can you also tell us the unique predicate count in the data you are trying to bulk load?

(mousewu) #7

@ashishgoswami It is about 300 million predicates.

(mousewu) #8

Actually, I previously bulk loaded a 300 GB RDF file containing all Bitcoin transactions (a few billion predicates) using an older version of Dgraph.
This time, however, I have added more edge facets to the predicates, such as transfer value, txhash, and time. Could that be the reason for the out-of-memory problem in the reduce phase?
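For context, facets attach extra key/value data to each edge, so every facet increases the bytes the reduce phase must buffer per edge. An edge with the facets described above would look roughly like the N-Quad below (the predicate and facet names are illustrative, not taken from the actual schema):

```
_:addr1 <transfer> _:addr2 (value=1.5, txhash="ab12cd", time=2019-09-04T00:12:51) .
```

With 1.9 billion edges, even a few dozen extra bytes of facet data per edge adds many gigabytes to the working set, which is consistent with the reduce phase running out of memory where the facet-free load succeeded.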