I am trying to load two large RDF files into Dgraph. The files are 38GB and 58GB, and my server has about 70GB of RAM available. I used the following command:
dgraph bulk -f rdf -s bitcoin.schema --map_shards=1 --reduce_shards=1 --http localhost:8000 --zero=localhost:5080 --map_shards 1 --mapoutput_mb 32 --num_go_routines 2
It ran out of memory and the process got killed.
I also tried tuning parameters such as --mapoutput_mb and --num_go_routines, but that did not help.
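For example, one of the lower-memory variants I tried looked roughly like this (the smaller --mapoutput_mb and --num_go_routines values are just what I experimented with, not recommended settings):

# Same flags as above, only with smaller map output chunks and a single
# map goroutine, hoping to keep peak memory lower.
dgraph bulk -f rdf -s bitcoin.schema --map_shards=1 --reduce_shards=1 \
  --http localhost:8000 --zero=localhost:5080 \
  --mapoutput_mb 16 --num_go_routines 1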
Any suggestions?
The version is:
[Decoder]: Using assembly version of decoder
I0930 15:23:32.276858 5336 init.go:98]
Dgraph version : v1.1.0
Dgraph SHA-256 : 7d4294a80f74692695467e2cf17f74648c18087ed7057d798f40e1d3a31d2095
Commit SHA-1 : ef7cdb28
Commit timestamp : 2019-09-04 00:12:51 -0700
Branch : HEAD
Go version : go1.12.7
For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph, visit http://discuss.dgraph.io.
To say hi to the community, visit https://dgraph.slack.com.
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2018 Dgraph Labs, Inc.
Hey @mousewu, what is the predicate count in the data?
Also, at what stage does it run out of memory (map or reduce)?
@ashishgoswami These are the last lines of the log.
@mousewu Can you also tell us the unique predicate count in the data you are trying to bulk load?
@ashishgoswami It is about 300 million predicates.
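(If it helps, I checked the distinct predicate count with a rough pass over the N-Quads, something like the line below; bitcoin.rdf is just a placeholder for the actual file name:)

# Take the second field (the predicate) of every N-Quad line,
# deduplicate, and count.
awk '{ print $2 }' bitcoin.rdf | sort -u | wc -l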
Actually, I successfully bulk loaded a 300GB RDF file containing all Bitcoin transactions (a few billion predicates) at the time, using an older version of Dgraph.
But this time I added more edge facets to the predicates, such as transfer values, txhash, and time. Could that be the reason for the out-of-memory problem at the reduce phase?
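For reference, the transfer edges now look roughly like this (the UIDs and facet values are made up, just to show the facet syntax I am using):

# One transfer edge carrying three facets; the UIDs, value, txhash, and
# timestamp are illustrative only.
<0x01> <transfer> <0x02> (value=0.5, txhash="a1b2c3d4", time=2019-09-01T12:00:00) .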