Yesterday I tried to import over 900 .rdf.gz files into Dgraph on a single Ubuntu node with 4 cores and 32 GB of memory.
The MAP phase completed fine, but the REDUCE phase failed with the error “too many open files”.
I checked the tmp directory: it is 280 GB in size and contains over 4000 .map files.
Could someone help me with this? How can I import these .rdf.gz files into Dgraph?
Detailed output follows:
REDUCE 04h15m55s [0.00%] edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec
2019/04/25 14:14:22 open tmp/shards/shard_0/000/001308.map: too many open files
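Have you tried increasing the limit on the number of open files? On Linux the default per-process limit is typically 1024, and with over 4000 .map files to read in the REDUCE phase that limit is easily exceeded. You can raise it to a higher number, such as 100,000, with the ulimit command (ulimit -n) in the shell before starting the import.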
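As a rough sketch of what raising the limit involves, here is a small Python example that inspects and raises the open-file limit for the current process using the standard `resource` module. The function name `raise_open_file_limit` and the target of 100,000 are illustrative choices, not anything from Dgraph. It assumes Linux, and it can only raise the soft limit up to the hard limit unless run with extra privileges. In a shell, running ulimit -n 100000 before the import has the same effect.

```python
import resource


def raise_open_file_limit(target=100_000):
    """Raise this process's soft open-file limit toward `target`.

    Returns the soft limit in effect afterwards. Hypothetical helper
    for illustration; not part of Dgraph.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unprivileged process may raise its soft limit only as far as
    # the hard limit, so clamp the target to it.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)

    # Nothing to do if the current soft limit is already high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft

    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft limit before:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
    print("soft limit after: ", raise_open_file_limit())
```

The new limit applies to this process and to any child processes it starts afterwards, so a wrapper script would need to call this before launching the import. If the hard limit itself is below the number of .map files, it has to be raised by an administrator first.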
More generally, memory usage can be reduced by using fewer goroutines. By default the number of goroutines equals the number of CPUs on the machine. You can lower it with the -j switch, for example “-j 2” or even “-j 1”. The import will take longer to complete, of course.