Hello,
I have made several attempts to import a small fraction of our data from an RDBMS into Dgraph using the bulk loader. But whenever it reaches the REDUCE phase, everything seems to be empty.
But when I tried with just 2 tiny tables, I could see that there was data in the p folder.
This is the command I ran, followed by the output around the point where MAP ends and REDUCE begins:
dgraph bulk -f ./bix/json-gzip/ -s ./bix/schema/bix_schema.txt --replace_out --out ./bulkdgraph --tmp ./tmpdgraph --num_go_routines 4 --mapoutput_mb 16
[05:23:31Z] MAP 04h51m37s nquad_count:788.9M err_count:0.000 nquad_speed:45.09k/sec edge_count:3.748G edge_speed:214.2k/sec
[05:23:32Z] MAP 04h51m38s nquad_count:788.9M err_count:0.000 nquad_speed:45.08k/sec edge_count:3.748G edge_speed:214.2k/sec
Shard tmpdgraph/map_output/000 -> Reduce tmpdgraph/shards/shard_0/000
[05:23:33Z] REDUCE 04h51m39s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[05:23:34Z] REDUCE 04h51m40s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[05:23:35Z] REDUCE 04h51m41s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[05:23:36Z] REDUCE 04h51m42s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[05:23:37Z] REDUCE 04h51m43s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[05:23:38Z] REDUCE 04h51m44s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[05:23:39Z] REDUCE 04h51m45s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[05:23:40Z] REDUCE 04h51m46s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
Num CPUs: 4
[05:23:47Z] REDUCE 04h51m53s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[05:23:51Z] REDUCE 04h51m57s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[05:23:52Z] REDUCE 04h51m59s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[05:23:53Z] REDUCE 04h52m00s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[05:23:54Z] REDUCE 04h52m01s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[05:23:55Z] REDUCE 04h52m02s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
[05:23:56Z] REDUCE 04h52m03s 0.00% edge_count:0.000 edge_speed:0.000/sec plist_count:0.000 plist_speed:0.000/sec. Num Encoding: 0
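To make the symptom easier to see, here is a minimal Python sketch that pulls the last MAP and last REDUCE progress line out of a saved copy of this output and exposes their counters. The function names `parse_progress` and `last_per_phase` are hypothetical helpers, not part of Dgraph, and the regular expressions assume the log line layout shown above.

```python
import re

# Assumed layout, taken from the log lines above:
# [05:23:33Z] REDUCE 04h51m39s 0.00% edge_count:0.000 edge_speed:0.000/sec ...
LINE_RE = re.compile(
    r"^\[(?P<ts>[0-9:]+Z)\]\s+(?P<phase>MAP|REDUCE)\s+(?P<elapsed>\S+)\s+(?P<rest>.*)$"
)
# key:value pairs with no space after the colon, e.g. edge_count:3.748G
FIELD_RE = re.compile(r"(\w+):(\S+)")


def parse_progress(line):
    """Return phase, elapsed time and counters for a progress line, else None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "phase": m.group("phase"),
        "elapsed": m.group("elapsed"),
        "fields": dict(FIELD_RE.findall(m.group("rest"))),
    }


def last_per_phase(lines):
    """Keep only the most recent progress record seen for each phase."""
    last = {}
    for line in lines:
        rec = parse_progress(line)
        if rec is not None:
            last[rec["phase"]] = rec
    return last
```

Run over the output above, the last MAP record would carry `edge_count` of `3.748G`, while every REDUCE record carries `edge_count` of `0.000`.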
I ran the above command on a single AWS t3a.xlarge instance. The import took ~6 hours to complete for 72.1M records (rows) with <400 columns.
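To compare the p folder from the two-table run with the one from the full run, a small Python sketch like the following could report how many bytes each p directory holds. It assumes only that the bulk loader writes one or more directories named `p` somewhere under the `--out` directory (`./bulkdgraph` in the command above). The helper names are hypothetical.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def p_dir_sizes(out_dir):
    """Map every directory named 'p' under out_dir to its total size in bytes."""
    sizes = {}
    for root, dirs, _files in os.walk(out_dir):
        if "p" in dirs:
            p_path = os.path.join(root, "p")
            sizes[p_path] = dir_size_bytes(p_path)
    return sizes


if __name__ == "__main__":
    # "./bulkdgraph" is the --out value from the command above.
    for path, size in p_dir_sizes("./bulkdgraph").items():
        print(f"{path}: {size} bytes")
```

A p directory that is empty or only a few kilobytes after the full run would match the zero counters in the REDUCE lines.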