Version: v21.03.2
Compressed data size before ingestion: 375GB
Total output size: 894GB
Zero node: 32 cores, 256GB RAM, 2TB SSD
Run time: 31 hours for the map phase, 11 hours for the reduce phase
Command:

```bash
dgraph bulk -f /coldstartinput/upload/pending_predicates \
  -s /coldstartinput/upload/rdf_schema/patient.rdf \
  --out /coldstartoutput/out --replace_out \
  --num_go_routines=20 --reducers=7 --format=rdf --store_xids \
  --map_shards=14 --reduce_shards=7 > check.log &
```
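For reference, the input and output sizes quoted above can be re-checked with something like the following (a sketch reusing the paths from the command; `du` reports on-disk size, so the numbers may differ slightly from the figures above):

```bash
# Compressed RDF input size before ingestion (~375GB in this run).
du -sh /coldstartinput/upload/pending_predicates

# Total bulk loader output size (~894GB in this run).
du -sh /coldstartoutput/out
```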
Last line of the bulk loader log:

```
[04:23:44Z] REDUCE 16h19m42s 100.00% edge_count:81.27G edge_speed:1.383M/sec plist_count:49.18G plist_speed:836.6k/sec. Num Encoding MBs: 0. jemalloc: 0 B
```
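The periodic progress lines can be pulled out of the log with something like this (a sketch; `check.log` is the redirect target in the command above):

```bash
# Show the most recent MAP/REDUCE progress lines emitted by the bulk loader.
grep -E 'MAP|REDUCE' check.log | tail -n 5
```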
Distribution across the 7 reduce shards:
- Shard 0: 139GB
- Shard 1: 116GB
- Shard 2: 158GB
- Shard 3: 153GB
- Shard 4: 103GB
- Shard 5: 104GB
- Shard 6: 124GB
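Per-shard sizes like the ones above can be read off with something like the sketch below; it assumes the `out/<N>/p` directory layout that `dgraph bulk` writes when `--reduce_shards=7` is set:

```bash
# Report the on-disk size of each reduce shard's posting directory.
du -sh /coldstartoutput/out/*/p
```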
K9s screen capture:
## Questions
- Can someone help interpret the last line of the bulk loader log shown above?
- What can I do to distribute the data more evenly across the shards?
- Once the Alpha nodes were up, there was heavy predicate movement between them (circled in the K9s screen capture above). Is this expected, and can it be avoided?