The attached CSV file has population figures for every country, gender, and year. It has ca. 20 000 rows and 4 columns, which works out to something like 100 000 N-Quads.
population.csv (702.1 KB)
a) Coming from a Python environment, which method is best to get this CSV data inserted fast?
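To make the question concrete, here’s a minimal sketch of the conversion step I have in mind, assuming the columns are named `country`, `gender`, `year`, `population` (the sample data and blank-node scheme are just placeholders, not the real file):

```python
import csv
import io

# Hypothetical sample standing in for population.csv; the column
# names are an assumption.
SAMPLE = """country,gender,year,population
Sweden,female,2020,5123456
Sweden,male,2020,5089321
"""

def rows_to_nquads(reader, batch_size=1000):
    """Yield lists of N-Quad strings, one blank node per CSV row."""
    batch = []
    for i, row in enumerate(reader):
        node = f"_:row{i}"  # blank node; a real UID scheme would go here
        for key, value in row.items():
            batch.append(f'{node} <{key}> "{value}" .')
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

reader = csv.DictReader(io.StringIO(SAMPLE))
batches = list(rows_to_nquads(reader, batch_size=4))
print(batches[0][0])  # _:row0 <country> "Sweden" .
```

Each batch could then (if the target is Dgraph) be joined with newlines and sent via `txn.mutate(set_nquads=...)` from pydgraph, but I’m open to other methods.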
For comparison, here’s a benchmark with a similar dataset for a Pandas bulk insert to Postgres:
b) How long would it take (roughly) to insert this without any checks?
c) If each row had a UID (treating the columns as properties and the UID as the node), how much additional time (% penalty) could such a check introduce?
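For c), the kind of check I mean is roughly an insert-only-if-absent keyed on an external id. In Dgraph terms that would be a conditional upsert block, sketched here as plain string-building (the `xid` predicate name and value format are assumptions for illustration):

```python
def build_upsert(xid, year, population):
    """Build a Dgraph upsert block that inserts a new node only if no
    node with the given external id exists yet. The 'xid' predicate
    name is a hypothetical choice for illustration."""
    return f"""upsert {{
  query {{
    q(func: eq(xid, "{xid}")) {{ v as uid }}
  }}
  mutation @if(eq(len(v), 0)) {{
    set {{
      _:new <xid> "{xid}" .
      _:new <year> "{year}" .
      _:new <population> "{population}" .
    }}
  }}
}}"""

block = build_upsert("SE-F-2020", 2020, 5123456)
print(block)
```

My question is how much slower 20 000 such conditional upserts would be compared to the unchecked bulk insert.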
I’m trying to devise a way to upload such CSV files fast, ideally many of them concurrently, with some safety checks … Perhaps someone has experience with this?
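For the concurrent part, this is the shape I’m picturing: a thread pool fanning batches out over several connections, with a stub standing in for the real upload call (e.g. a pydgraph `txn.mutate(..., commit_now=True)`, or whatever the chosen backend uses):

```python
from concurrent.futures import ThreadPoolExecutor

def send_batch(nquads):
    """Stub for the real upload call; here it just counts quads sent."""
    return len(nquads)

def upload_concurrently(batches, workers=4):
    """Fan batches out over a pool of worker threads and return the
    total number of quads acknowledged."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, batches))

# Hypothetical workload: 20 batches of 1000 quad-strings each.
batches = [['_:n <p> "v" .'] * 1000 for _ in range(20)]
total = upload_concurrently(batches, workers=4)
print(total)  # 20000
```

Whether threads, processes, or async connections are the right concurrency model for this is part of what I’m asking.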