Apologies: after checking the successful load, I realized I had actually loaded 1B N-Quads and 1.6B edges.
I thought it was 200M+ RDF entries, but it was actually the equivalent of 200M+ RDBMS rows translated into RDF. Since each row typically expands into one triple per column, that worked out to about 1B RDF entries, which is perhaps why so much RAM was necessary. Everything loaded in 48 minutes, without ludicrous mode, on a 16 vCPU / 128GB RAM Intel Cascade Lake instance on Huawei Cloud.
I’m doing a benchmark for management, and they’re requesting we do a simple load test on a 2 vCPU / 4GB RAM server. I load 200 .rdf files (1M records each, 200M+ entries in total) via the bulk loader (`--ludicrous_mode` seems to make only a small difference here), and Dgraph consistently runs out of memory at the 14th .rdf file. Each file takes about 40-50s to load.
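For reference, the bulk load is invoked along these lines. This is only a sketch: flag names follow the v20.x-era CLI and may differ across versions, and the paths, shard counts, and Zero address are placeholders.

```sh
# Sketch only: flag names per the Dgraph v20.x-era CLI; ./rdf, ./schema.txt,
# and the shard counts are placeholders for this single-node test.
# A Zero node must already be running (default address localhost:5080).
dgraph bulk \
  --files=./rdf \
  --schema=./schema.txt \
  --map_shards=1 \
  --reduce_shards=1 \
  --zero=localhost:5080 \
  --out=./out
```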
Alpha’s `--lru_mb` is set to 2048 to be safe.
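Alpha itself is started roughly like this (again a sketch: `--lru_mb` is a v1.x/v20.03-era flag that newer releases replaced with different cache flags, and the addresses are placeholders for a single-node setup):

```sh
# Sketch: Alpha with the 2048MB LRU cache on the 4GB box.
# --lru_mb exists in Dgraph v1.x/v20.03; newer releases use different
# cache flags. Addresses below assume a local single-node setup.
dgraph alpha \
  --lru_mb=2048 \
  --my=localhost:7080 \
  --zero=localhost:5080
```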
For comparison, Postgres and MSSQL managed to load the same 200M+ records under the same CPU and RAM constraints in about 2-3 hours.