FYI: Running on RAM

jchiu · November 9, 2016, 12:13pm

I tried running dgraphloader to load names.gz and rdf-films.gz entirely on ramfs. It took 2min 24s, roughly a 2X speedup over running on SSD. The RAM limit is set to 4G, but at times usage goes beyond that, close to 6G.

Then I wrote a very short C++ program to parse the RDFs and load them into a map (a balanced tree). The key is a pair<uint64, uint64> holding the predicate and the source UID. The value is either a string, for attribute values, or a set<uint64> representing a posting list. The program is single-threaded, with no parallelization. It also writes out all the data in binary format to SSD. However, it assumes that the inputs are already unzipped. The unzipping takes about 4s on my machine, and the program itself took 24s, so let’s just say that overall it took <30s. The program used up to 2.8% of my 64G RAM, which is about 1.8G.
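
For readers who want to picture the layout, here is a minimal sketch of that kind of program. It is not the original code: the names `UidTable`, `load`, and `dump`, the way UIDs are assigned, the simplified one-triple-per-line input format, and the binary record layout are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <set>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <variant>

// Hypothetical UID assignment: each distinct term gets the next integer, starting at 1.
struct UidTable {
    std::unordered_map<std::string, std::uint64_t> ids;
    std::uint64_t get(const std::string& term) {
        auto it = ids.try_emplace(term, ids.size() + 1).first;
        assert(it->second != 0);  // UID 0 is never assigned in this sketch
        return it->second;
    }
};

using Key = std::pair<std::uint64_t, std::uint64_t>;  // (predicate UID, source UID)
using Posting = std::set<std::uint64_t>;              // posting list of target UIDs
using Value = std::variant<std::string, Posting>;     // attribute value or posting list
using Store = std::map<Key, Value>;                   // std::map is a balanced tree

// Assumed input: one "<subject> <predicate> <object> ." or
// "<subject> <predicate> "literal" ." per line. Malformed lines are skipped.
void load(std::istream& in, UidTable& uids, Store& store) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string subj, pred, rest;
        if (!(fields >> subj >> pred)) continue;
        std::getline(fields, rest);
        const auto begin = rest.find_first_not_of(" \t");
        const auto end = rest.find_last_of('.');  // the terminating dot
        if (begin == std::string::npos || end == std::string::npos || end <= begin) continue;
        std::string obj = rest.substr(begin, end - begin);
        obj.erase(obj.find_last_not_of(" \t") + 1);  // trim trailing whitespace
        if (obj.empty()) continue;

        const std::uint64_t predUid = uids.get(pred);
        const std::uint64_t subjUid = uids.get(subj);
        Value& value = store[Key{predUid, subjUid}];
        if (obj.front() == '"') {
            // Attribute value: keep the text between the outer quotes
            // (escapes and language tags are ignored here).
            const auto close = obj.rfind('"');
            value = obj.substr(1, close > 0 ? close - 1 : 0);
        } else {
            // Edge: add the target UID to the posting list.
            if (!std::holds_alternative<Posting>(value)) value = Posting{};
            std::get<Posting>(value).insert(uids.get(obj));
        }
    }
}

// Writes a 64-bit integer in the host's native byte order.
void writeU64(std::ostream& out, std::uint64_t v) {
    out.write(reinterpret_cast<const char*>(&v), sizeof v);
}

// Assumed record layout: key (2 x u64), tag byte (0 = string, 1 = posting list),
// element count (u64), then the payload. Everything is written in one pass.
void dump(const Store& store, std::ostream& out) {
    for (const auto& [key, value] : store) {
        writeU64(out, key.first);
        writeU64(out, key.second);
        if (const auto* text = std::get_if<std::string>(&value)) {
            out.put(0);
            writeU64(out, text->size());
            out.write(text->data(), static_cast<std::streamsize>(text->size()));
        } else {
            const auto& posting = std::get<Posting>(value);
            out.put(1);
            writeU64(out, posting.size());
            for (std::uint64_t uid : posting) writeU64(out, uid);
        }
    }
}
```

To use it, call `load` on a `std::ifstream` opened on each unzipped input file, then call `dump` once on a `std::ofstream` opened in binary mode. Because `std::map` keeps keys sorted, the output comes out ordered by predicate and then by source UID.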

The little C++ program is by no means a fair comparison with dgraphloader. It doesn’t scale, and it cheats by loading everything into memory before writing it out once, unlike an LSM tree. The main point of the exercise is to gauge the “theoretical best performance” we can aim for. Since it doesn’t do any parallelization, the theoretical best is probably <20s.

mrjn · November 9, 2016, 1:01pm

Now compare against Cayley and Neo4J to see what the users would see.
