Bulk Loader performance

I’m trying out the bulk loader.

The MAP phase appears to be fast and scales well when increasing num_go_routines.

REDUCE takes much longer than MAP, and during this phase CPU utilisation drops significantly. I’ve tried different values for map_shards, reduce_shards and reducers but don’t see much difference. What’s the key thing to do to make the REDUCE phase faster?

reduce_shards is stated as one of the main things to tweak, but it isn’t purely a performance parameter for the bulk loader: it has to match the number of Alpha instances used later, and the valid choices for map_shards and reducers are in turn constrained by whatever reduce_shards is set to. So in practice this one setting constrains everything.
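For reference, the kind of invocation I’ve been experimenting with looks roughly like this (the paths and values below are placeholders, not our real setup):

```
# Hypothetical example: paths and values are placeholders.
# reduce_shards must equal the number of Alpha groups planned for the cluster;
# map_shards should be >= reduce_shards (more map shards gives more even output);
# num_go_routines mainly affects the MAP phase.
dgraph bulk \
  -f /data/rdf/ \
  -s /data/schema.txt \
  --zero=localhost:5080 \
  --map_shards=4 \
  --reduce_shards=2 \
  --reducers=1 \
  --num_go_routines=32 \
  --out=/data/out
```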

We’re loading a lot of data. The test machine has 24 cores/48 threads and about 400 GB of RAM.

Running a test on 5% of the total data set takes about 10 hours, which is too slow for us. Since the machine is far from fully loaded, I hope there’s something to tweak. What would that be?

Dgraph version : v20.11.0
Dgraph codename : tchalla
Dgraph SHA-256 : 8acb886b24556691d7d74929817a4ac7d9db76bb8b77de00f44650931a16b6ac
Commit SHA-1 : c4245ad55
Commit timestamp : 2020-12-16 15:55:40 +0530
Branch : HEAD
Go version : go1.15.5
jemalloc enabled : true

What is the size of your dataset?

According to this (old) blog post, Loading close to 1M edges/sec into Dgraph - Dgraph Blog, Dgraph can load about 3 billion edges per hour. Since the post is quite old, the numbers may well be higher now.

Are you using SSD or NVMe? Disks can be a bottleneck.
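If you want to measure the disk directly, something like fio can give a quick read/write number (the parameters below are just an example, adjust them for your machine):

```
# Example fio run (illustrative parameters): mixed sequential read/write
# against the same directory the bulk loader writes its output to.
fio --name=bulk-disk-test \
    --directory=/path/to/out \
    --rw=rw --bs=1M --size=4G \
    --numjobs=4 --direct=1 \
    --ioengine=libaio --group_reporting
```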

The full set is 1.6 TB of rdf.gz data (something like 10 TB uncompressed).

I believe we have plain SSDs, not NVMe.

So, 5% of your data is 80GB?

BTW, that’s a lot of data. What load time are you aiming for? Is there some other DB to compare against?

Hmm, something is off. Your data is 80 GB and the load took 10 hours; by comparison with the blog post it should take on the order of an hour. 80 GB over 10 hours works out to only about 2.26 MB/s of disk write throughput.

Our current solution is a process that takes 3-4 days (depending on what you include), and we’d like to improve on that. The bulk loader would replace the main part of it, but it would also have to cover the necessary upserts (I haven’t looked into that part yet).

The disks the tests write to should be fast, but I’ll try to check on this.

I measured the SSD speed on the test machine and found it should be capable of about 300 MB/s of simultaneous read/write on the same disk.
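To see whether the disk is actually the limit during REDUCE, I’ll also watch disk utilisation while the loader runs, e.g. with iostat (assumes the sysstat package; exact flags may vary by distro):

```
# Print extended per-device stats in MB/s every 5 seconds while the loader runs.
# High %util would point at the disk; low %util suggests the bottleneck is
# elsewhere (CPU, memory, or the loader itself).
iostat -xm 5
```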