Bulk Loader performance

I’m trying out the bulk loader.

The MAP phase appears to be fast and scales well when increasing num_go_routines.

REDUCE takes much longer than MAP, and during this phase CPU utilisation drops significantly. I’ve tried different values for map_shards, reduce_shards and reducers but don’t see much difference. What’s the key thing to do to make the REDUCE phase faster?

reduce_shards is stated as one of the main things to tweak, but it isn’t purely a performance parameter for the bulk loader: it has to match the number of Alpha instances used later, and the valid choices for map_shards and reducers are in turn constrained by whatever reduce_shards is set to. So in practice this one setting constrains everything.
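For reference, the kind of invocation I’ve been experimenting with looks roughly like this (the paths and values below are placeholders, not our real setup):

```
# Hypothetical example: paths and values are placeholders.
# reduce_shards must equal the number of Alpha groups planned for the cluster;
# map_shards should be >= reduce_shards (more map shards gives more even output);
# num_go_routines mainly affects the MAP phase.
dgraph bulk \
  -f /data/rdf/ \
  -s /data/schema.txt \
  --zero=localhost:5080 \
  --map_shards=4 \
  --reduce_shards=2 \
  --reducers=1 \
  --num_go_routines=32 \
  --out=/data/out
```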

We’re loading a lot of data. The test machine has 24 cores/48 threads and about 400 GB of RAM.

Running a test on 5% of the total data set takes about 10 hours, which is too slow for us. Since the machine is far from fully loaded, I hope there’s something to tweak. What would that be?

Dgraph version : v20.11.0
Dgraph codename : tchalla
Dgraph SHA-256 : 8acb886b24556691d7d74929817a4ac7d9db76bb8b77de00f44650931a16b6ac
Commit SHA-1 : c4245ad55
Commit timestamp : 2020-12-16 15:55:40 +0530
Branch : HEAD
Go version : go1.15.5
jemalloc enabled : true

What is the size of your dataset?

According to this (old) blog post, Loading close to 1M edges/sec into Dgraph - Dgraph Blog, Dgraph can load about 3 billion edges per hour. Since the post is quite old, the numbers may well be higher now.

Are you using SSD or NVMe? Disks can be a bottleneck.
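If you want to measure the disk directly, something like fio can give a quick read/write number (the parameters below are just an example, adjust them for your machine):

```
# Example fio run (illustrative parameters): mixed sequential read/write
# against the same directory the bulk loader writes its output to.
fio --name=bulk-disk-test \
    --directory=/path/to/out \
    --rw=rw --bs=1M --size=4G \
    --numjobs=4 --direct=1 \
    --ioengine=libaio --group_reporting
```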

The full set is 1.6 TB of rdf.gz data (something like 10 TB uncompressed).

I believe we have plain SSDs, not NVMe.

So, 5% of your data is 80GB?

BTW, that’s a lot of data. What load time are you aiming for? Is there some other DB to compare against?

Hmm, something is off. Your data is 80 GB and the load took 10 hours; by comparison with the blog post it should take on the order of an hour. 80 GB over 10 hours works out to only about 2.26 MB/s of disk write throughput.

Our current solution is a process that takes 3-4 days (depending on what you include), and we’d like to improve on that. The bulk loader would replace the main part of it, but it would also have to cover the necessary upserts (I haven’t looked into that part yet).

The disks the tests write to should be fast, but I’ll try to check on this.

I measured the SSD speed on the test machine and found it should be capable of about 300 MB/s of simultaneous read/write on the same disk.
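To see whether the disk is actually the limit during REDUCE, I’ll also watch disk utilisation while the loader runs, e.g. with iostat (assumes the sysstat package; exact flags may vary by distro):

```
# Print extended per-device stats in MB/s every 5 seconds while the loader runs.
# High %util would point at the disk; low %util suggests the bottleneck is
# elsewhere (CPU, memory, or the loader itself).
iostat -xm 5
```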