Import rate decays until timeout

yizhe.li · July 23, 2020, 3:00am

Hi,
I used the sample data to test importing on my personal computer.

I tried three configurations:
live loader with --new_uids=false
live loader in ludicrous mode with --new_uids=false
live loader in ludicrous mode with --new_uids=true
All other parameters were left at their default values.

For each configuration, I imported the same data three times in a row and saw the import rate drop sharply with each run. On the fourth import, the loader logs started showing many timeouts.
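To put a number on "dropped sharply", one option is to record the N-Quad count and wall-clock time of each run and compare every run's rate with the first. A minimal Python sketch; the function names are hypothetical, and the inputs would come from your own loader output, not from anything Dgraph reports in this exact form.

```python
from typing import List, Tuple


def nquads_per_second(nquads: int, seconds: float) -> float:
    """Average import rate of a single run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return nquads / seconds


def relative_rates(runs: List[Tuple[int, float]]) -> List[float]:
    """Rate of each run as a fraction of the first run's rate.

    runs: (nquads_loaded, elapsed_seconds) for each run, in execution order.
    """
    rates = [nquads_per_second(n, s) for n, s in runs]
    first = rates[0]
    return [r / first for r in rates]
```

For a target such as 50,000 per second, `nquads_per_second` tells you directly whether a run met it, and `relative_rates` shows how quickly repeated runs fall away from the first one.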

I would like to understand two things:

Will the import rate keep declining? If I want to sustain an import rate of 50,000 per second, how should the hardware configuration scale with the amount of data?
After the data is imported, the “id” values are gone. Is there a mapping between the ids in the source data and the ids returned by queries, or can I only find nodes through their attributes?
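On the second question: assuming the data names its nodes with blank nodes (`_:label`), Dgraph assigns each blank node a fresh uid and does not store the label itself. A common workaround is to keep the original ID as an ordinary predicate before loading. The sketch below preprocesses N-Quad lines and adds one such triple per blank-node subject; the predicate name `xid` and the function name are hypothetical choices, not something the live loader does for you.

```python
import re
from typing import Iterable, Iterator

# Matches a blank-node subject at the start of an N-Quad line, e.g. "_:m.0abc <name> ...".
BLANK_SUBJECT = re.compile(r"^\s*(_:\S+)\s")


def add_xid_triples(lines: Iterable[str], predicate: str = "xid") -> Iterator[str]:
    """Yield each input N-Quad, plus one '<predicate> "label"' triple the first
    time a blank node appears as a subject.

    The label is stored as a string value, so it survives the blank-node -> uid
    assignment. Blank nodes that only ever appear as objects are not tagged.
    """
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        yield line
        m = BLANK_SUBJECT.match(line)
        if m:
            node = m.group(1)
            if node not in seen:
                seen.add(node)
                label = node[2:]  # drop the "_:" prefix
                yield f'{node} <{predicate}> "{label}" .'
```

If you go this route, the predicate also needs an index in the schema (e.g. `xid: string @index(exact) .`) so that a query like `eq(xid, "...")` can find a node by its original ID.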

Imported data:
benchmarks/data/1million.rdf.gz in the dgraph-io/benchmarks repository on GitHub (master branch)
My computer:
System: macOS Catalina 10.15.5
CPU: 2.2 GHz quad-core Intel Core i7
Memory: 16 GB 1600 MHz DDR3
Command in the Docker Compose file:

dgraph alpha --my=alpha:7080 --lru_mb=2048 --zero=zero:5080 --ludicrous_mode

MichelDiz · July 23, 2020, 3:19am

Dgraph runs some background tasks to provide certain guarantees, but there are ways to “hack” around them. I shared some of them in Sharing some numbers from the ludicrous mode. I think you can get much better results on the machine you have; I got 200k N-Quads/s. I'm not sure what your numbers are. Does your graph top out at 70k N-Quads/s?

yizhe.li · July 23, 2020, 3:27am

Yes, I got a maximum of 70k N-Quads/s.
Which parameters need to be adjusted?
I repeated the test many times and got the same result each time.

MichelDiz · July 23, 2020, 3:31am

I shared them in the post I mentioned. It is a combination of parameters.

yizhe.li · July 23, 2020, 4:20am

It seems that the --conc and --batch parameters play a key role.

When I tested the live loader in normal mode, I first increased these two parameters step by step, and concluded that the defaults were already very good: changing them had little effect or even reduced the load rate.

But in ludicrous mode they make a big difference. What causes this?
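As a rough mental model (not Dgraph's actual implementation), `--batch` sets how many N-Quads go into one mutation and `--conc` how many requests are in flight at once, so while per-request latency stays flat, throughput grows roughly like `conc * batch / latency`. The Python sketch below simulates that; `send_batch` is a hypothetical stand-in for a mutation request, and the fixed latency is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def make_batches(nquads: Sequence[str], batch: int) -> List[Sequence[str]]:
    """Split the input into chunks of at most `batch` N-Quads (the role of --batch)."""
    return [nquads[i:i + batch] for i in range(0, len(nquads), batch)]


def send_batch(chunk: Sequence[str], latency_s: float) -> int:
    """Hypothetical stand-in for one mutation request: waits, then reports N-Quads sent."""
    time.sleep(latency_s)
    return len(chunk)


def load(nquads: Sequence[str], batch: int, conc: int, latency_s: float = 0.01) -> float:
    """Send all batches using `conc` worker threads (the role of --conc).

    Returns the simulated throughput in N-Quads per second.
    """
    batches = make_batches(nquads, batch)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=conc) as pool:
        sent = sum(pool.map(lambda b: send_batch(b, latency_s), batches))
    elapsed = time.perf_counter() - start
    return sent / elapsed
```

With a fixed latency, raising `conc` from 1 to 4 roughly quadruples the simulated rate. On a real server, latency rises once it saturates, so extra concurrency only pays off while the server has headroom; one plausible reading is that ludicrous mode leaves more headroom because it skips transaction work.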

Thanks!

MichelDiz · July 23, 2020, 4:35am

Yes, they are, at least in the first minutes of the load. But some Dgraph background tasks can still make the results slightly less consistent. Even so, you will still get great results with ludicrous mode.

Basically, ludicrous mode unleashes the real power of Dgraph. It drops all ACID guarantees: there are no transactions and so on, so no resources are spent enforcing those guarantees. That is why we see such high throughput. And the more resources you add, the more you can get out of it.

I’m curious, do you have a new graph to share?
