Import rate decays until timeout

I used sample data to test the import on my personal computer.

I tried three configurations:
live loader with --new_uids=false
live loader in ludicrous mode with --new_uids=false
live loader in ludicrous mode with --new_uids=true
All other parameters were left at their default values.

In each case, I imported the same data three times in a row and found that the import rate dropped severely from one run to the next. On the fourth import, the import logs began to show a lot of timeouts.
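To put a number on the drop between repeated imports, something like the sketch below could be used. It is only an illustration: the function names are hypothetical, and the inputs (N-Quads imported and elapsed seconds per run) would have to be read from the live loader output by hand.

```python
def rates_per_run(runs):
    """Return the import rate (N-Quads per second) for each run.

    `runs` is a list of (nquads_imported, elapsed_seconds) tuples,
    one tuple per import of the same data set.
    """
    return [nquads / seconds for nquads, seconds in runs]


def relative_drop(rates):
    """Return each run's rate drop relative to the first run (0.5 means 50% slower)."""
    first = rates[0]
    return [(first - rate) / first for rate in rates]
```

Feeding it the figures from three or four consecutive imports shows whether the decay levels off or keeps growing.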

I have two questions:

  1. Will the import rate continue to decline? If I want to maintain the dynamic import rate at 50,000 per second, how should the hardware configuration scale with the amount of data?
  2. After the data is imported, I find that the “id” is gone. Is there a mapping between the id in the data and the id returned by queries? Or can I only find a node through its attributes?

Imported data:
benchmarks/1million.rdf.gz (dgraph-io/benchmarks on GitHub, master branch)
My computer:
System: macOS Catalina 10.15.5
CPU: 2.2GHz 4 Intel Core i7
Memory: 16GB 1600MHz DDR3
Docker compose file command:

dgraph alpha --my=alpha:7080 --lru_mb=2048 --zero=zero:5080 --ludicrous_mode

Dgraph runs some background tasks in order to provide certain guarantees. But there are ways to “hack” it, and I have shared some of them in the post “Sharing some numbers from the ludicrous mode”. I think you can get much better results with the machine you have. I got 200k N-Quads/s. I'm not sure about your numbers. Does your graph show 70k N-Quads/s?

Yes, I got a maximum of 70k N-Quads/s.
Which parameter needs to be adjusted?
I repeated it many times and got the same result.

I shared them in the post I mentioned. It is a combination of parameters.

It seems that the --conc and --batch parameters play a key role.

When I tested the live loader in normal mode, I first ran an incremental test on these two parameters and concluded that the default values were already very good: adjusting them had little effect or even reduced the load rate.
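An incremental test like this could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used: the helper name is hypothetical, the value lists are arbitrary placeholders, and it assumes the live loader accepts --files and --ludicrous_mode alongside the --conc, --batch and --new_uids flags mentioned in this thread.

```python
import itertools


def sweep_commands(data_file, conc_values, batch_values, ludicrous=False):
    """Build one `dgraph live` command line per (--conc, --batch) combination."""
    commands = []
    for conc, batch in itertools.product(conc_values, batch_values):
        args = [
            "dgraph", "live",
            "--files", data_file,
            "--new_uids=false",
            "--conc", str(conc),
            "--batch", str(batch),
        ]
        if ludicrous:
            # Assumption: only meaningful when the alpha also runs in ludicrous mode.
            args.append("--ludicrous_mode")
        commands.append(" ".join(args))
    return commands


if __name__ == "__main__":
    # Placeholder values; print the commands so each can be run and timed by hand.
    for command in sweep_commands("1million.rdf.gz", [10, 20], [1000, 5000]):
        print(command)
```

Running each command against a freshly started cluster keeps the runs comparable, since repeated imports into the same cluster slow down.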

But in ludicrous mode they made a big difference :v:. What causes this?

Thanks! :grimacing:

Yes, they are, at least in the first minutes of the load. But some Dgraph tasks can still make the results slightly less consistent. Even so, you will still get great results with ludicrous mode.

Basically, ludicrous mode unleashes the real power of Dgraph. All ACID guarantees are removed: there are no transactions and so on. As a result, no resources are spent on upholding those guarantees. That is why we see such extreme numbers. And the more resources you add, the more you can get out of it.

I’m curious, do you have a new graph to share?