Query throughput

Hi, after I bulk loaded data into Dgraph and started the cluster, I ran a simple pressure test.
The query throughput was 1300+ QPS, and the slowest query's latency was 20+ ms.
Then I performed some mutations until the Alpha did a rollup and took a snapshot.
After that I ran the same pressure test, but only got 400+ QPS, and the slowest query's latency was 300+ ms.
I would like to know: is there any difference in Dgraph's behavior between the first test and the second? Thanks very much.

Can anybody help me? I am confused about this.

CC: @ashishgoswami @animesh2049

Hi @Willem520, thanks for reaching out to us.
Dgraph stores data in posting lists (more here). When you first bulk load the data, all posting lists are stored as complete posting lists. But when you perform mutations, many smaller parts of the same posting lists (called delta postings) are created as a result of those mutations. These parts are periodically merged (via snapshots and rollups) back into complete posting lists, and the deltas are deleted.
Having delta posting lists helps increase write throughput but hurts read throughput. We have recently enabled incremental rollups so that all posting lists are no longer rolled up at once. This might be one reason for the performance difference between the first and second case.
Another reason I can think of is that there is more data to read/scan in the second case, even after rollups of all posting lists have completed (Dgraph uses Badger for storage, and data is not deleted from disk immediately; deletion happens via compactions).
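To illustrate the read-path cost described above, here is a toy sketch (not Dgraph's actual implementation; the class and field names are made up for illustration): writes only append deltas, reads must merge every pending delta into the complete list, and a rollup folds the deltas back in so reads become cheap again.

import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Conceptual sketch only; this is not how Dgraph stores posting lists internally.
class PostingListSketch {
    // Complete posting list, as produced by bulk load or a previous rollup.
    private final TreeSet<Long> complete = new TreeSet<>();
    // Delta postings accumulated from mutations since the last rollup.
    private final List<long[]> deltas = new ArrayList<>(); // {uid, +1 add / -1 delete}

    void mutate(long uid, boolean add) {
        // Writes stay cheap: just append a delta.
        deltas.add(new long[] {uid, add ? 1 : -1});
    }

    TreeSet<Long> read() {
        // Reads pay for every pending delta: merge them on the fly.
        TreeSet<Long> merged = new TreeSet<>(complete);
        for (long[] d : deltas) {
            if (d[1] > 0) merged.add(d[0]); else merged.remove(d[0]);
        }
        return merged;
    }

    void rollup() {
        // A rollup folds the deltas into a fresh complete list and drops them,
        // restoring cheap reads.
        TreeSet<Long> merged = read();
        complete.clear();
        complete.addAll(merged);
        deltas.clear();
    }
}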

Hope this answers your question.

Hi, thanks for your answer. I run a query like this:

{
  q(func: eq(label, "resblock"), first: 10) {
    uid
    ...
    r_1 {
      uid
      ...
    }
    r_2 {
      uid
      ...
    }
  }
}

There are about five hundred thousand matching results in total, and I fetch the first 10. In the first test it cost 20+ ms; in the second it cost 300+ ms. The latency difference is too large.
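For reference, this is roughly how such a query can be run and timed from dgraph4j. This is a hedged sketch, not an exact reproduction of my setup: the endpoint, the trimmed-down predicate selection (the "..." fields are omitted), and the timing code are assumptions; adjust them to your own cluster and dgraph4j version.

import io.dgraph.DgraphClient;
import io.dgraph.DgraphGrpc;
import io.dgraph.DgraphProto.Response;
import io.dgraph.Transaction;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class QueryLatencyCheck {
    public static void main(String[] args) {
        // Endpoint is a placeholder; point it at one of your Alphas.
        ManagedChannel channel =
            ManagedChannelBuilder.forAddress("localhost", 9080).usePlaintext().build();
        DgraphClient client = new DgraphClient(DgraphGrpc.newStub(channel));

        // Reduced version of the query above.
        String query = "{\n"
            + "  q(func: eq(label, \"resblock\"), first: 10) {\n"
            + "    uid\n"
            + "    r_1 { uid }\n"
            + "    r_2 { uid }\n"
            + "  }\n"
            + "}";

        Transaction txn = client.newReadOnlyTransaction();
        try {
            long start = System.nanoTime();
            Response res = txn.query(query);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("client-side latency: " + elapsedMs + " ms");
            System.out.println(res.getJson().toStringUtf8());
        } finally {
            txn.discard();
        }
        channel.shutdown();
    }
}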

Hi @Willem520, which Dgraph version are you using? We have recently introduced incremental rollups (in v20.03.0). We think that with incremental rollups, the difference between the first and second cost should be smaller.

My Dgraph version is v1.2.1.

Hi @Willem520, since we have incremental rollups from v20.03.0 onwards, can you please repeat your benchmarks with v20.03.0 and let us know whether the costs are still the same?

Yes, I will try v20.03.0. Does the new version have incremental rollups? I didn't find any mention of it in the release notes.

Hi @ashishgoswami, I have retried my benchmarks using v20.03.0 (3 Zeros, 3 Alphas, 3 replicas). The same query cost 340 ms when executed once.
[screenshot: single-query latency]
The same pressure test got the following result:
[screenshot: pressure test result]
It was even lower than in the two cases above.

Are you doing best-effort queries? Also, if you were to capture a Jaeger trace, you would understand better which part of the query is taking more time.

https://dgraph.io/docs/deploy/#examining-traces-with-jaeger
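For reference, a best-effort read-only query with dgraph4j is typically set up along these lines. This is only a sketch: it assumes your dgraph4j version exposes a setBestEffort method on the transaction, and the exact call may differ between client releases.

import io.dgraph.DgraphClient;
import io.dgraph.DgraphProto.Response;
import io.dgraph.Transaction;

class BestEffortQuery {
    // Runs a read-only, best-effort query and returns the JSON result.
    // setBestEffort is an assumption about the dgraph4j API; verify it
    // against the client version you are running.
    static String run(DgraphClient client, String query) {
        Transaction txn = client.newReadOnlyTransaction();
        txn.setBestEffort(true);
        try {
            Response res = txn.query(query);
            return res.getJson().toStringUtf8();
        } finally {
            txn.discard();
        }
    }
}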

Yes, I used dgraph4j and set best-effort, but I did not use Jaeger tracing.