Much has been said about how v21.12 is 30x faster than v21.03, but check out these histograms. The bottom one is the average query latency histogram from this day last week on v21.03.2, and the top one is the current histogram on v21.12. Both clusters use default performance settings and run on exactly the same hardware (15 nodes, each with 16 cores and 64 GiB, in 5 groups).
The workload has not changed, just the version of the software. I may disable the new posting list cache to see if that is the difference, but I am terrified to restart the new cluster’s nodes because a restart has a chance of ruining the cluster.
Is there any other setting that we could change to regain the performance of v21.03? I am going to have to revert to v21.03 with a patch to badger that fixes the corruption issue if we can't get this working like it was before.
Seems like query performance hits the trash while ingestion is on, but when the entire cluster is read-only the performance is great.
Seems like something is suffering hard from contention in this new version. I have tried disabling the posting list cache with cache.percentage: 0,50,50 (giving the whole cache to badger), but it seems that was not it.
How’s the CPU usage? Could you post a CPU profile?
Also, have you tried best-effort queries while ingestion is running?
All user queries are best effort; queries for mutations obviously are not. Nothing about any of that has changed.
CPU usage is about 10 of the 16 cores on each of the 15 nodes, about the same as last week. I can grab a debug profile.
Edit: @mrjn, I DM’d you a link to the debuginfo.
I could not wait any longer and had to roll back to v21.03.2 plus the badger manifest corruption patch. Query latency on v21.03.2 is back to being fast during ingestion: