I am running Dgraph alpha on a c5ad.8xlarge AWS instance with 16 CPUs / 32 vCPUs, 64 GB RAM and a 20 Gbit network. I am hammering it with concurrent queries to see how many concurrent clients the node can handle before it saturates. I expected CPU usage to reach 100% as I increased the number of concurrent clients, but I can only get it to use 22 CPUs.
The same happens on a c5ad.16xlarge instance with 32 CPUs / 64 vCPUs and 128 GB RAM: only 22 of these CPUs are ever used.
Question: Can you explain what prevents Dgraph alpha from using all available CPU cores?
The network is not the bottleneck: it has an order of magnitude more capacity than what is actually transferred over the wire. Response gRPC messages are in the megabyte range, and I have perf-tested the network link at 20 Gbit per second.
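The headroom argument is simple arithmetic: bits per second on the wire equal responses per second times response size in bytes times 8, divided by link capacity. The Go sketch below uses hypothetical figures (250 responses/s of 1 MB each, not measured values from this test) to show what "one order of magnitude" of headroom on a 20 Gbit link looks like.

```go
package main

import "fmt"

// linkUtilization returns the fraction of link capacity consumed when
// serving `qps` responses of `respBytes` bytes each over a link of
// `linkGbit` gigabits per second (1 Gbit = 1e9 bits).
func linkUtilization(qps, respBytes, linkGbit float64) float64 {
	bitsPerSec := qps * respBytes * 8
	return bitsPerSec / (linkGbit * 1e9)
}

func main() {
	// Hypothetical load: 250 responses/s, 1 MB (1e6 bytes) each, 20 Gbit link.
	u := linkUtilization(250, 1e6, 20)
	fmt.Printf("utilization = %.1f%%\n", u*100) // prints "utilization = 10.0%"
}
```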
The local SSD that stores the dataset is not the bottleneck either: the entire dataset fits into the OS file cache, and I have measured no disk I/O.
Dgraph version : v21.03.0
Dgraph codename : rocket
Dgraph SHA-256 : b4e4c77011e2938e9da197395dbce91d0c6ebb83d383b190f5b70201836a773f
Commit SHA-1 : a77bbe8ae
Commit timestamp : 2021-04-07 21:36:38 +0530
Branch : HEAD
Go version : go1.16.2
jemalloc enabled : true
Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.