This post contains profiling and benchmarking results for GraphQL queries involving @custom fields.
Creating the Dataset
I wrote a script; it's easier that way.
I created the dataset using Dgraph’s GraphQL API and some publicly available restaurant data from Kaggle. The Kaggle dataset promised data about ~10k restaurants. Later, I found out that some of them were duplicates, and one data file even turned out to be corrupt. Nevertheless, I made the following schema to represent the data obtained from Kaggle:
type Country {
  cid: ID!
  id: String! @id
  name: String!
  cities: [City]
}

type City {
  cid: ID!
  id: String! @id
  name: String!
  country: Country! @hasInverse(field: cities)
  restaurants: [RestaurantAddress] @hasInverse(field: city)
}

interface Location {
  id: ID!
  lat: Float!
  long: Float!
  address: String!
  locality: String!
  city: City!
  zipcode: Int
}

type RestaurantAddress implements Location {
  restaurant: Restaurant! @hasInverse(field: addr)
}

type Restaurant {
  id: ID!
  xid: String! @id
  name: String!
  pic: String
  addr: RestaurantAddress!
  rating: Float
  costFor2: Float
  currency: String
  cuisines: [Cuisine]
  dishes: [Dish] @hasInverse(field: servedBy)
  createdAt: DateTime!
}

type Cuisine {
  id: ID!
  name: String! @id
  restaurants: [Restaurant] @hasInverse(field: cuisines)
  dishes: [Dish] @hasInverse(field: cuisine)
}

type Dish {
  id: ID!
  name: String!
  pic: String
  price: Float!
  description: String
  isVeg: Boolean!
  cuisine: Cuisine
  servedBy: Restaurant!
}
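The data went in through the add mutations that Dgraph auto-generates from this schema. Roughly, each request the script sent looked something like the sketch below (the values are made up for illustration; the real script filled them in from the Kaggle data):

mutation {
  addRestaurant(input: [{
    # made-up values for illustration; city and cuisine link to
    # existing nodes via their @id fields
    xid: "rest-0001",
    name: "The Pasta Place",
    rating: 4.2,
    costFor2: 30.0,
    currency: "USD",
    createdAt: "2020-07-10T12:00:00Z",
    addr: {
      lat: 40.7128,
      long: -74.0060,
      address: "123 Example St",
      locality: "Manhattan",
      zipcode: 10001,
      city: { id: "city-0042" }
    },
    cuisines: [{ name: "Italian" }]
  }]) {
    restaurant {
      id
      name
    }
  }
}

Dishes went in similarly, with servedBy pointing back at an already-added restaurant.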
Then, I ran the script to add Restaurants. I configured it to create a random number of dishes for each restaurant, with a maximum of 1000 dishes per restaurant. It took almost an hour to generate and add all that data using the GraphQL API (note that it wasn’t fully automated, there were some manual steps in between, and the mutations were executed serially, so the timing itself isn’t very meaningful here). Once it was done, I ran the following GraphQL± query to find out how many nodes were created:
query {
  allNodes(func: has(dgraph.type)) {
    count(uid)
  }
  restaurants(func: eq(dgraph.type, "Restaurant")) {
    count(uid)
  }
  dishes(func: eq(dgraph.type, "Dish")) {
    count(uid)
  }
  cuisines(func: eq(dgraph.type, "Cuisine")) {
    count(uid)
  }
  countries(func: eq(dgraph.type, "Country")) {
    count(uid)
  }
  cities(func: eq(dgraph.type, "City")) {
    count(uid)
  }
  locations(func: eq(dgraph.type, "Location")) {
    count(uid)
  }
  restaurantAddresses(func: eq(dgraph.type, "RestaurantAddress")) {
    count(uid)
  }
}
It gave me the following results:
"data": {
"allNodes": [
{
"count": 4175750
}
],
"restaurants": [
{
"count": 8422
}
],
"dishes": [
{
"count": 4158700
}
],
"cuisines": [
{
"count": 116
}
],
"countries": [
{
"count": 15
}
],
"cities": [
{
"count": 74
}
],
"locations": [
{
"count": 8422
}
],
"restaurantAddresses": [
{
"count": 8422
}
]
}
So, there we have it! Our database spans 74 cities across 15 countries, with 8422 restaurants serving 4158700 dishes across 116 cuisines.
In total, it has ~4M nodes.
One thing to note here is the order of magnitude of the node count for each type, especially the following:
- Cuisines: ~100
- Restaurants: ~10,000
- Dishes: ~1,000,000
Each is about two orders of magnitude (two more zeroes) bigger than the previous one. We are going to use these types in the benchmarking queries.
Benchmarking
When you send a GraphQL query containing custom fields to Dgraph, the GraphQL layer first fetches results from Dgraph and then sends HTTP requests to the custom URLs to fetch the results for the custom fields. That requires an API server that responds to those requests with the correct answers. So, I decided to start another Dgraph instance serving the same dataset and wrote a simple server in Go which, on receiving an HTTP request, just queries the other Dgraph instance and responds with the results. Nice, easy, and correct results for our custom fields, is what I thought. But let’s see.
First, let’s be clear on some terminology. There are two Dgraph instances here: one receives the GraphQL benchmarking queries, and the other acts as the final data source behind the custom APIs. I ran the first one on my host OS and the second one in a Docker container. So, let’s name them as follows:

Host Dgraph
: The one responding to benchmarking queries, running on my host OS.

Docker Dgraph
: The one responding to custom API queries, running in Docker.

Custom API Server
: The Go server which provides the custom API. It receives HTTP requests from the Host Dgraph, performs the corresponding query on the Docker Dgraph, and then responds to the Host Dgraph with the results obtained from the Docker Dgraph.
Both Dgraph instances use the same dataset we created above.
Now, I prepared some schemas and different benchmarking queries for each schema. The only thing that changes across the schemas is where @custom is applied. That helps us see how @custom affects the performance of the GraphQL layer in different scenarios. But before measuring how @custom affects performance, we must know how the GraphQL layer performs without @custom; only then can we compare the results. So, the very first schema for which I ran benchmarking queries has no @custom directive, i.e., it is exactly the schema shown in the previous section. The benchmarking queries vary in how many levels of data they query, how many fields they request at each level, and where they start querying from (i.e., whether the root query is queryCuisine, queryRestaurant, or queryDish). All the queries are at most 3 levels deep. I will post a link to all those schemas, queries, and benchmarking results once I have finished the full benchmarking. Again, I automated all this benchmarking with a script. The script runs each query 10 times and collects the timing information reported in the GraphQL extensions. It also collects some CPU and memory profiles of the Host Dgraph. This script should make its way to our public repo in some time.
So, let’s start.
Initial Schema (No custom directive)
Query 1
Consider this query:
query {
  queryCuisine {
    id
    name
  }
}
It is simple: there are only ~100 Cuisine nodes, and there is no @custom directive in the initial schema. I got the following results for this query:
|=======================================================================================|
| Iteration | Total Time (ns) | Dgraph Time (ns) | GraphQL Time (ns) | GraphQL Time (%) |
|===========|=================|==================|===================|==================|
| 0 | 23198079 | 22812385 | 385694 | 1.662612 |
| 1 | 5175913 | 4542251 | 633662 | 12.242516 |
| 2 | 4999751 | 4316486 | 683265 | 13.665981 |
| 3 | 4168936 | 3643491 | 525445 | 12.603815 |
| 4 | 4169961 | 3454751 | 715210 | 17.151479 |
| 5 | 4065236 | 3453497 | 611739 | 15.048056 |
| 6 | 3802596 | 3249535 | 553061 | 14.544301 |
| 7 | 5009093 | 4410728 | 598365 | 11.945576 |
| 8 | 3217808 | 2748461 | 469347 | 14.585923 |
| 9 | 3943097 | 3389763 | 553334 | 14.032980 |
|===========|=================|==================|===================|==================|
| Avg | 6175047.000000 | 5602134.800000 | 572912.200000 | 9.277860 |
| non-0 Avg | 6175047.000000 | 5602134.800000 | 572912.200000 | 9.277860 |
|=======================================================================================|
As you can see, in the 0th iteration Dgraph took a lot of time, while for the queries after that it took comparatively much less time. On average, the GraphQL layer accounted for only ~9% of the total time. I would call the 0th iteration warm-up. This warm-up skewed the average % time spent in the GraphQL layer, which otherwise would have been around 13-14% of the total time. The GraphQL layer functions didn’t show up in any heap or CPU profile for this query.
Query 2
Now consider a bigger query:
query {
  queryCuisine {
    id
    name
    restaurants {
      name
      rating
      dishes {
        name
        price
      }
    }
  }
}
It is complex: it has 3 levels and ends up querying almost all of the ~4M nodes.
Let’s look at the results for this:
|=======================================================================================|
| Iteration | Total Time (ns) | Dgraph Time (ns) | GraphQL Time (ns) | GraphQL Time (%) |
|===========|=================|==================|===================|==================|
| 0 | 66722094920 | 45208944628 | 21513150292 | 32.242918 |
| 1 | 59862328464 | 38292618956 | 21569709508 | 36.032193 |
| 2 | 59070568339 | 37695178124 | 21375390215 | 36.186194 |
| 3 | 58797093801 | 37368619826 | 21428473975 | 36.444784 |
| 4 | 59213197148 | 37708491437 | 21504705711 | 36.317420 |
| 5 | 57968868767 | 36977505165 | 20991363602 | 36.211443 |
| 6 | 58393392165 | 37155525454 | 21237866711 | 36.370325 |
| 7 | 58154881958 | 36983798346 | 21171083612 | 36.404654 |
| 8 | 58091151702 | 36791804542 | 21299347160 | 36.665390 |
| 9 | 57459322670 | 36324648044 | 21134674626 | 36.781977 |
|===========|=================|==================|===================|==================|
| Avg | 59373289993.400 | 38050713452.1999 | 21322576541.20000 | 35.912742 |
| non-0 Avg | 59373289993.400 | 38050713452.1999 | 21322576541.20000 | 35.912742 |
|=======================================================================================|
Here, you can see that the GraphQL layer takes up about ~36% of the total processing time. That’s huge! Also, the total time it took to execute this query is ~60 secs (a whole minute!). You don’t see the warm-up here because the 0th iteration of Query 1 was the first-ever query executed by the Host Dgraph; since it has been running continuously since then, the 0th iteration of Query 2 is just another query for the Host Dgraph.
Now, let’s look at a memory profile for this query.
Showing nodes accounting for 9.56GB, 98.54% of 9.70GB total
Dropped 164 nodes (cum <= 0.05GB)
Showing top 30 nodes out of 104
flat flat% sum% cum cum%
4.38GB 45.15% 45.15% 4.38GB 45.15% fmt.Sprintf
1.08GB 11.12% 56.27% 2.69GB 27.73% github.com/dgraph-io/dgraph/worker.(*queryState).handleValuePostings.func1
0.64GB 6.64% 62.91% 0.64GB 6.64% github.com/dgraph-io/dgraph/query.(*encoder).appendAttrs
0.59GB 6.04% 68.95% 0.59GB 6.04% github.com/DataDog/zstd.Decompress
0.53GB 5.43% 74.38% 0.53GB 5.43% github.com/dgraph-io/dgraph/worker.convertToType
0.51GB 5.29% 79.67% 0.51GB 5.29% github.com/dgraph-io/dgraph/worker.retrieveValuesAndFacets
0.38GB 3.89% 83.56% 0.38GB 3.89% github.com/dgraph-io/ristretto.newCmRow
0.24GB 2.48% 86.04% 0.24GB 2.48% github.com/dgraph-io/dgraph/query.(*encoder).newNode
0.22GB 2.30% 88.34% 0.22GB 2.30% github.com/dgraph-io/ristretto/z.(*Bloom).Size (inline)
0.22GB 2.27% 90.61% 0.28GB 2.91% github.com/dgraph-io/badger/v2/table.OpenTable
0.21GB 2.12% 92.73% 0.21GB 2.12% github.com/dgraph-io/dgraph/worker.(*queryState).handleValuePostings
0.17GB 1.72% 94.45% 0.17GB 1.72% github.com/dgraph-io/dgraph/query.(*arena).put
0.16GB 1.68% 96.13% 0.16GB 1.68% github.com/dgraph-io/badger/v2/skl.newArena
0.07GB 0.67% 96.80% 0.07GB 0.67% github.com/dgraph-io/dgraph/query.(*SubGraph).appendDummyValues
0.06GB 0.65% 97.45% 0.06GB 0.65% github.com/dgraph-io/badger/v2/table.(*Table).blockCacheKey
0.06GB 0.57% 98.02% 0.06GB 0.57% github.com/dgraph-io/dgraph/protos/pb.(*Posting).Unmarshal
0.04GB 0.39% 98.42% 0.06GB 0.64% github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
0.01GB 0.12% 98.54% 0.72GB 7.44% github.com/dgraph-io/badger/v2/table.(*Table).block
...
What I observed is that there were no GraphQL functions in the top 30. fmt.Sprintf seems to be the culprit. Almost all the memory is being consumed by Dgraph, as can be seen in the following diagram:
Now, let’s take a look at the CPU profile for this query. The CPU was profiled for about 75 seconds, so it should contain all the information for this query.
Here you can see that, after rewriteAndExecute, most of the time is spent in Dgraph’s query package, while the rest is spent in JSON unmarshalling. So, the GraphQL layer could try to optimize its JSON processing.
Query 3
query {
  queryDish {
    name
    price
    cuisine {
      id
      name
      restaurants {
        name
        rating
      }
    }
  }
}
It is almost the same query as the previous one; the difference is that it queries Dishes at the first level, while the previous query had dishes at the last level.
This one errored out saying:
"Dgraph query failed because Dgraph execution failed because encoded response size: 4294967317 is bigger than threshold: 4294967296"
Oops, there is a limit to how much you can query at a time.
Now, let’s change the schema to have custom fields.
Schema 1 (@custom at Restaurant.name, mode = SINGLE)
Here, the Custom API Server is going to play a very important role: its latency decides how fast the GraphQL layer can be.
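For reference, the schema change looks roughly like this: Restaurant.name is now resolved by calling out to the Custom API Server instead of being fetched from Dgraph. The URL and query parameter below are placeholders, not the exact endpoint I used:

type Restaurant {
  id: ID!
  xid: String! @id
  name: String! @custom(http: {
    # placeholder endpoint; the real Custom API Server URL and path differ
    url: "http://localhost:9000/restaurantName?xid=$xid",
    method: GET,
    mode: SINGLE
  })
  # ...remaining fields stay as in the original schema
}

In Schema 2, Restaurant.rating gets a similar annotation, so each restaurant node needs two remote lookups instead of one.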
Query 1
query {
queryCuisine {
id
name
restaurants {
name
rating
dishes {
name
price
}
}
}
}
I got the following results:
|==========================================================================================================================================================================================================|
| Iteration | Round Trip Time (RTT) | Total Time (TT) | Trace Time (TrT) | Dgraph Time (DT) | GraphQL Time (GT=TT-DT) | % GraphQL Time (GT/TT*100) | Tracing Error (TT-TrT) | Round Trip Overhead (RTT-TT) |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| 0 | 66468568072 | 64223691374 | 64223594271 | 38363446944 | 25860244430 | 40.27 | 97103 | 2244876698 |
| 1 | 63527709939 | 61308532534 | 61308460207 | 37892206197 | 23416326337 | 38.19 | 72327 | 2219177405 |
| 2 | 65388871976 | 63149794335 | 63149699958 | 38047145736 | 25102648599 | 39.75 | 94377 | 2239077641 |
| 3 | 65578901210 | 63216787704 | 63216717182 | 37066569187 | 26150218517 | 41.37 | 70522 | 2362113506 |
| 4 | 63977765202 | 61759810694 | 61759747445 | 37922204689 | 23837606005 | 38.60 | 63249 | 2217954508 |
| 5 | 62062608527 | 59772949757 | 59772881870 | 37002180761 | 22770768996 | 38.10 | 67887 | 2289658770 |
| 6 | 62136762295 | 59866791492 | 59866702435 | 37046990795 | 22819800697 | 38.12 | 89057 | 2269970803 |
| 7 | 61800321765 | 59591076655 | 59591017865 | 36443568456 | 23147508199 | 38.84 | 58790 | 2209245110 |
| 8 | 61764738945 | 59519489701 | 59519404080 | 36426050423 | 23093439278 | 38.80 | 85621 | 2245249244 |
| 9 | 62000967050 | 59780311638 | 59780217211 | 36582091037 | 23198220601 | 38.81 | 94427 | 2220655412 |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| Avg | 63470721498.10 | 61218923588.40 | 61218844252.40 | 37279245422.50 | 23939678165.90 | 39.11 | 79336.00 | 2251797909.70 |
|==========================================================================================================================================================================================================|
All timing information is in nanoseconds, except '% GraphQL Time' which is a percentage.
The same query without @custom was taking ~60 secs, and now it takes ~61 secs. On average, the GraphQL layer’s share of the total time has gone up by about 3 percentage points compared to the non-custom version. Not much of a difference.
But, as I said, the Custom API Server plays a very important role. At one point, when I was just logging the request parameters in the Custom API Server, more than half of the custom requests were erroring out with messages like:
dial tcp 127.0.0.1:9000: socket: too many open files
and this:
dial tcp: lookup localhost: device or resource busy
At first I blamed the logging for these errors, but it turned out the reason was different. I had tried the custom queries on different machines. On a machine with ulimit -n = 1024, I got these errors; on a machine with ulimit -n = 1000000, there were none. So, the limit on how many file descriptors can be open at a time decides how many concurrent HTTP requests one can make.
Later, we figured out that we were creating a new HTTP client for every custom API request, which we should ideally share among all the requests. Another thing that came out of logging in the Custom API Server was a bug: the URL wasn’t being formed properly for all the requests. All the requests were using the same URL to query the restaurant name for different nodes, even though I had specified the restaurant ID as a query parameter in the URL. That was a bug, because the restaurant ID wasn’t changing in the URL across requests for different restaurants.
Schema 2 (@custom at Restaurant.name and Restaurant.rating, mode = SINGLE)
For the same query as in the previous case, I got the following results:
|==========================================================================================================================================================================================================|
| Iteration | Round Trip Time (RTT) | Total Time (TT) | Trace Time (TrT) | Dgraph Time (DT) | GraphQL Time (GT=TT-DT) | % GraphQL Time (GT/TT*100) | Tracing Error (TT-TrT) | Round Trip Overhead (RTT-TT) |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| 0 | 88340502908 | 86094066348 | 86093973524 | 38419343402 | 47674722946 | 55.38 | 92824 | 2246436560 |
| 1 | 66210222248 | 63739985080 | 63739875323 | 38735287864 | 25004697216 | 39.23 | 109757 | 2470237168 |
| 2 | 67868325501 | 65642375222 | 65642304409 | 37966999610 | 27675375612 | 42.16 | 70813 | 2225950279 |
| 3 | 65222138193 | 62868024268 | 62867926384 | 37427666711 | 25440357557 | 40.47 | 97884 | 2354113925 |
| 4 | 108358242220 | 106126197977 | 106126118297 | 37468178815 | 68658019162 | 64.69 | 79680 | 2232044243 |
| 5 | 110108253427 | 107876266217 | 107876169996 | 37338939419 | 70537326798 | 65.39 | 96221 | 2231987210 |
| 6 | 88563333556 | 86352756552 | 86352660161 | 36875922098 | 49476834454 | 57.30 | 96391 | 2210577004 |
| 7 | 66925765457 | 64596692549 | 64596627767 | 36615986643 | 27980705906 | 43.32 | 64782 | 2329072908 |
| 8 | 65797357895 | 63548888709 | 63548823186 | 36605283969 | 26943604740 | 42.40 | 65523 | 2248469186 |
| 9 | 63025225363 | 60794992442 | 60794929244 | 36301763763 | 24493228679 | 40.29 | 63198 | 2230232921 |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| Avg | 79041936676.80 | 76764024536.40 | 76763940829.10 | 37375537229.40 | 39388487307.00 | 51.31 | 83707.30 | 2277912140.40 |
|==========================================================================================================================================================================================================|
All timing information is in nanoseconds, except '% GraphQL Time' which is a percentage.
So, as the number of custom fields increased, the time taken by the GraphQL layer also increased.
Schema 3 (@custom at Restaurant.name, mode = BATCH)
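The schema change for BATCH mode looks roughly like the sketch below. Instead of one HTTP call per restaurant node, the GraphQL layer batches the parent nodes and sends them together in a single request, with the body built from the body template. Again, the URL and body here are placeholders, not the exact ones I used:

type Restaurant {
  id: ID!
  xid: String! @id
  name: String! @custom(http: {
    # placeholder batch endpoint of the Custom API Server
    url: "http://localhost:9000/restaurantNames",
    method: POST,
    mode: BATCH,
    body: "{ xid: $xid }"
  })
  # ...remaining fields stay as in the original schema
}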
Again, let’s just look at the results for the same query:
|==========================================================================================================================================================================================================|
| Iteration | Round Trip Time (RTT) | Total Time (TT) | Trace Time (TrT) | Dgraph Time (DT) | GraphQL Time (GT=TT-DT) | % GraphQL Time (GT/TT*100) | Tracing Error (TT-TrT) | Round Trip Overhead (RTT-TT) |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| 0 | 61611446863 | 59388103422 | 59388014334 | 37980426748 | 21407676674 | 36.05 | 89088 | 2223343441 |
| 1 | 61400095203 | 59144124108 | 59144044408 | 38121722403 | 21022401705 | 35.54 | 79700 | 2255971095 |
| 2 | 60730566946 | 58481387083 | 58481287646 | 37289418977 | 21191968106 | 36.24 | 99437 | 2249179863 |
| 3 | 60308483130 | 58061209969 | 58061143124 | 36818796463 | 21242413506 | 36.59 | 66845 | 2247273161 |
| 4 | 59732856606 | 57293922185 | 57293775419 | 36557777717 | 20736144468 | 36.19 | 146766 | 2438934421 |
| 5 | 59283153824 | 57083817945 | 57083637255 | 36194470147 | 20889347798 | 36.59 | 180690 | 2199335879 |
| 6 | 59949373049 | 57578450668 | 57578345310 | 36720538461 | 20857912207 | 36.23 | 105358 | 2370922381 |
| 7 | 58881320406 | 56663119564 | 56663014958 | 35874127011 | 20788992553 | 36.69 | 104606 | 2218200842 |
| 8 | 59412697511 | 57159098940 | 57158987791 | 36144347064 | 21014751876 | 36.77 | 111149 | 2253598571 |
| 9 | 59119524162 | 56902921466 | 56902817922 | 35784036763 | 21118884703 | 37.11 | 103544 | 2216602696 |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| Avg | 60042951770.00 | 57775615535.00 | 57775506816.70 | 36748566175.40 | 21027049359.60 | 36.39 | 108718.30 | 2267336235.00 |
|==========================================================================================================================================================================================================|
All timing information is in nanoseconds, except '% GraphQL Time' which is a percentage.
Here, the time taken by the GraphQL layer is similar to the one observed without the @custom directive, although that is only an improvement of about 3 percentage points over SINGLE mode.
Schema 4 (@custom at Restaurant.name and Restaurant.rating, mode = BATCH)
|==========================================================================================================================================================================================================|
| Iteration | Round Trip Time (RTT) | Total Time (TT) | Trace Time (TrT) | Dgraph Time (DT) | GraphQL Time (GT=TT-DT) | % GraphQL Time (GT/TT*100) | Tracing Error (TT-TrT) | Round Trip Overhead (RTT-TT) |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| 0 | 61590554058 | 59210474563 | 59210381248 | 38093553773 | 21116920790 | 35.66 | 93315 | 2380079495 |
| 1 | 61299109335 | 59035276660 | 59035206298 | 37783346890 | 21251929770 | 36.00 | 70362 | 2263832675 |
| 2 | 60732413641 | 58516064639 | 58515990039 | 37338295939 | 21177768700 | 36.19 | 74600 | 2216349002 |
| 3 | 60374842427 | 58140427294 | 58140338517 | 37068236579 | 21072190715 | 36.24 | 88777 | 2234415133 |
| 4 | 59760191150 | 57254263659 | 57254134245 | 36534692653 | 20719571006 | 36.19 | 129414 | 2505927491 |
| 5 | 59822598677 | 57543582556 | 57543486696 | 36527421680 | 21016160876 | 36.52 | 95860 | 2279016121 |
| 6 | 60045091331 | 57603337964 | 57603235492 | 36697616674 | 20905721290 | 36.29 | 102472 | 2441753367 |
| 7 | 59158758936 | 56946279431 | 56946174635 | 36059726763 | 20886552668 | 36.68 | 104796 | 2212479505 |
| 8 | 59625460560 | 57388379488 | 57388291162 | 36524422036 | 20863957452 | 36.36 | 88326 | 2237081072 |
| 9 | 59467452018 | 57219823067 | 57219758976 | 36228289972 | 20991533095 | 36.69 | 64091 | 2247628951 |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| Avg | 60187647213.30 | 57885790932.10 | 57885699730.80 | 36885560295.90 | 21000230636.20 | 36.28 | 91201.30 | 2301856281.20 |
|==========================================================================================================================================================================================================|
Here, the time is still comparable to the non-custom version. Compared with the SINGLE-mode version of the same schema (Schema 2), there is a big improvement of ~15 percentage points. So, it seems BATCH mode is the best option to use with @custom: it makes far fewer external HTTP API calls than SINGLE mode (one call per batch of nodes instead of one call per node), resulting in better performance.
Overall, I ran benchmarks for a lot of different queries, returning data of different sizes, and this is what I observed:
|=================================================|
| Schema type | % GraphQL layer time |
|=========================|=======================|
| Simple Schema | 14-36% |
| Custom with SINGLE mode | 40-96% |
| Custom with BATCH mode | 36-70% |
|=================================================|
I am going to keep enriching this post over time with more data as it comes.
If you are interested in running these benchmarks yourself, have a look at this PR.