This post contains profiling and benchmarking results for GraphQL queries involving @custom fields.
Creating the Dataset
I wrote a script; it's easier that way.
I created the dataset using Dgraph’s GraphQL API and some publicly available restaurant data from Kaggle. The Kaggle dataset promised data about ~10k restaurants. Later, I found out that some of them were duplicates, and one data file even turned out to be corrupt. Nevertheless, I made the following schema to represent the data obtained from Kaggle:
type Country {
  cid: ID!
  id: String! @id
  name: String!
  cities: [City]
}

type City {
  cid: ID!
  id: String! @id
  name: String!
  country: Country! @hasInverse(field: cities)
  restaurants: [RestaurantAddress] @hasInverse(field: city)
}

interface Location {
  id: ID!
  lat: Float!
  long: Float!
  address: String!
  locality: String!
  city: City!
  zipcode: Int
}

type RestaurantAddress implements Location {
  restaurant: Restaurant! @hasInverse(field: addr)
}

type Restaurant {
  id: ID!
  xid: String! @id
  name: String!
  pic: String
  addr: RestaurantAddress!
  rating: Float
  costFor2: Float
  currency: String
  cuisines: [Cuisine]
  dishes: [Dish] @hasInverse(field: servedBy)
  createdAt: DateTime!
}

type Cuisine {
  id: ID!
  name: String! @id
  restaurants: [Restaurant] @hasInverse(field: cuisines)
  dishes: [Dish] @hasInverse(field: cuisine)
}

type Dish {
  id: ID!
  name: String!
  pic: String
  price: Float!
  description: String
  isVeg: Boolean!
  cuisine: Cuisine
  servedBy: Restaurant!
}
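The data went in through the add mutations that Dgraph auto-generates from this schema. Roughly, each request the script sent looked something like the sketch below (the values are made up for illustration; the real script filled them in from the Kaggle data):

mutation {
  addRestaurant(input: [{
    # made-up values for illustration; city and cuisine link to
    # existing nodes via their @id fields
    xid: "rest-0001",
    name: "The Pasta Place",
    rating: 4.2,
    costFor2: 30.0,
    currency: "USD",
    createdAt: "2020-07-10T12:00:00Z",
    addr: {
      lat: 40.7128,
      long: -74.0060,
      address: "123 Example St",
      locality: "Manhattan",
      zipcode: 10001,
      city: { id: "city-0042" }
    },
    cuisines: [{ name: "Italian" }]
  }]) {
    restaurant {
      id
      name
    }
  }
}

Dishes went in similarly, with servedBy pointing back at an already-added restaurant.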
Then, I ran the script to add Restaurants. I configured it to create a random number of dishes for each restaurant, with a maximum of 1000 dishes per restaurant. It took almost an hour to generate and add all that data using the GraphQL API (note that it wasn’t fully automated, there were some manual steps in between, and the mutations were executed serially, so the timing itself isn’t very meaningful here). Once it was done, I ran the following GraphQL± query to find out how many nodes were created:
query {
  allNodes(func: has(dgraph.type)) {
    count(uid)
  }
  restaurants(func: eq(dgraph.type, "Restaurant")) {
    count(uid)
  }
  dishes(func: eq(dgraph.type, "Dish")) {
    count(uid)
  }
  cuisines(func: eq(dgraph.type, "Cuisine")) {
    count(uid)
  }
  countries(func: eq(dgraph.type, "Country")) {
    count(uid)
  }
  cities(func: eq(dgraph.type, "City")) {
    count(uid)
  }
  locations(func: eq(dgraph.type, "Location")) {
    count(uid)
  }
  restaurantAddresses(func: eq(dgraph.type, "RestaurantAddress")) {
    count(uid)
  }
}
It gave me the following results:
"data": {
"allNodes": [
{
"count": 4175750
}
],
"restaurants": [
{
"count": 8422
}
],
"dishes": [
{
"count": 4158700
}
],
"cuisines": [
{
"count": 116
}
],
"countries": [
{
"count": 15
}
],
"cities": [
{
"count": 74
}
],
"locations": [
{
"count": 8422
}
],
"restaurantAddresses": [
{
"count": 8422
}
]
}
So, there we have it! Our database spans 74 cities across 15 countries, with 8422 restaurants serving 4158700 dishes across 116 cuisines.
In total, it has ~4M nodes.
One thing to note here is the order of magnitude of the node count for each type, especially the following:
- Cuisines: ~100
- Restaurants: ~10,000
- Dishes: ~1,000,000
Each is about two orders of magnitude (two more zeroes) bigger than the previous one. We are going to use these types in the benchmarking queries.
Benchmarking
When you send a GraphQL query containing custom fields to Dgraph, the GraphQL layer first fetches results from Dgraph and then sends HTTP requests to the custom URLs to fetch the results for the custom fields. That requires an API server that responds to those requests with the correct answers. So, I decided to start another Dgraph instance serving the same dataset and wrote a simple server in Go which, on receiving an HTTP request, just queries the other Dgraph instance and responds with the results. Nice, easy, and correct results for our custom fields, is what I thought. But let’s see.
First, let’s be clear on some terminology. There are two Dgraph instances here: one receives the GraphQL benchmarking queries, and the other acts as the final data source behind the custom APIs. I ran the first one on my host OS and the second one in a Docker container. So, let’s name them as follows:

Host Dgraph
: The one responding to benchmarking queries, running on my host OS.

Docker Dgraph
: The one responding to custom API queries, running in Docker.

Custom API Server
: The Go server which provides the custom API. It receives HTTP requests from the Host Dgraph, performs the corresponding query on the Docker Dgraph, and then responds to the Host Dgraph with the results obtained from the Docker Dgraph.
Both Dgraph instances use the same dataset we created above.
Now, I prepared some schemas and different benchmarking queries for each schema. The only thing that changes across the schemas is where @custom is applied. That helps us see how @custom affects the performance of the GraphQL layer in different scenarios. But before measuring how @custom affects performance, we must know how the GraphQL layer performs without @custom; only then can we compare the results. So, the very first schema for which I ran benchmarking queries has no @custom directive, i.e., it is exactly the schema shown in the previous section. The benchmarking queries vary in how many levels of data they query, how many fields they request at each level, and where they start querying from (i.e., whether the root query is queryCuisine, queryRestaurant, or queryDish). All the queries are at most 3 levels deep. I will post a link to all those schemas, queries, and benchmarking results once I have finished the full benchmarking. Again, I automated all this benchmarking with a script. The script runs each query 10 times and collects the timing information reported in the GraphQL extensions. It also collects some CPU and memory profiles of the Host Dgraph. This script should make its way to our public repo in some time.
So, let’s start.
Initial Schema (No custom directive)
Query 1
Consider this query:
query {
  queryCuisine {
    id
    name
  }
}
It is simple: there are only ~100 Cuisine nodes, and there is no @custom directive in the initial schema. I got the following results for this query:
|=======================================================================================|
| Iteration | Total Time (ns) | Dgraph Time (ns) | GraphQL Time (ns) | GraphQL Time (%) |
|===========|=================|==================|===================|==================|
| 0 | 23198079 | 22812385 | 385694 | 1.662612 |
| 1 | 5175913 | 4542251 | 633662 | 12.242516 |
| 2 | 4999751 | 4316486 | 683265 | 13.665981 |
| 3 | 4168936 | 3643491 | 525445 | 12.603815 |
| 4 | 4169961 | 3454751 | 715210 | 17.151479 |
| 5 | 4065236 | 3453497 | 611739 | 15.048056 |
| 6 | 3802596 | 3249535 | 553061 | 14.544301 |
| 7 | 5009093 | 4410728 | 598365 | 11.945576 |
| 8 | 3217808 | 2748461 | 469347 | 14.585923 |
| 9 | 3943097 | 3389763 | 553334 | 14.032980 |
|===========|=================|==================|===================|==================|
| Avg | 6175047.000000 | 5602134.800000 | 572912.200000 | 9.277860 |
| non-0 Avg | 6175047.000000 | 5602134.800000 | 572912.200000 | 9.277860 |
|=======================================================================================|
As you can see, in the 0th iteration Dgraph took a lot of time, while for the queries after that it took comparatively much less time. On average, the GraphQL layer accounted for only ~9% of the total time. I would call the 0th iteration warm-up. This warm-up skewed the average % time spent in the GraphQL layer, which otherwise would have been around 13-14% of the total time. The GraphQL layer functions didn’t show up in any heap or CPU profile for this query.
Query 2
Now consider a bigger query:
query {
  queryCuisine {
    id
    name
    restaurants {
      name
      rating
      dishes {
        name
        price
      }
    }
  }
}
It is complex: it has 3 levels and ends up querying almost all of the ~4M nodes.
Let’s look at the results for this:
|=======================================================================================|
| Iteration | Total Time (ns) | Dgraph Time (ns) | GraphQL Time (ns) | GraphQL Time (%) |
|===========|=================|==================|===================|==================|
| 0 | 66722094920 | 45208944628 | 21513150292 | 32.242918 |
| 1 | 59862328464 | 38292618956 | 21569709508 | 36.032193 |
| 2 | 59070568339 | 37695178124 | 21375390215 | 36.186194 |
| 3 | 58797093801 | 37368619826 | 21428473975 | 36.444784 |
| 4 | 59213197148 | 37708491437 | 21504705711 | 36.317420 |
| 5 | 57968868767 | 36977505165 | 20991363602 | 36.211443 |
| 6 | 58393392165 | 37155525454 | 21237866711 | 36.370325 |
| 7 | 58154881958 | 36983798346 | 21171083612 | 36.404654 |
| 8 | 58091151702 | 36791804542 | 21299347160 | 36.665390 |
| 9 | 57459322670 | 36324648044 | 21134674626 | 36.781977 |
|===========|=================|==================|===================|==================|
| Avg | 59373289993.400 | 38050713452.1999 | 21322576541.20000 | 35.912742 |
| non-0 Avg | 59373289993.400 | 38050713452.1999 | 21322576541.20000 | 35.912742 |
|=======================================================================================|
Here, you can see that the GraphQL layer takes up about ~36% of the total processing time. That’s huge! Also, the total time it took to execute this query is ~60 secs (a whole minute!). You don’t see the warm-up here because the 0th iteration of Query 1 was the first-ever query executed by the Host Dgraph; since it has been running continuously since then, the 0th iteration of Query 2 is just another query for the Host Dgraph.
Now, let’s look at a memory profile for this query.
Showing nodes accounting for 9.56GB, 98.54% of 9.70GB total
Dropped 164 nodes (cum <= 0.05GB)
Showing top 30 nodes out of 104
flat flat% sum% cum cum%
4.38GB 45.15% 45.15% 4.38GB 45.15% fmt.Sprintf
1.08GB 11.12% 56.27% 2.69GB 27.73% github.com/dgraph-io/dgraph/worker.(*queryState).handleValuePostings.func1
0.64GB 6.64% 62.91% 0.64GB 6.64% github.com/dgraph-io/dgraph/query.(*encoder).appendAttrs
0.59GB 6.04% 68.95% 0.59GB 6.04% github.com/DataDog/zstd.Decompress
0.53GB 5.43% 74.38% 0.53GB 5.43% github.com/dgraph-io/dgraph/worker.convertToType
0.51GB 5.29% 79.67% 0.51GB 5.29% github.com/dgraph-io/dgraph/worker.retrieveValuesAndFacets
0.38GB 3.89% 83.56% 0.38GB 3.89% github.com/dgraph-io/ristretto.newCmRow
0.24GB 2.48% 86.04% 0.24GB 2.48% github.com/dgraph-io/dgraph/query.(*encoder).newNode
0.22GB 2.30% 88.34% 0.22GB 2.30% github.com/dgraph-io/ristretto/z.(*Bloom).Size (inline)
0.22GB 2.27% 90.61% 0.28GB 2.91% github.com/dgraph-io/badger/v2/table.OpenTable
0.21GB 2.12% 92.73% 0.21GB 2.12% github.com/dgraph-io/dgraph/worker.(*queryState).handleValuePostings
0.17GB 1.72% 94.45% 0.17GB 1.72% github.com/dgraph-io/dgraph/query.(*arena).put
0.16GB 1.68% 96.13% 0.16GB 1.68% github.com/dgraph-io/badger/v2/skl.newArena
0.07GB 0.67% 96.80% 0.07GB 0.67% github.com/dgraph-io/dgraph/query.(*SubGraph).appendDummyValues
0.06GB 0.65% 97.45% 0.06GB 0.65% github.com/dgraph-io/badger/v2/table.(*Table).blockCacheKey
0.06GB 0.57% 98.02% 0.06GB 0.57% github.com/dgraph-io/dgraph/protos/pb.(*Posting).Unmarshal
0.04GB 0.39% 98.42% 0.06GB 0.64% github.com/dgraph-io/badger/v2/pb.(*TableIndex).Unmarshal
0.01GB 0.12% 98.54% 0.72GB 7.44% github.com/dgraph-io/badger/v2/table.(*Table).block
...
What I observed is that there were no GraphQL functions in the top 30. fmt.Sprintf seems to be the culprit. Almost all the memory is being consumed by Dgraph, as can be seen in the following diagram:
Now, let’s take a look at the CPU profile for this query. The CPU was profiled for about 75 seconds, so it should contain all the information for this query.
Here you can see that, after rewriteAndExecute, most of the time is spent in Dgraph’s query package, while the rest is spent in JSON unmarshalling. So, the GraphQL layer could try to optimize its JSON processing.
Query 3
query {
  queryDish {
    name
    price
    cuisine {
      id
      name
      restaurants {
        name
        rating
      }
    }
  }
}
It is almost the same query as the previous one; the difference is that it queries Dishes at the first level, while the previous query had dishes at the last level.
This one errored out saying:
"Dgraph query failed because Dgraph execution failed because encoded response size: 4294967317 is bigger than threshold: 4294967296"
Oops, there is a limit to how much you can query at a time.
Now, let’s change the schema to have custom fields.
Schema 1 (@custom at Restaurant.name, mode = SINGLE)
Here, the Custom API Server is going to play a very important role: its latency decides how fast the GraphQL layer can be.
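For reference, the schema change looks roughly like this: Restaurant.name is now resolved by calling out to the Custom API Server instead of being fetched from Dgraph. The URL and query parameter below are placeholders, not the exact endpoint I used:

type Restaurant {
  id: ID!
  xid: String! @id
  name: String! @custom(http: {
    # placeholder endpoint; the real Custom API Server URL and path differ
    url: "http://localhost:9000/restaurantName?xid=$xid",
    method: GET,
    mode: SINGLE
  })
  # ...remaining fields stay as in the original schema
}

In Schema 2, Restaurant.rating gets a similar annotation, so each restaurant node needs two remote lookups instead of one.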
Query 1
query {
queryCuisine {
id
name
restaurants {
name
rating
dishes {
name
price
}
}
}
}
I got the following results:
|==========================================================================================================================================================================================================|
| Iteration | Round Trip Time (RTT) | Total Time (TT) | Trace Time (TrT) | Dgraph Time (DT) | GraphQL Time (GT=TT-DT) | % GraphQL Time (GT/TT*100) | Tracing Error (TT-TrT) | Round Trip Overhead (RTT-TT) |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| 0 | 66468568072 | 64223691374 | 64223594271 | 38363446944 | 25860244430 | 40.27 | 97103 | 2244876698 |
| 1 | 63527709939 | 61308532534 | 61308460207 | 37892206197 | 23416326337 | 38.19 | 72327 | 2219177405 |
| 2 | 65388871976 | 63149794335 | 63149699958 | 38047145736 | 25102648599 | 39.75 | 94377 | 2239077641 |
| 3 | 65578901210 | 63216787704 | 63216717182 | 37066569187 | 26150218517 | 41.37 | 70522 | 2362113506 |
| 4 | 63977765202 | 61759810694 | 61759747445 | 37922204689 | 23837606005 | 38.60 | 63249 | 2217954508 |
| 5 | 62062608527 | 59772949757 | 59772881870 | 37002180761 | 22770768996 | 38.10 | 67887 | 2289658770 |
| 6 | 62136762295 | 59866791492 | 59866702435 | 37046990795 | 22819800697 | 38.12 | 89057 | 2269970803 |
| 7 | 61800321765 | 59591076655 | 59591017865 | 36443568456 | 23147508199 | 38.84 | 58790 | 2209245110 |
| 8 | 61764738945 | 59519489701 | 59519404080 | 36426050423 | 23093439278 | 38.80 | 85621 | 2245249244 |
| 9 | 62000967050 | 59780311638 | 59780217211 | 36582091037 | 23198220601 | 38.81 | 94427 | 2220655412 |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| Avg | 63470721498.10 | 61218923588.40 | 61218844252.40 | 37279245422.50 | 23939678165.90 | 39.11 | 79336.00 | 2251797909.70 |
|==========================================================================================================================================================================================================|
All timing information is in nanoseconds, except '% GraphQL Time' which is a percentage.
The same query without @custom was taking ~60 secs, and now it takes ~61 secs. On average, the GraphQL layer’s share of the total time has gone up by about 3 percentage points compared to the non-custom version. Not much of a difference.
But, as I said, the Custom API Server plays a very important role. At one point, when I was just logging the request parameters in the Custom API Server, more than half of the custom requests were erroring out with messages like:
dial tcp 127.0.0.1:9000: socket: too many open files
and this:
dial tcp: lookup localhost: device or resource busy
At first I blamed the logging for these errors, but it turned out the reason was different. I had tried the custom queries on different machines. On a machine with ulimit -n = 1024, I got these errors; on a machine with ulimit -n = 1000000, there were none. So, the limit on how many file descriptors can be open at a time decides how many concurrent HTTP requests one can make.
Later, we figured out that we were creating a new HTTP client for every custom API request, which we should ideally share among all the requests. Another thing that came out of logging in the Custom API Server was a bug: the URL wasn’t being formed properly for all the requests. All the requests were using the same URL to query the restaurant name for different nodes, even though I had specified the restaurant ID as a query parameter in the URL. That was a bug, because the restaurant ID wasn’t changing in the URL across requests for different restaurants.
Schema 2 (@custom at Restaurant.name and Restaurant.rating, mode = SINGLE)
For the same query as in the previous case, I got the following results:
|==========================================================================================================================================================================================================|
| Iteration | Round Trip Time (RTT) | Total Time (TT) | Trace Time (TrT) | Dgraph Time (DT) | GraphQL Time (GT=TT-DT) | % GraphQL Time (GT/TT*100) | Tracing Error (TT-TrT) | Round Trip Overhead (RTT-TT) |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| 0 | 88340502908 | 86094066348 | 86093973524 | 38419343402 | 47674722946 | 55.38 | 92824 | 2246436560 |
| 1 | 66210222248 | 63739985080 | 63739875323 | 38735287864 | 25004697216 | 39.23 | 109757 | 2470237168 |
| 2 | 67868325501 | 65642375222 | 65642304409 | 37966999610 | 27675375612 | 42.16 | 70813 | 2225950279 |
| 3 | 65222138193 | 62868024268 | 62867926384 | 37427666711 | 25440357557 | 40.47 | 97884 | 2354113925 |
| 4 | 108358242220 | 106126197977 | 106126118297 | 37468178815 | 68658019162 | 64.69 | 79680 | 2232044243 |
| 5 | 110108253427 | 107876266217 | 107876169996 | 37338939419 | 70537326798 | 65.39 | 96221 | 2231987210 |
| 6 | 88563333556 | 86352756552 | 86352660161 | 36875922098 | 49476834454 | 57.30 | 96391 | 2210577004 |
| 7 | 66925765457 | 64596692549 | 64596627767 | 36615986643 | 27980705906 | 43.32 | 64782 | 2329072908 |
| 8 | 65797357895 | 63548888709 | 63548823186 | 36605283969 | 26943604740 | 42.40 | 65523 | 2248469186 |
| 9 | 63025225363 | 60794992442 | 60794929244 | 36301763763 | 24493228679 | 40.29 | 63198 | 2230232921 |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| Avg | 79041936676.80 | 76764024536.40 | 76763940829.10 | 37375537229.40 | 39388487307.00 | 51.31 | 83707.30 | 2277912140.40 |
|==========================================================================================================================================================================================================|
All timing information is in nanoseconds, except '% GraphQL Time' which is a percentage.
So, as the number of custom fields increased, the time taken by the GraphQL layer also increased.
Schema 3 (@custom at Restaurant.name, mode = BATCH)
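The schema change for BATCH mode looks roughly like the sketch below. Instead of one HTTP call per restaurant node, the GraphQL layer batches the parent nodes and sends them together in a single request, with the body built from the body template. Again, the URL and body here are placeholders, not the exact ones I used:

type Restaurant {
  id: ID!
  xid: String! @id
  name: String! @custom(http: {
    # placeholder batch endpoint of the Custom API Server
    url: "http://localhost:9000/restaurantNames",
    method: POST,
    mode: BATCH,
    body: "{ xid: $xid }"
  })
  # ...remaining fields stay as in the original schema
}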
Again, let’s just look at the results for the same query:
|==========================================================================================================================================================================================================|
| Iteration | Round Trip Time (RTT) | Total Time (TT) | Trace Time (TrT) | Dgraph Time (DT) | GraphQL Time (GT=TT-DT) | % GraphQL Time (GT/TT*100) | Tracing Error (TT-TrT) | Round Trip Overhead (RTT-TT) |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| 0 | 61611446863 | 59388103422 | 59388014334 | 37980426748 | 21407676674 | 36.05 | 89088 | 2223343441 |
| 1 | 61400095203 | 59144124108 | 59144044408 | 38121722403 | 21022401705 | 35.54 | 79700 | 2255971095 |
| 2 | 60730566946 | 58481387083 | 58481287646 | 37289418977 | 21191968106 | 36.24 | 99437 | 2249179863 |
| 3 | 60308483130 | 58061209969 | 58061143124 | 36818796463 | 21242413506 | 36.59 | 66845 | 2247273161 |
| 4 | 59732856606 | 57293922185 | 57293775419 | 36557777717 | 20736144468 | 36.19 | 146766 | 2438934421 |
| 5 | 59283153824 | 57083817945 | 57083637255 | 36194470147 | 20889347798 | 36.59 | 180690 | 2199335879 |
| 6 | 59949373049 | 57578450668 | 57578345310 | 36720538461 | 20857912207 | 36.23 | 105358 | 2370922381 |
| 7 | 58881320406 | 56663119564 | 56663014958 | 35874127011 | 20788992553 | 36.69 | 104606 | 2218200842 |
| 8 | 59412697511 | 57159098940 | 57158987791 | 36144347064 | 21014751876 | 36.77 | 111149 | 2253598571 |
| 9 | 59119524162 | 56902921466 | 56902817922 | 35784036763 | 21118884703 | 37.11 | 103544 | 2216602696 |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| Avg | 60042951770.00 | 57775615535.00 | 57775506816.70 | 36748566175.40 | 21027049359.60 | 36.39 | 108718.30 | 2267336235.00 |
|==========================================================================================================================================================================================================|
All timing information is in nanoseconds, except '% GraphQL Time' which is a percentage.
Here, the time taken by the GraphQL layer is similar to the one observed without the @custom directive, although that is only an improvement of about 3 percentage points over SINGLE mode.
Schema 4 (@custom at Restaurant.name and Restaurant.rating, mode = BATCH)
|==========================================================================================================================================================================================================|
| Iteration | Round Trip Time (RTT) | Total Time (TT) | Trace Time (TrT) | Dgraph Time (DT) | GraphQL Time (GT=TT-DT) | % GraphQL Time (GT/TT*100) | Tracing Error (TT-TrT) | Round Trip Overhead (RTT-TT) |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| 0 | 61590554058 | 59210474563 | 59210381248 | 38093553773 | 21116920790 | 35.66 | 93315 | 2380079495 |
| 1 | 61299109335 | 59035276660 | 59035206298 | 37783346890 | 21251929770 | 36.00 | 70362 | 2263832675 |
| 2 | 60732413641 | 58516064639 | 58515990039 | 37338295939 | 21177768700 | 36.19 | 74600 | 2216349002 |
| 3 | 60374842427 | 58140427294 | 58140338517 | 37068236579 | 21072190715 | 36.24 | 88777 | 2234415133 |
| 4 | 59760191150 | 57254263659 | 57254134245 | 36534692653 | 20719571006 | 36.19 | 129414 | 2505927491 |
| 5 | 59822598677 | 57543582556 | 57543486696 | 36527421680 | 21016160876 | 36.52 | 95860 | 2279016121 |
| 6 | 60045091331 | 57603337964 | 57603235492 | 36697616674 | 20905721290 | 36.29 | 102472 | 2441753367 |
| 7 | 59158758936 | 56946279431 | 56946174635 | 36059726763 | 20886552668 | 36.68 | 104796 | 2212479505 |
| 8 | 59625460560 | 57388379488 | 57388291162 | 36524422036 | 20863957452 | 36.36 | 88326 | 2237081072 |
| 9 | 59467452018 | 57219823067 | 57219758976 | 36228289972 | 20991533095 | 36.69 | 64091 | 2247628951 |
|===========|=======================|=================|==================|==================|=========================|============================|========================|==============================|
| Avg | 60187647213.30 | 57885790932.10 | 57885699730.80 | 36885560295.90 | 21000230636.20 | 36.28 | 91201.30 | 2301856281.20 |
|==========================================================================================================================================================================================================|
Here, the time is still comparable to the non-custom version. Compared with the SINGLE-mode version of the same schema (Schema 2), there is a big improvement of ~15 percentage points. So, it seems BATCH mode is the best option to use with @custom: it makes far fewer external HTTP API calls than SINGLE mode (one call per batch of nodes instead of one call per node), resulting in better performance.
Overall, I ran benchmarks for a lot of different queries, returning data of different sizes, and this is what I observed:
|=================================================|
| Schema type | % GraphQL layer time |
|=========================|=======================|
| Simple Schema | 14-36% |
| Custom with SINGLE mode | 40-96% |
| Custom with BATCH mode | 36-70% |
|=================================================|
I am going to keep enriching this post over time with more data as it comes.
If you are interested in running these benchmarks yourself, have a look at this PR.