I’m currently working on a project where we’re using Dgraph as our database, and I’m looking for some advice on optimizing query performance.
We’ve been experiencing some performance issues with our queries, and I’m wondering if there are any best practices or tips you can share to improve efficiency. Here’s a brief overview of our setup and the issues we’re facing:
Dgraph Version: v21.03
Data Model: We have a graph with several nodes and edges, and some of our queries are quite complex.
Problem: Some of our queries are slower than expected, especially when dealing with larger datasets. We’ve also noticed that performance varies significantly depending on the time of day.
Here are a few specific questions I have:
Indexing: Are there any recommended strategies for indexing in Dgraph to speed up query performance?
Query Optimization: What are the best practices for writing efficient Dgraph queries? Are there any common pitfalls to avoid?
Caching: Does Dgraph support caching mechanisms that can help improve performance? If so, how can I implement them?
Dgraph does not have a query optimizer, so optimization falls entirely on the user. I don't use the GraphQL endpoint, so this advice is specific to DQL. This isn't a technical spec on how to optimize query performance; these are just some techniques I've picked up while optimizing Dgraph queries.
I’m going to use examples with the following schema:
User {
name: string
}
TimeEntry {
owner: User
startTime: DateTime
endTime: DateTime
}
TimeSheet {
date: DateTime
owner: User
timeEntries: [TimeEntry]
}
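For illustration, here is a minimal sketch of the kind of query this section is about: a single block rooted on the type function, with both an owner filter and a date filter. The UID 0x123, the date literal, and the use of uid_in are placeholders I've picked for the example; only the predicate names come from the schema above.

{
  # Root on every TimeSheet, then filter by owner and by date in one block.
  sheets(func: type(TimeSheet)) @filter(uid_in(owner, 0x123) AND ge(date, "2023-01-01")) {
    uid
    date
  }
}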
A query shaped like the sketch above has to evaluate all TimeSheets multiple times, because the type function is used at the root and there is a single query block with multiple filters (Dgraph runs this as two separate single-filter queries and then intersects the results).
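A faster shape for the same lookup starts from the user instead. This sketch assumes the owner predicate has the @reverse directive so ~owner can be walked from the User back to their TimeSheets; 0x123 is again a placeholder UID.

{
  sheets(func: uid(0x123)) {
    # Walk the reverse owner edge from the known user, keep only TimeSheets
    # (TimeEntries also have an owner edge), and apply the date filter once.
    ~owner @filter(type(TimeSheet) AND ge(date, "2023-01-01")) {
      uid
      date
    }
  }
}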
Rather than starting from all TimeSheets and filtering down, the query above starts from the known User and traverses the graph to that user's TimeSheets. It then evaluates those TimeSheets only once, against the date filter.
If there are multiple filters in your final query, break each out into its own query block and run the filters in order from most performant to least performant, as in the sketch below. This reduces the dataset as quickly as possible, so the least performant filters are applied to the smallest dataset possible. I've found DateTime filters to have the worst performance.
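Here is a sketch of that pattern, under the same placeholder UID, date, and @reverse assumptions as above: the cheap owner restriction runs first in its own var block, and the slower DateTime filter only ever sees the uids that survive it.

{
  # Cheapest filter first: restrict to the known user's TimeSheets.
  var(func: uid(0x123)) {
    sheets as ~owner @filter(type(TimeSheet))
  }

  # Most expensive filter last, applied only to the reduced uid set.
  result(func: uid(sheets)) @filter(ge(date, "2023-01-01")) {
    uid
    date
    timeEntries {
      startTime
      endTime
    }
  }
}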