Advice Needed on Optimizing Dgraph Query Performance

Hi Dgraph Community,

I’m currently working on a project where we’re using Dgraph as our database, and I’m looking for some advice on optimizing query performance.

We’ve been experiencing some performance issues with our queries, and I’m wondering if there are any best practices or tips you can share to improve efficiency. Here’s a brief overview of our setup and the issues we’re facing:

  • Dgraph Version: v21.03
  • Data Model: We have a graph with several nodes and edges, and some of our queries are quite complex.
  • Problem: Some of our queries are slower than expected, especially when dealing with larger datasets. We’ve also noticed that performance varies significantly depending on the time of day.

Here are a few specific questions I have:

  1. Indexing: Are there any recommended strategies for indexing in Dgraph to speed up query performance?
  2. Query Optimization: What are the best practices for writing efficient Dgraph queries? Are there any common pitfalls to avoid?
  3. Caching: Does Dgraph support caching mechanisms that can help improve performance? If so, how can I implement them?

I found this article on the Dgraph Blog, “What is Query Optimization in Graph Databases? Techniques and Strategies”, and following it I’ve already tried some basic optimizations, such as reducing the complexity of the queries and ensuring that we’re using indexes where appropriate, but we’re still not seeing the improvements we’d like.

Any insights or suggestions from your experiences would be greatly appreciated!

Thanks in advance for your help.

Best Regards

Hi @laylamalik,

Dgraph does not have a query optimizer, so optimization is left to the user. I do not use the GraphQL endpoint, so this advice is specific to DQL. This is not a technical spec on how to optimize query performance; these are just some techniques I’ve picked up while optimizing Dgraph queries.

I’m going to use examples with the following schema:

User {
  name: string
}

TimeEntry {
  owner: User
  startTime: DateTime
  endTime: DateTime
}

TimeSheet {
  date: DateTime
  owner: User
  timeEntries: [TimeEntry]
}

Bad query example:

query timeSheets($user: string, $start: string, $end: string) {
  TimeSheet(func: type(TimeSheet)) @filter(between(date, $start, $end) and uid_in(owner, $user)) {
    uid
    owner {
      name
    }
    timeEntries {
      uid
      startTime
      endTime
    }
  } 
}

The above query has to evaluate all TimeSheets multiple times because the “type” function is used in the root and there’s a single query block with multiple filters (Dgraph runs this as two queries with a single filter and then finds where the results intersect).

Better query example:

query timeSheets($user: string, $start: string, $end: string) {
  var(func: uid($user)) {
    ownerTimeSheets as ~owner @filter(type(TimeSheet))
  }

  ownerDateTimeSheet as var(func: uid(ownerTimeSheets)) @filter(between(date, $start, $end))

  TimeSheet(func: uid(ownerDateTimeSheet)) {
    uid
    owner {
      name
    }
    timeEntries {
      uid
      startTime
      endTime
    }
  } 
}

Rather than start from all TimeSheets and filter down, it starts with the known User and traverses the graph to their TimeSheets. It then evaluates the TimeSheets once with the date filter.
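
One caveat on the better query: the reverse traversal ~owner only works if the owner predicate is declared with @reverse, and between on date needs a sortable index on date. Here is a rough sketch of the DQL schema these examples would need. The day tokenizer is my assumption (hour, month, or year may fit your date ranges better), and the predicate types are inferred from the simplified schema above rather than taken from a real deployment.

```
# Assumed DQL schema for the examples in this post (a sketch, not a spec).

name: string .
owner: uid @reverse .            # @reverse is what makes ~owner traversable
date: datetime @index(day) .     # between() needs an index; "day" is an assumption
startTime: datetime .
endTime: datetime .
timeEntries: [uid] .

type User {
  name
}

type TimeEntry {
  owner
  startTime
  endTime
}

type TimeSheet {
  date
  owner
  timeEntries
}
```

Because owner is shared by TimeEntry and TimeSheet, ~owner returns both kinds of node, which is why the query keeps the type(TimeSheet) filter on that edge.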

If there are multiple filters in your final query, break each out into its own query block and run the filters in order from most performant to least performant. This reduces the dataset more quickly, so the least performant filters are applied to the smallest dataset possible. I’ve found DateTime filters to have the worst performance.
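
As a sketch of that ordering, here is the same query with one extra filter chained in. The approved predicate is hypothetical (it is not part of the schema above, and I’m assuming it would be declared as a bool with @index(bool)); it only stands in for "some cheap filter" that should run before the date range.

```
query timeSheets($user: string, $start: string, $end: string) {
  # 1. Start from the known user and follow the reverse edge.
  var(func: uid($user)) {
    byOwner as ~owner @filter(type(TimeSheet))
  }

  # 2. Cheap filter next. "approved" is a hypothetical indexed bool predicate.
  byStatus as var(func: uid(byOwner)) @filter(eq(approved, true))

  # 3. Slowest filter last, so the date range only sees the reduced set.
  byDate as var(func: uid(byStatus)) @filter(between(date, $start, $end))

  # 4. Fetch the fields only for the nodes that survived every step.
  TimeSheet(func: uid(byDate)) {
    uid
    date
    timeEntries {
      uid
      startTime
      endTime
    }
  }
}
```

Which filter counts as "cheap" depends on your data and indexes, so it is worth timing each block on its own before settling on an order.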

Hope this helps some!