DQL: Querying Subgraph at Arbitrary Depth

Current Approach

I’m currently using the following DQL query to fetch a category and its subcategories recursively:

query categories($path: string) {
  var(func: eq(Category.name, $path)) {
    cat as uid
  }
  categories(func: uid(cat)) @recurse(depth: 10) {
    uid
    Category.name
    Category.subcategories
  }
}

With variables:

{
  "$path": "Science"
}

This works well for a single category, but I want to extend it to handle arbitrary depths of categorisation.

Desired Functionality

I want to be able to query the subgraph starting from an arbitrary depth in the category hierarchy. For example, I’d like to be able to specify paths like:

  1. ["Science"]
"categories": [
            {
                "uid": "0xc392",
                "Category.name": "Science",
                "Category.subcategories": [
                    {
                        "uid": "0xc390",
                        "Category.name": "Physics",
                        "Category.subcategories": [
                            {
                                "uid": "0xc39b",
                                "Category.name": "Quantum Mechanics",
                                "Category.subcategories": [
...
  2. ["Science", "Physics"]
"categories": [
            {
                "uid": "0xc390",
                "Category.name": "Physics",
                "Category.subcategories": [
                    {
                        "uid": "0xc39b",
                        "Category.name": "Quantum Mechanics",
                        "Category.subcategories": [
...
  3. ["Science", "Physics", "Quantum"]

Or alternatively, I could use a path-like string to determine the starting point:

"/Science/Physics/Quantum"

Questions

  1. Is there a way to modify my current query to accept an array or path-like string of category names and traverse the graph from that point?

  2. How can I efficiently query a subgraph starting from an arbitrary level of nesting without having to materialize the full path on each category node?

  3. Are there any Dgraph-specific features or best practices for handling hierarchical data queries like this?

  4. What would be the most performant way to implement this kind of flexible depth querying, especially for large category trees?

  5. Are there any limitations or considerations I should be aware of when implementing this kind of query, particularly regarding query complexity or depth limits?

I’m aiming to create a flexible querying mechanism that allows me to start at any point in my category hierarchy and retrieve the subgraph from that point onwards. This should work efficiently regardless of the depth of the starting point in the overall category tree.

Any insights, sample queries, or best practices would be greatly appreciated. Thank you in advance for your help!

P.S.: “Sub-graph query” is possibly related, but it hard-codes a specific nesting depth, and I need it to be dynamic so I can populate a UI.

EDIT: “Implementing Filesystem like Graph System. How to fully get back tree without writing a ton of graphql code?” is related but lacks an answer.

Could you use an id field of type string and set the id of each category using your “path” notation?
You would have id = “Science/Physics/Quantum Mechanics” for the node “Quantum Mechanics”.

That way you can “start” from any node:

var(func: eq(Category.id, $path)) {
  cat as uid
}

with
$path = "Science/Physics/Quantum Mechanics"
should make you start where you need.
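
As a rough illustration of what setting such ids would involve, here is a minimal Python sketch that derives the path string for every node from the nested JSON shape returned by the @recurse query in the original post. The function name materialize_paths is hypothetical, and Category.id is the id field suggested here, not something in the existing schema; it would presumably also need an index for eq to work on it.

```python
from typing import Any, Dict, List


def materialize_paths(nodes: List[Dict[str, Any]], prefix: str = "") -> Dict[str, str]:
    """Map each uid to its full path, e.g. "Science/Physics"."""
    paths: Dict[str, str] = {}
    for node in nodes:
        name = node["Category.name"]
        path = f"{prefix}/{name}" if prefix else name
        paths[node["uid"]] = path
        # Recurse into children, passing the current path down as their prefix.
        children = node.get("Category.subcategories", [])
        paths.update(materialize_paths(children, path))
    return paths


# Sample data shaped like the query result shown earlier in the thread.
tree = [{
    "uid": "0xc392", "Category.name": "Science", "Category.subcategories": [{
        "uid": "0xc390", "Category.name": "Physics", "Category.subcategories": [{
            "uid": "0xc39b", "Category.name": "Quantum Mechanics",
        }],
    }],
}]

paths = materialize_paths(tree)
# One JSON object per node, in the shape of a Dgraph JSON "set" mutation.
# Category.id is the assumed predicate name from the suggestion above.
updates = [{"uid": uid, "Category.id": path} for uid, path in paths.items()]
```

The updates list would then be sent as a mutation. Every rename or move of a category means re-running this over the affected subtree, which is the maintenance cost of the materialized approach.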

Am I missing something?

Hey @Raphael

$path = "Science/Physics/Quantum Mechanics"
should make you start where you need.

That’s what I meant by materialising the full path in my original post:

How can I efficiently query a subgraph starting from an arbitrary level of nesting without having to materialize the full path on each category node?

And yes, it’s problematic because you then need to maintain that and ensure consistency, which is less than ideal, because anything that can break will break.

I believe other languages such as Cypher would allow me to do this easily.

So I expect to be able to query subgraphs like I proposed on a graph DB without having to materialize data. If I have to materialize, I might as well be using SQL, no?

Maybe it’s me who is missing something :) I guess Dgraph has tools, such as hooks and lambdas, to keep materialized views more in check. I haven’t played with them and they feel a bit too complex. I’m open to exploring them, but I think this functionality is a must regardless.

I see your point. In similar use cases, I have used two approaches:

  • Use a materialized path, with a script that maintains it by regularly traversing the tree and updating the materialized names. I agree that this is only suitable for “stable” trees that are not changing. I have used it, for example, for a date tree (year-quarter-month-week-dayofweek-day) in an analytics use case on events. That tree is never modified, so each node can have an id containing the complete path.
  • Build the query in your client.
    The logic would be to start with
    c as Category.subcategories @filter(eq(Category.name,"Quantum Mechanics"))

    or whatever the final category is (the last element of the array), then wrap it to build the query for the complete path dynamically: use Category.subcategories @filter(eq(Category.name,"xyz")) { <> } recursively for each intermediate element, and var(func:eq(Category.name,"xyz")) { <> } for the first element of the path.
This would lead to a query like:

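{
  # Walk the path one level at a time; c ends up holding the last category.
  var(func: eq(Category.name, "Science")) {
    Category.subcategories @filter(eq(Category.name, "Physics")) {
      c as Category.subcategories @filter(eq(Category.name, "Quantum Mechanics"))
    }
  }
  # Start the actual query from the node(s) captured in c.
  mysearch(func: uid(c)) {
    <whatever you need starting from the category node>
  }
}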

Building DQL dynamically is a powerful approach that will cover arbitrary depth.
But you have more work on the client side!
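
The client-side construction described above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the helper names parse_path and build_path_query and the variable names $n0, $n1, ... are made up for this example, and the final block reuses the @recurse query from the original post. Each name is passed as a query variable rather than pasted into the query text.

```python
from typing import Dict, List, Tuple


def parse_path(path: str) -> List[str]:
    """Split "/Science/Physics" into ["Science", "Physics"]."""
    return [part for part in path.split("/") if part]


def build_path_query(names: List[str], depth: int = 10) -> Tuple[str, Dict[str, str]]:
    """Build a DQL query plus variables for the subtree under the last name."""
    if not names:
        raise ValueError("path must contain at least one category name")

    # One query variable per segment, so names are never spliced into the query text.
    variables = {f"$n{i}": name for i, name in enumerate(names)}
    declared = ", ".join(f"$n{i}: string" for i in range(len(names)))
    last = len(names) - 1

    lines = [f"query categories({declared}) {{"]
    lines.append("  var(func: eq(Category.name, $n0)) {")
    if last == 0:
        lines.append("    c as uid")
    for i in range(1, len(names)):
        indent = "  " * (i + 1)
        edge = f"Category.subcategories @filter(eq(Category.name, $n{i}))"
        # Only the final segment is captured; intermediate ones open a nested block.
        lines.append(f"{indent}c as {edge}" if i == last else f"{indent}{edge} {{")
    # Close the nested blocks opened by intermediate segments, innermost first.
    for i in range(last - 1, 0, -1):
        lines.append("  " * (i + 1) + "}")
    lines.append("  }")
    lines.append(f"  categories(func: uid(c)) @recurse(depth: {int(depth)}) {{")
    lines += ["    uid", "    Category.name", "    Category.subcategories", "  }", "}"]
    return "\n".join(lines), variables


dql, variables = build_path_query(parse_path("/Science/Physics/Quantum Mechanics"))
print(dql)
```

The returned pair is meant to be handed to whatever client is in use; with pydgraph that would presumably be something like txn.query(dql, variables=variables). Note that the first segment is matched by name anywhere in the graph, so if category names are not unique at the top of the path, the var block may start from more than one node.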

I don’t see a way to do it with a simple DQL query with the current version, but I’ll have a deeper look.
If you know Cypher, I’m curious to see the Cypher query that can do that with an arbitrary path array. It could give us ideas for implementing a “search by dynamic path” function to get nodes.