Recursive query not expanding

pjebs commented :

My current workaround is: lemma-chain/find_chain.go at master · thehonestscoop/lemma-chain · GitHub

Is this not good enough?

pjebs commented :

Can I use Max uint64 value?

pawanrawal commented :

Yeah, you can use MaxUint64 value but should use a lower value if you have an idea of the max expected depth. The problem with using MaxUint64 is that if your data does have loops, you’ll end up getting a lot of results (capped by query_edge_limit flag of alpha) which you don’t care about.

campoy commented :

Ok, so I do not think this is a bug at all. Let me explain.

When setting loop: false we’re asking Dgraph to stop expanding after a node that has already been seen is found.

This allows us to find loops by giving us the element that we already found. This repeated element is expanded, but no further elements will be expanded.

Let’s see an example!

Given this dataset:

{
    set {
        _:a <name> "a" .
        _:b <name> "b" .
        _:c <name> "c" .
        _:d <name> "d" .
        _:z <name> "z" .

        _:a <id> "1"^^<xs:int> .
        _:b <id> "2"^^<xs:int> .
        _:c <id> "3"^^<xs:int> .
        _:d <id> "4"^^<xs:int> .

        _:a <knows> _:b .
        _:b <knows> _:c .
        _:c <knows> _:a .
        _:c <knows> _:d .

        _:a <type> _:z .
        _:b <type> _:z .
        _:c <type> _:z .
        _:d <type> _:z .  
    }
}

This draws a graph as follows:

As you can see, a, b, and c form a loop. Also a, b, c, and d point to z with their “type” edges.

@recurse(loop: true)

Let’s recurse starting from a using name, knows, id, and type with the following query:

note: the name predicate has an exact predicate <name>: string @index(exact) .

{
  q(func: eq(name, "a")) @recurse(loop: true, depth: 5) {
    name 
    id
    knows
    type
  }
}

The result is:

{
  "data": {
    "q": [
      {
        "name": "a",
        "id": 1,
        "knows": [
          {
            "name": "b",
            "id": 2,
            "knows": [
              {
                "name": "c",
                "id": 3,
                "knows": [
                  {
                    "name": "a",
                    "id": 1,
                    "knows": [
                      {
                        "name": "b",
                        "id": 2
                      }
                    ],
                    "type": [
                      {
                        "name": "z"
                      }
                    ]
                  },
                  {
                    "name": "d",
                    "id": 4,
                    "type": [
                      {
                        "name": "z"
                      }
                    ]
                  }
                ],
                "type": [
                  {
                    "name": "z"
                  }
                ]
              }
            ],
            "type": [
              {
                "name": "z"
              }
            ]
          }
        ],
        "type": [
          {
            "name": "z"
          }
        ]
      }
    ]
  },
  "extensions": { }
}

This contains two paths a - b - c - a - b and a - b - c - d, and for each of these nodes all the predicates (name, id, knows, type) are expanded.

Notice that while the ending node d still has likes expanded to the name z.
But the end of the path with the loop b does not expand type but it does expand name and id. Why is this?

The behavior is actually the same for name and type, but while expanding name gives a value predicate of type string, type goes to another uid for which we don’t expand anything anymore since we’ve reached the maximum recursion depth. Since the expansion doesn’t occur, the object inside of type (and knows) is empty and therefore the predicate type is dropped completely.

@recurse(loop: false)

A very similar behavior occurs when setting loop: false as the last node to be expanded is the one that was already included in the traversal. In this case, that’s a.

{
  q(func: eq(name, "a")) @recurse(loop: false) {
    name 
    id
    knows
    type
  }
}

This will return:

{
  "data": {
    "q": [
      {
        "name": "a",
        "id": 1,
        "knows": [
          {
            "name": "b",
            "id": 2,
            "knows": [
              {
                "name": "c",
                "id": 3,
                "knows": [
                  {
                    "name": "a",
                    "id": 1
                  },
                  {
                    "name": "d",
                    "id": 4,
                    "type": [
                      {
                        "name": "z"
                      }
                    ]
                  }
                ],
                "type": [
                  {
                    "name": "z"
                  }
                ]
              }
            ],
            "type": [
              {
                "name": "z"
              }
            ]
          }
        ],
        "type": [
          {
            "name": "z"
          }
        ]
      }
    ]
  },
  "extensions": { }
}

Here you can see that while d is expanded and you can see the name for the node pointed by type, a is expanded but the nodes pointed by it via knows and type will not be expanded, generating an empty object, which will be removed as well as the predicates knows and type themselves. This is not different to the behavior with b when the maximum recursion depth in the case above.

Coming back to your example

The behavior above matches what you see in your example. The first time you see the node is the first element in the loop and we expand it until we find it again, in which case we stop the expansion at that node (value predicates will appear, but object predicates will be expanded but empty therefore dropped).

I hope this helps understand the behavior.

I do no think this is a bug, although it might be useful to improve the documentation regarding the behavior of @recurse when loop: false is set and a loop is found or the maximum recursion depth is reached.

pjebs commented :

What you are saying is correct. What I suggested in recursive query not expanding · Issue #3634 · dgraph-io/dgraph · GitHub is that it may very well be working as intended - with the obvious improvement being to clarify the behaviour in documentation.

I am saying that the intended bahviour is not consistent with what a user would expect from such a query such as:

q = `
		query withvar($hashid: string) {
			chain(func: eq(node.hashid, $hashid)) @recurse(loop:false) {
				uid
				node.owner
				user.name
				node.hashid
				node.xdata
				node.parent @facets @facets(eq(facet, "required") or eq(facet, "recommended"))
			}
		}
	`

I expect each result to satisfy the “footprint” as stated in the query of uid, node.owner, user.name, node.hashid, node.xdata and most importantly node.parent if available.

Currently if the parent node has been visited, it is not displaying it. It should return that value. Currently I am doing it client side, when it can be very easily performed in server-side. @pawanrawal can probably explain it better than I.

Further to this, if the node has been visited before, then node.parent is not returned. I then have no ability to know if that node has a parent. This is obviously an error.

campoy commented :

I wonder if it would make sense for you to perform this query in a slightly different way then.

You could have the recursion to identify what are the nodes involved, and a different query in the same block to obtain the data related to each.

So following the dataset I presented above, instead of the query right below for which you’d be missing the type information on nodes already visited:

{
  q(func: eq(name, "a")) @recurse(loop: false) {
    name 
    id
    knows
    type
  }
}

You could fetch the data in two queries as such:

{
  tree(func: eq(name, "a")) @recurse(loop: false) {
    nodes as uid
    knows
  }
  
  nodes(func: uid(nodes)) {
    name
    id
    type {
      name
    }
  }
}

This would return both the structure of your tree and the information attached to each node (only once).
The result would be a smaller response, probably.

{
  "data": {
    "tree": [
      {
        "uid": "0x3",
        "knows": [
          {
            "uid": "0x4",
            "knows": [
              {
                "uid": "0x5",
                "knows": [
                  {
                    "uid": "0x1"
                  },
                  {
                    "uid": "0x3"
                  }
                ]
              }
            ]
          }
        ]
      }
    ],
    "nodes": [
      {
        "name": "d",
        "id": 4,
        "type": [
          {
            "name": "z"
          }
        ]
      },
      {
        "name": "a",
        "id": 1,
        "type": [
          {
            "name": "z"
          }
        ]
      },
      {
        "name": "b",
        "id": 2,
        "type": [
          {
            "name": "z"
          }
        ]
      },
      {
        "name": "c",
        "id": 3,
        "type": [
          {
            "name": "z"
          }
        ]
      }
    ]
  },
  "extensions": { ... }
}

We will update the documentation, but first I want to make sure this solution meets your needs.

campoy commented :

After discussing this issue with @pawanrawal we’ve reached the conclusion that there’s a possible enhancement to be done here which would target your concerns.

I’m going to keep this issue open but change it to be an enhancement, as the current behavior matches the specs. That said, we’ll try to implement this in future releases.

The solution would be to stop recursion once a node has been seen in the current path. This is different to the current behavior where a node is not expanded once it has been seen at any point in the traversal.

Let’s see this with one example:

{
    set {
        _:a <name> "a" .
        _:b <name> "b" .
        _:c <name> "c" .
        _:d <name> "d" .
        _:e <name> "e" .
        _:f <name> "f" .

      	_:z <name> "z" .

        _:f <knows> _:a .
        _:f <knows> _:d .

      	_:d <knows> _:e .
        _:e <knows> _:a .

        _:a <knows> _:b .
        _:a <knows> _:e .
        _:b <knows> _:c .

        _:a <type> _:z .
        _:b <type> _:z .
        _:c <type> _:z .
        _:d <type> _:z .  
        _:e <type> _:z .
        _:f <type> _:z .
    }
}

This would generate a graph like this:


note: the type predicates have been omitted for clarity’s sake

If you got to start a recursive query from f:

{
    q(func: eq(name, "f")) @recurse {
        name
        knows
        type
    }
}

Currently, you would get something like this:

{
  "data": {
    "q": [
      {
        "knows": [
          {
            "knows": [
              {
                "knows": [
                  {
                    "name": "c",
                    "type": [
                      {
                        "name": "z"
                      }
                    ]
                  }
                ],
                "name": "b",
                "type": [
                  {
                    "name": "z"
                  }
                ]
              }
            ],
            "name": "a",
            "type": [
              {
                "name": "z"
              }
            ]
          },
          {
            "knows": [
              {
                "knows": [
                  {
                    "name": "a"
                  }
                ],
                "name": "e"
              }
            ],
            "name": "d",
            "type": [
              {
                "name": "z"
              }
            ]
          }
        ],
        "name": "f",
        "type": [
          {
            "name": "z"
          }
        ]
      }
    ]
  }
}

Note how once we reached a in the path starting with f -> d, the recursion stops.

We propose enhancing the loop detection to keep on going as a has not been seen in that path yet, so the output would look like the following:

{
  "data": {
    "q": [
      {
        "knows": [
          {
            "knows": [
              {
                "knows": [
                  {
                    "name": "c",
                    "type": [
                      {
                        "name": "z"
                      }
                    ]
                  }
                ],
                "name": "b",
                "type": [
                  {
                    "name": "z"
                  }
                ]
              }
            ],
            "name": "a",
            "type": [
              {
                "name": "z"
              }
            ]
          },
          {
            "knows": [
              {
                "knows": [
                  {
                    "knows": [
                      {
                        "knows": [
                          {
                            "name": "c",
                            "type": [
                              {
                                "name": "z"
                              }
                            ]
                          }
                        ],
                        "name": "b",
                        "type": [
                          {
                            "name": "z"
                          }
                        ]
                      }
                    ],
                    "name": "a",
                    "type": [
                      {
                        "name": "z"
                      }
                    ]
                  }
                ],
                "name": "e"
              }
            ],
            "name": "d",
            "type": [
              {
                "name": "z"
              }
            ]
          }
        ],
        "name": "f",
        "type": [
          {
            "name": "z"
          }
        ]
      }
    ]
  }
}

I think this would address your concerns, do you agree? @pjebs @power-f-GOD
Thanks for the report and for helping us understand your needs!

pjebs commented :

I’ll have a look at it when I get back to this project. I still think that the queries return “footprint” should always be returned when possible.