Recursive query not expanding

Moved from GitHub dgraph/3634

Posted by pjebs:

This is the same query/schema/data as BUG: facets returning array · Issue #3582 · dgraph-io/dgraph · GitHub

I believe it is an independent issue but not sure: @martinmr

For the query:

q = `
		query withvar($hashid: string) {
			chain(func: eq(node.hashid, $hashid)) @recurse(loop:false) {
				uid
				node.owner
				user.name
				node.hashid
				node.xdata
				node.parent @facets @facets(eq(facet, "required") or eq(facet, "recommended"))
			}
		}
	`

When a node reappears deeper in, it is not expanding the node.owner. It correctly expands in earlier on.

pjebs commented :

See <------------------------ below

{
  "chain": [
    {
      "uid": "0x4e27",
      "node.owner": [
        {
          "uid": "0x4e21",
          "user.name": "johny"
        }
      ],
      "node.hashid": "jjtn4cdirv",
      "node.xdata": "{\"title\":\"The Two Towers\",\"authors\":[\"J. R. R. Tolkien\"]}",
      "node.parent": [
        {
          "uid": "0x2711", <----------------------------
          "node.owner": [ <--------------- EXPANDED
            {
              "uid": "0x1",
              "user.name": "powerofgod"
            }
          ],
          "node.hashid": "myin9aksq",    
          "node.xdata": "{\"title\":\"The Da Vinci Code\",\"authors\":[\"Dan Brown\"]}",
          "node.parent|facet": "required"
        },
        {
          "uid": "0x4e25",
          "node.owner": [
            {
              "uid": "0x1",
              "user.name": "powerofgod"
            }
          ],
          "node.hashid": "r9t9rc4ip",
          "node.xdata": "{\"title\":\"Pride and Prejudice\",\"authors\":[\"Jane Austen\"]}",
          "node.parent": [
            {
              "uid": "0x2711", <------------------------SAME NODE (no expanded node.owner)
              "node.hashid": "myin9aksq",
              "node.xdata": "{\"title\":\"The Da Vinci Code\",\"authors\":[\"Dan Brown\"]}",
              "node.parent|facet": "required"
            }
          ],
          "node.parent|facet": "recommended"
        },
        {
          "uid": "0x4e26",
          "node.owner": [
            {
              "uid": "0x4e21",
              "user.name": "johny"
            }
          ],
          "node.hashid": "35t8qc8i5",
          "node.xdata": "{\"title\":\"The Hobbit\",\"authors\":[\"J. R. R. Tolkien\"]}",
          "node.parent": [
            {
              "uid": "0x2711", <------------------------SAME NODE (no expanded node.owner)
              "node.hashid": "myin9aksq",
              "node.xdata": "{\"title\":\"The Da Vinci Code\",\"authors\":[\"Dan Brown\"]}",
              "node.parent|facet": "recommended"
            },
            {
              "uid": "0x4e25",
              "node.hashid": "r9t9rc4ip",
              "node.xdata": "{\"title\":\"Pride and Prejudice\",\"authors\":[\"Jane Austen\"]}",
              "node.parent|facet": [ <--------------- ISSUE #3582 (different issue)
                "required",
                "required"
              ]
            }
          ],
          "node.parent|facet": "recommended"
        }
      ]
    }
  ]
}

pjebs commented :

/cc @power-f-GOD

martinmr commented :

This is most likely because you reached the maximum depth allowed. Try playing with the depth parameter. I would start with 5 and see if that makes those nodes appear.

link to the docs: https://docs.dgraph.io/query-language/#recurse-query

I’ll also document what the default depth is since it’s currently unclear from the docs.

pjebs commented :

My use case requires no limits

pjebs commented :

If I was making a dependency tool like npm I can’t have a limit

martinmr commented :

Nevermind, I checked the code. The default depth when loop is set to false is math.MaxUint64 so that’s not the problem. The problem is you have set loop to false. Once a node has been seen, it won’t be expanded again. If that’s not the behavior you want, then you have to set loop to true and pass a depth parameter that is greater than zero.

If your use case requires no limits, then you must set loop to false, which will run the query until it reaches all the nodes but won’t follow loops. Otherwise your queries will never return.

pjebs commented :

Seems like a bug to me. The query implies the node should always be expanded, even if it means dgraph keeps the previously encountered node in memory and reuses it.

martinmr commented :

This is not a bug. You specifically set loop to false.

pjebs commented :

You are looking at it from an implementation point of view. That is not what the query implies irrespective of loop setting. @manishrjain

martinmr commented :

That is exactly what the query implies. If you want to follow loops you need to set loop to true and provide a depth parameter. Otherwise Dgraph would keep on expanding the graph endlessly.

pjebs commented :

My concern is regarding why the loop setting is effecting the JSon payload being returned. I understand loop should stop expanding endlessly. But the query’s “footprint” says for each node (uid) found, the JSon response should contain an expanded ‘node.owner’. the response should supply it, even if it means storing encountered nodes in memory and showing it.

pjebs commented :

The footprint of each returned object is:

uid
				node.owner
				user.name
				node.hashid
				node.xdata
				node.parent @facets @facets(eq

All returned objects should look like that as much as possible.

martinmr commented :

This is not different than a predicate not showing up when it doesn’t exist. In the first appearance of the node, node.owner was expanded because it had not been traverse that. After that, that edge won’t be expanded because loop is set to false. Dgraph doesn’t show anything. It could set node.owner to an empty object or list but that would not be very useful. You are right in that Dgraph will try to adhere to the query format requested but it won’t force it in situations where it does not make that much sense.

pjebs commented :

I’m suggesting it should be in the case in this scenario without having to “force it”. Since the uid has been encountered before, DGraph has all the data required to properly create an object in the eventual json which will look like the footprint suggested in the query.

It is essentially doing this: lemma-chain/find_chain.go at master · thehonestscoop/lemma-chain · GitHub (but the DGraph side rather than client).

It is not hard to do from a technical point of view, it doesn’t violate loop limit and returns what the query intuitively expects.

martinmr commented :

Doing something like this would in fact violate the loop guarantees. From the documentation:

The loop parameter can be set to false, in which case paths which lead to a loop would be ignored while traversing.

Including the objects at the end of an already traversed edge would mean we are considering paths that lead to a loop, which contradicts what setting loop to false means.

Something like this is application specific logic and should be done in the client, not in Dgraph.

pjebs commented :

Something is very wrong. Totally counter-intuitive of what the query footprint suggests without a good enough reason to justify not expanding a node that was already encountered. @manishrjain @campoy

NB: @martinmr I’m not saying you’re doing anything wrong. Maybe you implemented it as stated in docs or requirements. I think the behaviour should be considered wrong.

manishrjain commented :

Hey @pjebs ,

Thanks for bringing the issue. I read through the thread and honestly, I was a bit confused about the results as well. The semantics of what not to follow when loop: false is set are not very clear.

In terms of loop iteration, the general idea is to stop once the same node is encountered. However, the way it is implemented is such that we stop when the same edge is encountered. So, the same node can continue to be expanded, as long as we find new edges. Or, that’s my understanding so far.

I agree that we should maintain the query structure as closely as possible in the results. So, it rules out returning the same node.hashid multiple times, while not returning node.owner.

Based on my cursory look, I think the right thing to do here would be to stop expanding edges as soon as the node is seen again. But, I feel I’m missing some history here.

We’ll look closer and see what changes are required here to make the semantics of loop clearer.

CC: @pawanrawal

pawanrawal commented :

This is a bug. The response of the query should be the same as the expanded query below but its not.

Expanded query

{
  me(func: uid(0x4e27)) {
    uid
    node.hashid
    node.xdata
    node.parent {
      uid
      node.hashid
      node.xdata
      node.owner {
        uid
        user.name
      }
      node.parent {
        uid
        node.hashid
        node.xdata
        node.owner {
          uid
          user.name
        }
        node.parent {
          uid
          node.hashid
          node.xdata
          node.owner {
            uid
            user.name
          }
        }
      }
    }
    node.owner {
      uid
      user.name
    }
  }
}

The JSON data-set that can be used to reproduce this issue is pasted below.

Dataset

{
  "set": [
    {
      "uid": "0x4e27",
      "node.owner": [
        {
          "uid": "0x4e21",
          "user.name": "johny"
        }
      ],
      "node.hashid": "jjtn4cdirv",
      "node.xdata": "{\"title\":\"The Two Towers\",\"authors\":[\"J. R. R. Tolkien\"]}",
      "node.parent": [
        {
          "uid": "0x2711",
          "node.owner": [
            {
              "uid": "0x1",
              "user.name": "powerofgod"
            }
          ],
          "node.hashid": "myin9aksq",    
          "node.xdata": "{\"title\":\"The Da Vinci Code\",\"authors\":[\"Dan Brown\"]}",
          "node.parent|facet": "required"
        },
        {
          "uid": "0x4e25",
          "node.owner": [
            {
              "uid": "0x1",
              "user.name": "powerofgod"
            }
          ],
          "node.hashid": "r9t9rc4ip",
          "node.xdata": "{\"title\":\"Pride and Prejudice\",\"authors\":[\"Jane Austen\"]}",
          "node.parent": [
            {
              "uid": "0x2711",
              "node.hashid": "myin9aksq",
              "node.xdata": "{\"title\":\"The Da Vinci Code\",\"authors\":[\"Dan Brown\"]}",
              "node.parent|facet": "required"
            }
          ],
          "node.parent|facet": "recommended"
        },
        {
          "uid": "0x4e26",
          "node.owner": [
            {
              "uid": "0x4e21",
              "user.name": "johny"
            }
          ],
          "node.hashid": "35t8qc8i5",
          "node.xdata": "{\"title\":\"The Hobbit\",\"authors\":[\"J. R. R. Tolkien\"]}",
          "node.parent": [
            {
              "uid": "0x2711",
              "node.hashid": "myin9aksq",
              "node.xdata": "{\"title\":\"The Da Vinci Code\",\"authors\":[\"Dan Brown\"]}",
              "node.parent|facet": "recommended"
            },
            {
              "uid": "0x4e25",
              "node.hashid": "r9t9rc4ip",
              "node.xdata": "{\"title\":\"Pride and Prejudice\",\"authors\":[\"Jane Austen\"]}"
            }
          ],
          "node.parent|facet": "recommended"
        }
      ]
    }
  ]
}

It happens because while considering loops, we filter the edges that we have already walked but this filtering happens at the subgraph level and not for individual sub-trees. The response should include the expanded nodes as mentioned by @pjebs as they won’t lead to any loop for the given data-set.

We should ignore the nodes that we have walked but do it for each branch of the tree separately. I’ll look into how to do that.

pawanrawal commented :

In the meantime, there is a workaround. You can get the correct results you want by setting loop to true and specifying a very large depth (the max that you expect).