Cascade directive with inconsistent behavior using Var Block

MichelDiz · October 20, 2020, 11:12pm

Report a Dgraph Bug

Reference: DQL: Am I using vars correctly?

What version of Dgraph are you using?

v20.07.1

Have you tried reproducing the issue with the latest release?

Yes

What is the hardware spec (RAM, OS)?

N/A

Steps to reproduce the issue (command/config used to run Dgraph).

Run queries:

{
  caro(func: has(director.film)) @filter(not uid(0x9bbae)) @cascade {
    A as uid
    name@en
    director.film @filter(uid_in(~director.film, 0x9bbae)) 
  }
  results(func: uid(A)) {
    count(uid)
  }
}

The query above returns 95042 results, but it should return only 1.

Expected behaviour and actual result.

The docs state:

With the @cascade directive, nodes that don’t have all predicates specified in the query are removed. This can be useful in cases where some filter was applied or if nodes might not have all listed predicates.

The inconsistency is that the position of the variable changes the result, which it shouldn’t, unless this is undocumented behavior. Compare with the same query when the variable is assigned at the block level:

{
  A as caro(func: has(director.film)) @filter(not uid(0x9bbae)) @cascade {
    name@en
    director.film @filter(uid_in(~director.film, 0x9bbae)) { uid ~director.film { name@en }}
  }
  results(func: uid(A)) {
    uid
  }
}

MichelDiz · October 22, 2020, 2:32am

@LGalatin I have tested the query below in this version:

Dgraph version   : v1.0.11
Commit SHA-1     : b2a09c5b
Commit timestamp : 2018-12-17 09:50:56 -0800
Branch           : HEAD
Go version       : go1.11.1

And I still get the same issue. I think this comes from the design of the query system. Maybe we should map it. Mapping how the query system works (I'm not talking about a query planner) would help even new engineers solve issues quickly. When I say a map, I mean a deep map, one that explains why things are the way they are, with flows and dependencies.

PS. Maybe this is just a lack of information about the query system. Maybe it isn’t a bug at all.

I have a “mental map” of how it works, but it is just theoretical, built from my own experience, not actual deep knowledge. I see several issues with the multi-block approach that we could solve by making it clear to others.

Also, we should have a “map first” policy instead of just jumping into modifying and pushing features. We should first create RFCs that cover “how it is done or how it is implemented” and attach them to the main one. The RFCs already do something like that, but we need a whole Design Map of the query system. Otherwise we are going to have legacy things out of our sight, and it will take time to get an engineer up and running.

The query

{
  q(func: eq(name@en, "Jean-Pierre Jeunet")) { # Use this block to look up Jean-Pierre Jeunet, since his UID can change
    uid
    name@en
  }
  A as caro(func: has(director.film)) @filter(not uid(0x3bd3a61)) @cascade {
    name@en
    director.film @filter(uid_in(~director.film, 0x3bd3a61)) { uid }
  }
  results(func: uid(A)) {
    count(uid)
  }
  caro2(func: has(director.film)) @filter(not uid(0x3bd3a61)) @cascade {
    A2 as uid
    name@en
    director.film @filter(uid_in(~director.film, 0x3bd3a61)) 
  }
  results2(func: uid(A2)) {
    count(uid)
  }
}

amaster507 · October 22, 2020, 4:27am

As I understand it, the way the query is processed with cascade is also what leads to the problem with cascade and pagination. Cascade is always an after-the-fact step that runs right before the result is returned to the parent. As powerful and helpful as cascade is, it should not be the first tool you grab to fix a query, because it still requires the same number of nodes to be touched.
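To make the ordering concrete, here is a toy Python sketch. It is not Dgraph's code; the function names and sample rows are made up for illustration. It assumes that cascade prunes only after the page has already been cut, and compares that with pruning first.

```python
# Toy model only: NOT Dgraph's implementation. All names here are hypothetical.

def has_all(row, required):
    """True when the row carries every required predicate."""
    return all(row.get(key) is not None for key in required)

def paginate_then_cascade(rows, first, required):
    """Cut the page first, then drop incomplete rows (cascade as a post-process)."""
    page = rows[:first]
    return [row for row in page if has_all(row, required)]

def cascade_then_paginate(rows, first, required):
    """Drop incomplete rows first, then cut the page."""
    complete = [row for row in rows if has_all(row, required)]
    return complete[:first]

rows = [
    {"uid": 1, "name": "a"},
    {"uid": 2},               # missing "name", so cascade would drop it
    {"uid": 3, "name": "c"},
    {"uid": 4, "name": "d"},
]

early = paginate_then_cascade(rows, first=2, required=["name"])
late = cascade_then_paginate(rows, first=2, required=["name"])

print([row["uid"] for row in early])  # [1]
print([row["uid"] for row in late])   # [1, 3]
```

Under that assumption, asking for two rows can return only one, even though more complete rows exist further down the list.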

This bug is interesting and gives insight into how vars work within blocks. We see here that the var A2 is assigned its value and keeps it after the cascade step, which makes sense because cascade does not remove items from the variable; it removes nodes from the root/edge.
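A toy Python model of that explanation follows. It encodes an assumption about the ordering, not Dgraph's actual implementation, and `Node`, `run_block` and the sample data are hypothetical. A variable in the block body is bound while each root node is visited, and cascade prunes the block result afterwards.

```python
# Toy model only: NOT Dgraph's implementation. All names here are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    uid: int
    name: Optional[str] = None
    # uids of director.film edges that passed the edge-level @filter
    films: List[int] = field(default_factory=list)

def run_block(nodes: List[Node]) -> Tuple[List[int], List[int]]:
    """Evaluate one @cascade block over its root candidates.

    Returns (block_result, body_var):
      block_result: uids left after cascade (what `A as caro(...)` would hold)
      body_var:     uids captured by an `A as uid` line inside the body
    """
    body_var: List[int] = []
    block_result: List[int] = []
    for node in nodes:
        # Assumed ordering: the body variable is bound when the node is visited ...
        body_var.append(node.uid)
        # ... and cascade drops the node later if any listed predicate is missing.
        if node.name is not None and node.films:
            block_result.append(node.uid)
    return block_result, body_var

nodes = [
    Node(uid=1, name="director-a", films=[10]),  # complete, survives cascade
    Node(uid=2, name="director-b", films=[]),    # no film passed the edge filter
    Node(uid=3, name=None, films=[11]),          # no name@en
]

block_result, body_var = run_block(nodes)
print(len(block_result))  # 1
print(len(body_var))      # 3
```

In this model the body variable keeps every visited root node, which would match the large count reported for the variable inside the body, while the block-level variable only sees what cascade left behind.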

MichelDiz · October 22, 2020, 4:53am

Yeah, that was my suspicion when I saw the user run into this. But we lack precise docs that state whether this is the desired behavior or not. The docs about cascade say it will remove from the query any node that doesn’t have all the predicates specified, not that it applies at the root level only.

So the user would assume that any variable in the body of the query would be “filtered”, right? Some users use this directive for complex filtering, including some really deep traversals. If we only care about the root and don’t tell the user that no rule is applied in the query body, this will confuse them.

Suppose the user wants value variables only from nodes that have all the listed predicates. They might end up with all the unwanted values as well, since those come from the query body.

Well, I didn’t get to test whether the same problem happens with value variables, but given this behavior, I presume it is very likely.
