Filtering nodes by another node two hops away

Given a schema like:

type person {
    locatedIn: [city]
    name
    friend: [person]
}
type city {
    locatedIn: [country]    
    name
}
type country {
    name
}
name: string .
locatedIn: [uid] @reverse.
friend: [uid] @reverse.

How do I filter out the friends of a person who do not live in a given country?
If uid_in could accept a list of uids or a query variable, I would write something like this:

var(func: eq(name, 'America')) {
    cities as ~locatedIn { }
}
query(func: uid(*person_uid*)) {
    friend @filter(uid_in(locatedIn, cities)) { #Or val(cities)
        name
    }
}

Even with the approach mentioned in your October 1 reply, uid_in still does not support a list of uids.
Also, even if this could be done with uid_in, it would still be wasteful, since there is no need to compare every city against the outgoing edges of each friend.
What should I do to filter out the friends of a person who do not live in a given country?

Maybe you could respecify your var block to start from the person_uid root with the @cascade directive at the top that filters the nested city to country connections to only ‘America’?

Forgive me if I’m incorrect since I’ve only been perusing the query documentation for a couple of weeks, but it might look like:

var(func: uid(*person_uid*)) @cascade {
   filteredFriends as friend {
      locatedIn {
         locatedIn @filter(eq(name, 'America')) {   # the city -> country edge is named locatedIn in the schema
            name
         }
      }
   }
}

query(func: uid(*person_uid*)) {
   friend @filter(uid(filteredFriends)) {
      name
   }
}

Or I could be totally wrong. Edit: I also have no idea what was proposed in the October 1 reply you’ve referred to, and I couldn’t trace it back through your user profile =(.

The query for this would look like this:

{
  var(func: eq(name, "United States")) {
    ~locatedIn {  isHere as ~locatedIn { name } }
  }

  query(func: eq(name, "Bob")) {
      friendNot_In_US : friend @filter(NOT uid(isHere)) {
          name
          locatedIn { name }
      }
  }
}

The response

{
  "data": {
    "query": [
      {
        "friendNot_In_US": [
          {
            "name": [
              "Lilian"
            ],
            "locatedIn": [
              {
                "name": [
                  "Quebec"
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}

But there’s a problem here. A query like this internally collects every person who lives in the US, which could be millions. That is fine for a small dataset, but in general it does not perform well. In this case uid_in would be very handy, but it would first need to support variables; see "Add support of Value Variables for uid_in." · Issue #3066 · dgraph-io/dgraph · GitHub.

The sample data created to test this:

{
   "set":[
      {
         "uid":"_:Bob",
         "dgraph.type":"Person",
         "name":"Bob",
         "telephone":"(425) 123-4567",
         "friend":[
            {
               "uid":"_:Julian"
            },
            {
               "uid":"_:Lilian"
            }
         ],
         "locatedIn":[
            {
               "uid":"_:SF"
            }
         ]
      },
      {
         "uid":"_:Julian",
         "dgraph.type":"Person",
         "name":"Julian",
         "telephone":"(425) 322-0551",
         "friend":[
            {
               "uid":"_:Bob"
            },
            {
               "uid":"_:Lilian"
            }
         ],
         "locatedIn":[
            {
               "uid":"_:SF"
            }
         ]
      },
      {
         "uid":"_:Lilian",
         "dgraph.type":"Person",
         "name":"Lilian",
         "telephone":"(425) 322-0551",
         "friend":[
            {
               "uid":"_:Bob"
            },
            {
               "uid":"_:Julian"
            }
         ],
         "locatedIn":[
            {
               "uid":"_:Quebec"
            }
         ]
      },
      {
         "uid":"_:US",
         "dgraph.type":"Country",
         "name":"United States"
      },
      {
         "uid":"_:CAD",
         "dgraph.type":"Country",
         "name":"Canada"
      },
      {
         "uid":"_:Quebec",
         "dgraph.type":"City",
         "name":"Quebec",
         "locatedIn":[
            {
               "uid":"_:CAD"
            }
         ]
      },
      {
         "uid":"_:SF",
         "dgraph.type":"City",
         "name":"San Francisco",
         "locatedIn":[
            {
               "uid":"_:US"
            }
         ]
      }
   ]
}
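For readers who want to trace the two-block query by hand, here is a minimal Python sketch of the same traversal over the sample data. It is only an in-memory model, not how Dgraph executes the query; the dictionaries and the helper names (`reverse`, `friends_not_in`) are made up for illustration, and the blank-node labels stand in for uids.

```python
# Edges copied from the sample mutation above (labels stand in for uids).
LOCATED_IN = {
    "Bob": ["SF"], "Julian": ["SF"], "Lilian": ["Quebec"],
    "SF": ["US"], "Quebec": ["CAD"],
}
FRIEND = {
    "Bob": ["Julian", "Lilian"],
    "Julian": ["Bob", "Lilian"],
    "Lilian": ["Bob", "Julian"],
}

def reverse(edges):
    """Build the reverse index that a ~predicate traversal relies on."""
    rev = {}
    for src, targets in edges.items():
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev

def friends_not_in(person, country):
    rev_located = reverse(LOCATED_IN)
    # var block: country -> ~locatedIn (cities) -> ~locatedIn (isHere)
    is_here = set()
    for city in rev_located.get(country, []):
        is_here.update(rev_located.get(city, []))
    # query block: friend @filter(NOT uid(isHere))
    return [f for f in FRIEND[person] if f not in is_here]

print(friends_not_in("Bob", "US"))  # ['Lilian']
```

Note that `is_here` ends up holding every resident of the country, which is exactly the cost concern raised above.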

Hmm, after reading that I noticed that this topic is a copy-paste of Add support of Value Variables for uid_in. · Issue #3066 · dgraph-io/dgraph · GitHub

He was referring to my October 1 comment there.

Thanks for the reference! Would also appreciate any edification you could provide since I’m still learning all this stuff. Your comment about the performance of the query got me thinking about UID_IN’s limitations as well (if/when it is supported for variables). It seems appropriate to consider the surface area of the filter vector.

Using this city example and being agnostic to the underlying data set, there are potentially millions of city names in the world (“city” being used loosely for any defined locality), with the US having 20k. This could mean a UID_IN having to filter based on an array of many thousands or even millions of UIDs. Since an average person has fewer than 100 friends, and those friends have a one-to-one relationship to city (which has a one-to-one relationship with country), I wonder what the performance implications would be of going Country-first in the var block with a potentially massive list of uids for UID_IN as opposed to the more narrow (but longer and slightly redundant) vector of User-to-Friend-first with a cascade.

Would be interested in your thoughts since there are often multiple directions from which graph problems can be solved.

In fact, going back to review my query in the context of the dataset, it’s even worse than I thought if it were run against a real-world dataset. My query would potentially fetch absolutely ALL inhabitants of the United States. That would be catastrophic: over 300 million nodes to find half a dozen. Totally the wrong way around.
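To put rough numbers on the two directions, here is a back-of-the-envelope Python sketch. It only counts nodes touched under a naive traversal model and says nothing about Dgraph's actual execution or indexing. The figures of 20k cities and 100 friends come from this thread; the 15,000 residents per city is an assumption chosen so the total lands near 300 million, and both function names are made up.

```python
def country_first_visits(cities_in_country, residents_per_city):
    """Country -> every city -> every resident of each city."""
    return cities_in_country + cities_in_country * residents_per_city

def friend_first_visits(friend_count, cities_per_friend=1, countries_per_city=1):
    """Person -> each friend -> that friend's city -> that city's country."""
    per_friend = 1 + cities_per_friend + cities_per_friend * countries_per_city
    return friend_count * per_friend

print(country_first_visits(20_000, 15_000))  # 300020000
print(friend_first_visits(100))              # 300
```

Under these assumptions the friend-first direction touches about a million times fewer nodes, which is why starting from the person with @cascade looks so much more attractive.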

This could be okay in small cases, but not in one like that. uid_in can still be used here, though. I’ll show you how.

Nope, the uid_in value would be only the USA uid. uid_in is meant to work on a single query level: instead of writing one more nested block, you apply it as a filter one level earlier. And yes, it could be an array of uids in other cases.

I have reviewed your query, and indeed this approach is better.

{
  me as var(func: eq(name, "Bob")) @cascade {
    BobsFriends_In_US as friend {
      locatedIn {
        locatedIn @filter(eq(name, "United States"))
      }
    }
  }

  query(func: uid(me)) {
    name
    friend @filter(NOT uid(BobsFriends_In_US)) {
      name
      locatedIn { name }
    }
  }
}

This would list all friends who are not in the USA.

But we can go even further, as cascade is now supported in subqueries.

{ 
  query(func: eq(name, "Bob")) {
    name
    friendNot_In_US : friend @cascade {
      name
      locatedIn {
        name 
        locatedIn @filter(NOT eq(name, "United States"))
      }
   }
  }
} 

Technically, this way we can apply filters at any depth without using multiple blocks. Cascade in subqueries is the key to writing less.
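For anyone trying to picture what the subquery cascade does to the result, here is a small Python sketch of the pruning rule: a node is dropped when any requested field is missing or any requested edge comes back empty after filtering. This is a simplified model, not Dgraph's implementation; the `cascade` function and the `shape` dictionary are hypothetical, and the sketch also requests `name` on the country for readability.

```python
def cascade(node, shape):
    """Return node pruned to shape, or None if a requested field is missing or empty.

    shape maps a field name to None (scalar) or to a nested shape (edge).
    """
    out = {}
    for field, sub in shape.items():
        value = node.get(field)
        if sub is None:
            if value is None:
                return None
            out[field] = value
        else:
            children = (cascade(child, sub) for child in value or [])
            kept = [c for c in children if c is not None]
            if not kept:
                return None
            out[field] = kept
    return out

# Friends of Bob after the inner filter NOT eq(name, "United States") has run:
julian = {"name": "Julian",
          "locatedIn": [{"name": "San Francisco", "locatedIn": []}]}
lilian = {"name": "Lilian",
          "locatedIn": [{"name": "Quebec", "locatedIn": [{"name": "Canada"}]}]}

shape = {"name": None, "locatedIn": {"name": None, "locatedIn": {"name": None}}}
friends = [f for f in (cascade(n, shape) for n in (julian, lilian)) if f is not None]
print([f["name"] for f in friends])  # ['Lilian']
```

Julian is removed because his only city has no country left after the filter, and the emptiness propagates upward through each level.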

Using uid_in can make it easier for newcomers to GraphQL+- to “see” and understand. Assuming uid_in accepts variables (as requested in Issue #3066), the same query I wrote earlier would look like this:

{
  US as var(func: eq(name, "United States"))
  
  query(func: eq(name, "Bob")) {
    name
    friendNot_In_US : friend @cascade {
      name
      locatedIn @filter(NOT uid_in(locatedIn, val(US))) {
        name 
      }
   }
  }
} 

It is basically the same thing, but visually more “friendly”.

Without subquery cascade it would look like:

{
  US as var(func: eq(name, "United States"))

  Target as var(func: eq(name, "Bob")) {
    friends as friend
  }

  # block filter
  F as var(func: uid(friends)) @cascade {
    name
    locatedIn { # OR => @filter(NOT uid_in(locatedIn, val(US)))
      locatedIn @filter(NOT uid(US))
    }
  }

  query(func: uid(Target)) {
    name
    friendNot_In_US : friend @filter(uid(F)) {
      name
      locatedIn {
        from : name
      }
    }
  }
}

As you can see, life before subquery cascade was hard. Now, with subquery cascade, we can use just two blocks, or a single one for more advanced cascade users.

Yeah, I thought I had read a version of the docs where @cascade could only be applied at root (which I recall being kind of bummed about when reading it), so that update for subquery cascade is very welcome. But… what would be even more awesome is a parameterized version of @cascade, because there are many instances where it would be beneficial to exclude any results with missing predicate X but not exclude results where predicate X is present and predicate Y is not (i.e. something like @cascade(pred1, pred2...)).

I might be missing something, or it’s due to the re-use of locatedIn for both the Person and City objects (would probably recommend livesIn and locatedIn, respectively, for clarity and also sharding). But isn’t it only true that you would need just the one uid for UID_IN if the query were going to pull in the locatedIn predicate? If you don’t need the location data itself, you’re still stuck in the position of having to over-fetch (i.e. include the friend’s locatedIn when you don’t “need” that data) or having to pass in a potentially large list of city uid’s to filter by the friend’s locatedIn using UID_IN from the friend level.

Hmm, doesn’t “has” fit that requirement?

Not has(pred1, pred2...) or has(pred1, pred2...)

Cascade is more like “it’s 100% matching, get rid of the rest”.

Yes, this naming can be a little confusing for those who are not very focused.

Not exactly. has() would be great, but it’s a check of whether a given predicate exists or not, right? I think this use case is specific to filtered values. My understanding was that has(pred1) would still allow for a record with an empty set to be generated for pred1 when a filter is applied, or is that incorrect?

I think an access control example might help to illustrate this, but you could conceive of others.

q(func: ...) {
   someProtectedAsset {   # if we use @cascade here, it will trigger if any of 
      p1                  # p1-p6 are empty. Has(accessGroup) would 
      p2                  # be true for all someProtectedAssets, but
      p3 {                # we only want results that have non-null
         p4, p5, p6       # permissions with something like @cascade(accessGroup) 
      }
      accessGroup @filter(UID_IN(groupMembers, user_uid)) {
          permissions
      }
   }
}

Does that illustrate why parameterized @cascade would be different from has and pretty darn useful at a subquery level? If I’m misunderstanding how has is implemented or the idea that filters can lead to empty result sets, I might be off the mark.
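A tiny Python sketch may make the distinction easier to see. It models a hypothetical `@cascade(accessGroup)` as "only the named fields must be non-empty", versus a plain cascade that requires every requested field. The `present` and `cascade_on` helpers and the asset records are invented for illustration and do not reflect any Dgraph API.

```python
def present(value):
    """A field counts as present if it is not None and not an empty edge list."""
    return value is not None and value != []

def cascade_on(nodes, required):
    """Keep only nodes where every field named in required is present."""
    return [n for n in nodes if all(present(n.get(f)) for f in required)]

assets = [
    # p2 is missing, but the user's group survived the accessGroup filter
    {"p1": "a", "p2": None, "accessGroup": [{"permissions": "read"}]},
    # every scalar is set, but the filter removed every accessGroup
    {"p1": "b", "p2": "x", "accessGroup": []},
]

all_fields = ["p1", "p2", "accessGroup"]
print(len(cascade_on(assets, all_fields)))       # 0
print(len(cascade_on(assets, ["accessGroup"])))  # 1
```

With a plain cascade both assets disappear, while the parameterized form keeps the one asset the user can actually access even though `p2` is empty.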

I’m a bit lost with the direction of the conversation. If you want to discuss the cascade design, let’s open another topic for it, move all of this there, and go deeper. I think @lych4o’s question is fully covered.

But, on your question: has could fit the context you described, I think. @cascade does more under the hood than the has function, so it’s not a direct 1:1 comparison between cascade and has.

Cheers.

Thank you! I think the cascade approach solves my problem perfectly!
I’m sorry for copying my reply from the issue to here without modification. I’ll pay attention next time.
