Intersect version of uid(…)

I want to get the intersection of a set of uid vars. This is similar to “Get output of uid(a,b,...) as intersection not union”, but I did not want to reopen that lengthy discussion.

I can turn uid(…), which does the union of its vars, into an intersection with a @filter as follows:

pred1 as var(func: has(<dgraph.graphql.schema>))
pred2 as var(func: has(<dgraph.graphql.xid>))
pred3 as var(func: has(<dgraph.type>))

result (func: uid(pred1,pred2,pred3)) @filter(has(<dgraph.graphql.schema>) AND has(<dgraph.graphql.xid>) AND has(<dgraph.type>)) {
  uid
  <dgraph.graphql.schema>
  <dgraph.graphql.xid>
  <dgraph.type>
}

The uid vars pred1, pred2 and pred3 already provide the sets of uids. Applying a @filter that repeats the has operations seems redundant and is probably less performant than a dedicated uid_intersect(pred1,pred2,pred3) could be:

pred1 as var(func: has(<dgraph.graphql.schema>))
pred2 as var(func: has(<dgraph.graphql.xid>))
pred3 as var(func: has(<dgraph.type>))

result (func: uid_intersect(pred1,pred2,pred3)) {
  uid
  <dgraph.graphql.schema>
  <dgraph.graphql.xid>
  <dgraph.type>
}

uid_intersect(…) would also make this query much handier and more readable. Imagine more vars here.

Any thoughts on that?

Have you tried cascade? I think (from what I got in your question) that it might work for you.

pred1 as var(func: has(<dgraph.graphql.schema>))
pred2 as var(func: has(<dgraph.graphql.xid>))
pred3 as var(func: has(<dgraph.type>))

result (func: uid(pred1,pred2,pred3)) @cascade {
  uid
  <dgraph.graphql.schema>
  <dgraph.graphql.xid>
  <dgraph.type>
}

I have a use case for this:

I want to do filtering at different levels. For instance, I need to answer questions like, “Show me contacts that (have an address with cities in X, Y, Z) and (have events within seven days OR have tasks with occurrences that are not completed and are due in the next seven days).” To fulfill such requests, I have to run the filtering logic at a higher level in my UI: get all of the UIDs that fulfill the separate parts, do the conjunction logic, and then pass these UIDs to the filter where I actually get the full graph to use with pagination. Right now my UI has to do much of this logic, and it would be nice to offload at least the last part to the database layer.
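A rough sketch of how that last conjunction step might be pushed into the query, assuming uid() with variables is accepted inside @filter. The predicates address, event and task are hypothetical stand-ins for the real schema, and each has(...) is only a placeholder for the real filtering logic of that part:

```dql
{
  # Each var block collects the uids that satisfy one part of the question.
  # has(...) is a placeholder; the real blocks would carry the actual filters.
  inCities as var(func: has(address))
  withEvents as var(func: has(event))
  withOpenTasks as var(func: has(task))

  # inCities AND (withEvents OR withOpenTasks)
  contacts(func: uid(inCities)) @filter(uid(withEvents) OR uid(withOpenTasks)) {
    uid
  }
}
```

The point of the sketch is only the shape of the final block: the root function picks one uid set and the @filter combines the others with AND/OR.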

@cascade is getting better, especially with parameterized cascade coming soon to the main releases. But @cascade cannot express nested logic like OR and only covers some AND logic.
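For illustration, a sketch of what parameterized cascade might look like, assuming it takes a list of predicates in the form @cascade(pred, ...). The predicates name, email and phone are hypothetical:

```dql
{
  # Assumed syntax: only the listed predicates are required.
  # Nodes without name or email are dropped; phone is optional.
  contacts(func: has(name)) @cascade(name, email) {
    uid
    name
    email
    phone
  }
}
```

Even in this form the listed predicates are combined as an AND; there is no place to say "email OR phone".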

Yes, @cascade would work for that example, but not for this:

pred1 as var(func: has(<dgraph.graphql.schema>))
pred2 as var(func: has(<dgraph.graphql.xid>))
pred3 as var(func: has(<dgraph.type>))

result (func: uid(pred1,pred2,pred3)) @cascade {
  uid
}

And I need the intersection no matter what is in the body of result.

@MichelDiz so do you agree that uid_intersect would be a concise addition to the GraphQL± language?

Sorry for the late reply. You have marked me with the wrong nick.

Well, I think cascade works well. I can’t see how this would work any differently from cascade. See, if the query doesn’t have anything to compare against, it would have to make some arbitrary comparison, and the result might not be what you expect, as it would be with explicit “params”.

Let’s take your last query into account. What would be the rules? The node with more predicates? So, all the other nodes would be parameterized by it? Or would it infer all possible nodes and gather all possible predicates? But in that case, maybe you would never get any result.

For me, the cascade is the best solution for now.

Cheers.

Sorry, I am confused. Let’s look at this query again:

pred1 as var(func: has(<dgraph.graphql.schema>))
pred2 as var(func: has(<dgraph.graphql.xid>))
pred3 as var(func: has(<dgraph.type>))

result (func: uid(pred1,pred2,pred3)) @filter(has(<dgraph.graphql.schema>) AND has(<dgraph.graphql.xid>) AND has(<dgraph.type>)) {
  uid
}

I read this query as:

  • find the set of uids that have <dgraph.graphql.schema>, memorize them as pred1
  • find the set of uids that have <dgraph.graphql.xid>, memorize them as pred2
  • find the set of uids that have <dgraph.type>, memorize them as pred3
  • for all uids in pred1 union pred2 union pred3 that have all three predicates, return the uid

This cannot be simplified with @cascade.

The uid in conjunction with the @filter(has(…) AND has(…) AND has(…)) implements the intersection of the uid sets pred1, pred2 and pred3, but a shorter notation would be great:

pred1 as var(func: has(<dgraph.graphql.schema>))
pred2 as var(func: has(<dgraph.graphql.xid>))
pred3 as var(func: has(<dgraph.type>))

result (func: uid_intersect(pred1,pred2,pred3)) {
  uid
}

Which I read as:

  • find the set of uids that have <dgraph.graphql.schema>, memorize them as pred1
  • find the set of uids that have <dgraph.graphql.xid>, memorize them as pred2
  • find the set of uids that have <dgraph.type>, memorize them as pred3
  • for all uids in pred1 intersect pred2 intersect pred3, return the uid
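For comparison, a sketch of a form that might already express the same intersection without repeating the has(...) calls, assuming uid() accepts variables inside @filter. uid_intersect itself is only the notation proposed in this thread, not an existing function:

```dql
{
  pred1 as var(func: has(<dgraph.graphql.schema>))
  pred2 as var(func: has(<dgraph.graphql.xid>))
  pred3 as var(func: has(<dgraph.type>))

  # Start from pred1 and keep only the uids that are also in pred2 and pred3.
  result(func: uid(pred1)) @filter(uid(pred2) AND uid(pred3)) {
    uid
  }
}
```

This works on the uid sets alone, so it does not depend on what is in the body of result.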

I am not sure what you are referring to with “What would be the rules?”. And I think @cascade is not applicable to this more general query. Could you explain a bit more?

Thanks,
Enrico

I know that smaller queries doing magic is great. But there are other concerns about this. A big one is that the team is focussed right now on fixing bugs and GraphQL specs. Nothing beyond this will be worked on in the mid term.

Another point is about how the query system works. A variable does not carry the predicate used in the has func, as it is just a map of uids, unless it is a value variable, which you get by expanding the body of the has block, e.g.:

pred1 as var(func: has(<dgraph.graphql.schema>)) {
   realPred1 as <dgraph.graphql.schema>
}

This intersect func needs a lot of context to work as you want. It would need to infer each block with a complex contextualization that would come in the map. I agree that making things easier to write is good, but right now, in my opinion, cascade does the job.

As for the rules that need to be applied: in this case, you answered that it would be the predicate used as a parameter in the has func. But those predicates are not implicit in the variable; they are not embedded in it. It might be necessary to create a new type of variable for this, because making variables more contextualized would lead to unnecessary memory use for other queries.

It is necessary to discuss the pros and cons when going for simplification.

You said before that it works, but that it isn’t desirable because you have to write more lines in the query. That is a small cost compared to waiting for support for this.

Anyway, feel free to open a request at https://discuss.dgraph.io/c/issues/dgraph/38. My comments aren’t a blocker; I am just giving suggestions and discussing the topic.

Cheers.

I understand that, depending on how the query language is implemented, a simple change might be easy and straightforward or a heavy refactoring. This seems to be the latter, and it adds only a little expressiveness to the language. Thanks for the insights.

@EnricoMi, off-topic, but one thing is for sure with these special predicates: there will only be one node containing both of them. So, at present, you should be getting only one uid in your result for the given query.