Finding the intersection of many edges from node A within node B

Jesus_Soto · March 23, 2021, 9:40pm

What I want to do

Find the intersection of two arrays of edges, A and B.

Suppose we have two movie genres: Thriller and Comedy. Each genre is unique and is itself a node, so a movie has one genre edge to each of the genre nodes it belongs to.

I want to find only the movies that have both of these genres, so I tried:

{
  movies(func: eq(dgraph.type, "Movies")) @cascade {
    id
    genre @filter(eq(name, "Thriller") AND eq(name, "Comedy"))
  }
}

This returns an empty result, unlike the same filter with OR. That makes sense, though: the filter is checked against each genre node on its own, and name is a single-valued predicate, not a list. A genre node can be named Thriller or Comedy, but never both.
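To make that reasoning concrete, here is a toy Python model of the data. The genre and movie ids are made up for illustration, and the model only mimics the query semantics; it is not how Dgraph evaluates queries internally. It contrasts the AND filter, which is tested against one genre node at a time, with intersecting the per-genre movie sets, which is what the question is really after.

```python
# Hypothetical toy data: each genre node has exactly one (single-valued) name,
# and each movie points to the genre nodes it belongs to via a "genre" edge.
genre_names = {"g1": "Thriller", "g2": "Comedy", "g3": "Drama"}
movie_genres = {
    "m1": {"g1", "g2"},        # Thriller + Comedy
    "m2": {"g1"},              # Thriller only
    "m3": {"g1", "g2", "g3"},  # Thriller + Comedy + Drama
}


def and_filter_on_genre_edge(wanted):
    """Mimic genre @filter(eq(name, A) AND eq(name, B)) plus @cascade:
    the condition is tested against ONE genre node at a time."""
    result = set()
    for movie, genres in movie_genres.items():
        # A genre node survives only if its single name equals every wanted name.
        matching = [g for g in genres if all(genre_names[g] == w for w in wanted)]
        if matching:  # @cascade drops movies whose filtered genre list is empty
            result.add(movie)
    return result


def intersect_per_genre(wanted):
    """What the question asks for: movies linked to every wanted genre."""
    per_genre = [
        {m for m, gs in movie_genres.items() if any(genre_names[g] == w for g in gs)}
        for w in wanted
    ]
    return set.intersection(*per_genre) if per_genre else set()


print(and_filter_on_genre_edge(["Thriller", "Comedy"]))     # set()
print(sorted(intersect_per_genre(["Thriller", "Comedy"])))  # ['m1', 'm3']
```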

So my question is: given that, how can I find the intersection of the results of these two edges without consuming too many resources? Below is my solution; as you'll notice, it might become a very expensive operation if the transaction involves too many elements at once.

What I did

{
  movies1 as var(func: eq(dgraph.type, "Movies")) @cascade {
    genre @filter(eq(name, "Thriller"))
  }

  movies2 as var(func: uid(movies1)) @cascade {
    genre @filter(eq(name, "Comedy"))
  }

  movies3 as var(func: uid(movies2)) @cascade {
    genre @filter(eq(name, "Drama"))
  }

  result(func: uid(movies3)) {
    id
  }
}

I hope my explanation is clear. The query above solves the problem, but I would love help making it more efficient.

Thanks a lot in advance for your help.

Have a nice evening.

Dgraph metadata

dgraph version

v20.11.1

verneleem · March 23, 2021, 10:20pm

This query works with the schema and data on play.dgraph.io; 470 movies in that dataset match it.

{
  var(func:eq(name@en,"Thriller")) {
    movies1 as ~genre
  }
  var(func:eq(name@en,"Comedy")) {
    movies2 as ~genre @filter(uid(movies1))
  }
  var(func:eq(name@en,"Drama")) {
    movies3 as ~genre @filter(uid(movies2))
  }
  result(func:uid(movies3)) @filter(type(Film)) {
    uid
    name@en
  }
}

Improvements made to query

Does not use @cascade to post-process results.
Uses the reverse edge (~genre) so each root variable block starts from a small, known node (a single genre).
Builds each block on the result of the previous one.
The final result block puts the uid filter in the root function and the type filter in an extra @filter.
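The chained pattern above can be generated for any number of genres. Below is a minimal Python sketch; the function name, the query name intersect, the $gN variable names and the Film default (taken from the play.dgraph.io schema used above) are assumptions for illustration. It passes genre names as DQL query variables instead of pasting them into the string, so names containing quotes need no escaping.

```python
def build_intersection_query(genres, lang="en", result_type="Film"):
    """Build a DQL query with one var block per genre (hypothetical helper).

    Each block walks the reverse ~genre edge from one genre node and keeps
    only the movies already found by the previous block, as in the query above.
    """
    if not genres:
        raise ValueError("need at least one genre")
    # Declare one string query variable per genre: $g0, $g1, ...
    params = ", ".join(f"$g{i}: string" for i in range(len(genres)))
    blocks = []
    for i in range(len(genres)):
        # The first block is unfiltered; later blocks filter on the previous var.
        keep = "" if i == 0 else f" @filter(uid(movies{i - 1}))"
        blocks.append(
            f"  var(func: eq(name@{lang}, $g{i})) {{\n"
            f"    movies{i} as ~genre{keep}\n"
            f"  }}"
        )
    last = f"movies{len(genres) - 1}"
    result = (
        f"  result(func: uid({last})) @filter(type({result_type})) {{\n"
        f"    uid\n"
        f"    name@{lang}\n"
        f"  }}"
    )
    query = f"query intersect({params}) {{\n" + "\n".join(blocks + [result]) + "\n}"
    variables = {f"$g{i}": name for i, name in enumerate(genres)}
    return query, variables


query, variables = build_intersection_query(["Thriller", "Comedy", "Drama"])
print(query)
print(variables)
```

With a client such as pydgraph, the pair would then be sent along the lines of txn.query(query, variables=variables). Note that the query text still grows linearly with the number of genres, one var block each.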

Jesus_Soto · March 23, 2021, 10:33pm

Hi @verneleem, thanks a lot for your reply. This looks very helpful, since it reduces resource usage. However, do you think this is the only way to find the intersection of many elements?

Imagine I have 100k elements and want the intersection across all of them. If I am using Node, for instance, I would have to build a single string containing this whole transaction. On my side I could probably handle a huge loop like that, but is it an efficient enough approach in Dgraph?
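One alternative, not proposed in this thread, is to fetch the movie uid list for each genre separately and intersect the lists on the client, starting from the smallest one so the running result shrinks as fast as possible (the same idea as starting from the smaller blocks in the improved query). A minimal sketch, assuming the per-genre uid lists have already been fetched; how they are fetched is left out, and intersect_smallest_first is a hypothetical helper.

```python
def intersect_smallest_first(uid_lists):
    """Intersect several collections of uids (hypothetical helper).

    Starts from the smallest collection and stops early as soon as the
    running intersection is empty, since it can only shrink from there.
    """
    if not uid_lists:
        return set()
    ordered = sorted(uid_lists, key=len)  # smallest collection first
    result = set(ordered[0])
    for uids in ordered[1:]:
        result.intersection_update(uids)  # keep only uids present in both
        if not result:
            break  # nothing left to intersect
    return result


# Hypothetical uid lists, one per genre.
print(intersect_smallest_first([["0x1", "0x2", "0x3"], ["0x2", "0x3"], ["0x3", "0x4"]]))
```

Whether this beats a single chained query depends on how large each genre's list is, since every list has to travel over the network.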
