Query to detect a missing edge

Hello, I’m trying to write a specific query, and just cannot work out how to achieve this one. Any help appreciated!

For every Edge1 in my database, this exact pattern should exist. However in some cases Edge3 can be missing, so I’m trying to find all Node3-Node4 pairs where Edge3 is missing (or even better, add the link in directly).

Ideally I need to iterate through all Edge1’s in a single query, as there are many millions of them and I do not want to issue a query for each edge.

This is the best query I managed to come up with:

upsert {
  query {
    A as var(func:has(Node1.Edge1)) {~Node4.Edge4 {B as uid}}
	
	var(func:uid(A)) {
	  Node1.Edge1 {
		~Node3.Edge2 {
          C as uid
          ~Node4.Edge3 @filter(uid(B))
           {D as uid}
		}
	  }
    }
  }

  mutation @if(not eq(len(D), 1)) {
    set {
      uid(B) <Node4.Edge3> uid(C) .
    }
  }
}

However this has the issue that is will cartesian all B’s and C’s together from all results, it is not limited to just this one loop/pattern.

I also tried standard queries to just return the B/C pairs with missing Edge3’s, but all had this cartesian problem whatever I tried.

Hi, the problem looked interesting, so I decided to have some fun with it! Did you manage to solve it?

Ideally I need to iterate through all Edge1’s in a single query (& add the link in directly)

I don’t think it’s possible using upsert, as there isn’t really any loops in queries, (and for millions of nodes at once, server resource usage may be an issue). I’d solve problems like this through multiple upserts.

upsert {
  query {
    d(func:has(Node1.Edge1)) {~Node4.Edge4 {D as uid}}
    d1(func:uid(D), first: 1) @filter(NOT has(Node4.Edge3)) { D1 as uid}
    c(func:uid(D1)) { Node4.Edge4 { Node1.Edge1 { ~Node3.Edge2 { C as uid }}}}
    c2(func:uid(C)) @filter(NOT has(~Node4.Edge3)) {C2 as uid}
  }
  mutation @if(ge(len(C2), 1) AND eq(len(D1), 1)) {
    set {
      uid(D1) <Node4.Edge3> uid(C2) .
    }
  }
}

Probably can be improved through codegen, by batching multiple queries & mutation blocks at once, and playing with first: 2, offset: 1 etc.

Ideally I need to iterate through all Edge1’s in a single query

However it may be possible to just do a query. I modified the query above and used uid_in to do some filtering. Not sure it will work on your real data.

  query {
    var(func:has(Node1.Edge1)) {~Node4.Edge4 {D as uid}}
    var(func:uid(D), first: 1000) @filter(NOT has(Node4.Edge3)) { D1 as uid}
    c(func:uid(D1)) { uid Node4.Edge4 { Node1.Edge1 { ~Node3.Edge2 @filter(NOT uid_in(~Node4.Edge3, uid(D1))) { uid }}}}
  }
1 Like

Thanks for thinking along with me @ppp225 , I think I have a working solution now so leaving it here for others to find.

I did indeed go down the route of making it a query, and doing the mutation for adding the missing edges as a separate query. The problem with both our query structures is that these “squares” in my diagram can overlap a lot, and the variables are not scoped to the starting uid, e.g. Node1-1 could incorrectly pass the query by ending on Node4-2 from a different starting edge.

This could be solved by dgraph adding a for loop or scoping to the variables. The only way i could get this to work was to issue each as a separate query, each with their own variables (A1 B1 C1 C1, A2 B2, C2, D2 etc etc).

This was however (unsuprisingly) slow due to query overhead (40min for 100k edges). With some optimisation I was able to get this down to around 30s with the following pattern:

{
var(func:has(Node1.edge1), first:<batchsize>, offset:<high water mark>) {A as uid}

# repeat this query section <batchsize> times, each loop is <n>
var(func:uid(A), first:1, offset:<n>) {A<n> as uid ~Node4.edge4 {B<n> as uid}}
q<n>(func:uid(A<n>))@normalize{
  ~Node4.edge4 {srcRE:uid}
  Node1.edge1 {
    ~Node3.edge2 {
      dstRE:uid
      ~Node4.edge3 @filter(uid(B<n>)) {found:uid}
    }
  }
}
# end loop
}

This works nicely and performs well. I think its a nice example however of how a loop syntax could make this much easier (and likely faster again).