Query to detect a missing edge

RJKeevil · February 4, 2025, 11:12am

Hello, I’m trying to write a specific query, and just cannot work out how to achieve this one. Any help appreciated!

For every Edge1 in my database, this exact pattern should exist. However in some cases Edge3 can be missing, so I’m trying to find all Node3-Node4 pairs where Edge3 is missing (or even better, add the link in directly).

Ideally I need to iterate through all Edge1’s in a single query, as there are many millions of them and I do not want to issue a query for each edge.

This is the best query I managed to come up with:

upsert {
  query {
    A as var(func:has(Node1.Edge1)) {~Node4.Edge4 {B as uid}}
	
	var(func:uid(A)) {
	  Node1.Edge1 {
		~Node3.Edge2 {
          C as uid
          ~Node4.Edge3 @filter(uid(B))
           {D as uid}
		}
	  }
    }
  }

  mutation @if(not eq(len(D), 1)) {
    set {
      uid(B) <Node4.Edge3> uid(C) .
    }
  }
}

However this has the issue that is will cartesian all B’s and C’s together from all results, it is not limited to just this one loop/pattern.

I also tried standard queries to just return the B/C pairs with missing Edge3’s, but all had this cartesian problem whatever I tried.

ppp225 · February 6, 2025, 11:51pm

Hi, the problem looked interesting, so I decided to have some fun with it! Did you manage to solve it?

Ideally I need to iterate through all Edge1’s in a single query (& add the link in directly)

I don’t think it’s possible using upsert, as there isn’t really any loops in queries, (and for millions of nodes at once, server resource usage may be an issue). I’d solve problems like this through multiple upserts.

upsert {
  query {
    d(func:has(Node1.Edge1)) {~Node4.Edge4 {D as uid}}
    d1(func:uid(D), first: 1) @filter(NOT has(Node4.Edge3)) { D1 as uid}
    c(func:uid(D1)) { Node4.Edge4 { Node1.Edge1 { ~Node3.Edge2 { C as uid }}}}
    c2(func:uid(C)) @filter(NOT has(~Node4.Edge3)) {C2 as uid}
  }
  mutation @if(ge(len(C2), 1) AND eq(len(D1), 1)) {
    set {
      uid(D1) <Node4.Edge3> uid(C2) .
    }
  }
}

Probably can be improved through codegen, by batching multiple queries & mutation blocks at once, and playing with first: 2, offset: 1 etc.

Ideally I need to iterate through all Edge1’s in a single query

However it may be possible to just do a query. I modified the query above and used uid_in to do some filtering. Not sure it will work on your real data.

  query {
    var(func:has(Node1.Edge1)) {~Node4.Edge4 {D as uid}}
    var(func:uid(D), first: 1000) @filter(NOT has(Node4.Edge3)) { D1 as uid}
    c(func:uid(D1)) { uid Node4.Edge4 { Node1.Edge1 { ~Node3.Edge2 @filter(NOT uid_in(~Node4.Edge3, uid(D1))) { uid }}}}
  }

RJKeevil · February 9, 2025, 3:03pm

Thanks for thinking along with me @ppp225 , I think I have a working solution now so leaving it here for others to find.

I did indeed go down the route of making it a query, and doing the mutation for adding the missing edges as a separate query. The problem with both our query structures is that these “squares” in my diagram can overlap a lot, and the variables are not scoped to the starting uid, e.g. Node1-1 could incorrectly pass the query by ending on Node4-2 from a different starting edge.

This could be solved by dgraph adding a for loop or scoping to the variables. The only way i could get this to work was to issue each as a separate query, each with their own variables (A1 B1 C1 C1, A2 B2, C2, D2 etc etc).

This was however (unsuprisingly) slow due to query overhead (40min for 100k edges). With some optimisation I was able to get this down to around 30s with the following pattern:

{
var(func:has(Node1.edge1), first:<batchsize>, offset:<high water mark>) {A as uid}

# repeat this query section <batchsize> times, each loop is <n>
var(func:uid(A), first:1, offset:<n>) {A<n> as uid ~Node4.edge4 {B<n> as uid}}
q<n>(func:uid(A<n>))@normalize{
  ~Node4.edge4 {srcRE:uid}
  Node1.edge1 {
    ~Node3.edge2 {
      dstRE:uid
      ~Node4.edge3 @filter(uid(B<n>)) {found:uid}
    }
  }
}
# end loop
}

This works nicely and performs well. I think its a nice example however of how a loop syntax could make this much easier (and likely faster again).

Topic		Replies	Views
Query to delete duplicates Users mutation	5	744	April 1, 2020
Best way to check if an edge exists between two nodes Dgraph example	3	1872	April 30, 2020
Dgraph query Issues	2	391	January 6, 2022
Detecting full loop paths Dgraph	1	464	May 6, 2020
Find nodes with edges to nodes filtered by type Dgraph	5	645	February 12, 2020

Query to detect a missing edge

Related topics