How to delete duplicate node

Hi @Mickey248,
It looks like we want to merge the “C” nodes. I have modeled this as a data-merging problem and solved it entirely in DQL, so that the solution does not depend on any programming language.

Step 1
Let’s create a similar structure. If we know which nodes are duplicates, we should tag them, so that this information is expressed in the data and can be queried.

{
  set {
    # incoming links: A -> C1, B -> C2
    _:a <ilink> _:c1 .
    _:a <name> "A" .
    _:b <ilink> _:c2 .
    _:b <name> "B" .

    # outgoing links: C1 -> G, C1 -> F, C2 -> F
    _:c1 <olink> _:g .
    _:c1 <olink> _:f .
    _:c2 <olink> _:f .
    _:g <name> "G" .
    _:f <name> "F" .

    # tag the known duplicates
    _:c1 <name> "C1" .
    _:c1 <tag> "duplicate" .
    _:c2 <name> "C2" .
    _:c2 <tag> "duplicate" .
  }
}
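
The queries in the next steps filter with eq(tag, ...) and traverse ~ilink. In Dgraph, eq on a string predicate needs an index and the ~ prefix needs @reverse on the predicate, so a schema along these lines has to be in place. This is a minimal sketch: the exact index is my choice (hash should work as well), and the types for name and olink are assumptions based on the data above.

```
name: string .
tag: string @index(exact) .
ilink: [uid] @reverse .
olink: [uid] .
```

It can be applied through an Alter operation, ideally before the data is loaded.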

At this point, the graph around the C nodes is quite similar to what you have: A links to C1 and B links to C2 through ilink, while C1 links to G and F and C2 links to F through olink.

Step 2
As a first mutation, we will add a new merged node and merge the outgoing links.

# create a merged node with merged outgoing links
upsert {
  query {
    # collect every node the tagged duplicates point to
    duplicates(func: eq(tag, "duplicate")) {
      outs as olink
    }
    # empty on the first run, so uid(merged) below creates a new node
    merged as var(func: eq(tag, "merged"))
  }
  mutation {
    set {
      uid(merged) <olink> uid(outs) .
      uid(merged) <tag> "merged" .
      uid(merged) <name> "C-MERGED" .
    }
  }
}

The new merged C node now has olink edges to both G and F.
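
To check the result, a query like the following can be run. It is only a sketch: the block name merged_out is arbitrary, and it assumes the tag predicate is indexed so that eq works.

```
{
  # list the merged node and the names of its outgoing neighbours
  merged_out(func: eq(tag, "merged")) {
    uid
    name
    olink {
      name
    }
  }
}
```

With the sample data from Step 1, the olink list should contain G and F.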

Similarly, we can merge the incoming links using the mutation below.

# merge incoming links into the newly merged node
upsert {
  query {
    # ~ilink walks ilink backwards, so ins holds the nodes pointing at a duplicate
    duplicate(func: eq(tag, "duplicate")) {
      ins as ~ilink
    }
    v as var(func: eq(tag, "merged"))
  }
  mutation {
    set {
      uid(ins) <ilink> uid(v) .
    }
  }
}

Now, the graph around the merged C looks to be in the shape we need it to be.
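
The shape can be confirmed by querying the merged node together with its reverse ilink edges. Again a sketch: merged_check is an arbitrary block name, and ~ilink assumes ilink is declared with @reverse.

```
{
  merged_check(func: eq(tag, "merged")) {
    name
    # nodes that now point at the merged node
    ~ilink {
      name
    }
    # nodes the merged node points to
    olink {
      name
    }
  }
}
```

With the sample data, ~ilink should list A and B, and olink should list G and F.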

Step 3
Finally, we can delete the links around the tagged duplicate nodes “C1” and “C2”. Please take due care with this step, since it removes edges from the graph.

# delete links to and from the duplicate nodes
upsert {
  query {
    duplicate as var(func: eq(tag, "duplicate")) {
      ins as ~ilink
      outs as olink
    }
  }
  mutation {
    delete {
      uid(ins) <ilink> uid(duplicate) .
      uid(duplicate) <olink> uid(outs) .
    }
  }
}

C1 and C2 nodes are now orphans and can be cleaned up if required.
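
If cleanup is wanted, one possible sketch is another upsert that removes the remaining scalar predicates from the tagged nodes. As far as I know, the wildcard form uid(x) * * only removes predicates listed in the node's dgraph.type, and these nodes were created without a type, so the predicates are named explicitly here. Please verify this on a test copy first.

```
# remove the remaining data on the orphaned duplicate nodes
upsert {
  query {
    duplicate as var(func: eq(tag, "duplicate"))
  }
  mutation {
    delete {
      uid(duplicate) <name> * .
      uid(duplicate) <tag> * .
    }
  }
}
```

Once a node has no predicates left, it no longer shows up in query results.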

As mentioned earlier, this solution does not involve any Python code. Please review.
