How to delete duplicate node

anand · October 16, 2020, 5:49am

Hi @Mickey248,
It looks like you want to merge the “C” nodes. I have tried to model and solve this as a “data merging” problem entirely in DQL, so that the solution does not depend on any programming language.

Step 1
Let’s create a similar structure. If we know which nodes are duplicates, we need to tag them, so that this information is expressed in the data and can be queried.

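{
  set {
    # A and B each link to their own copy of C
    _:a <ilink> _:c1 .
    _:a <name> "A" .
    _:b <ilink> _:c2 .
    _:b <name> "B" .

    # outgoing links from the two copies of C
    _:c1 <olink> _:g .
    _:c1 <olink> _:f .
    _:c2 <olink> _:f .
    _:g <name> "G" .
    _:f <name> "F" .

    # tag both copies so that a query can find them
    _:c1 <name> "C1" .
    _:c1 <tag> "duplicate" .
    _:c2 <name> "C2" .
    _:c2 <tag> "duplicate" .
  }
}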
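
One prerequisite that is easy to miss: the later steps filter with eq(tag, ...) and traverse ~ilink. In Dgraph, eq on a string predicate needs an index and the ~ traversal needs @reverse on the predicate, so I am assuming a schema roughly like the one below. It is not taken from your setup, and the exact index is only one workable choice.

```dql
# assumed schema, applied through an Alter operation (for example Ratel's schema tab)
name: string .
tag: string @index(exact) .   # needed for eq(tag, "...")
ilink: [uid] @reverse .       # needed for ~ilink
olink: [uid] .
```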

At this point, the graph around the C nodes is quite similar to what you have.
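
To check this shape without a visualizer, a query along these lines lists each tagged node with its incoming and outgoing neighbours. This is only a sketch: it assumes tag is indexed and ilink has @reverse in the schema, and the block name around_c is arbitrary.

```dql
{
  around_c(func: eq(tag, "duplicate")) {
    uid
    name
    ~ilink { name }   # nodes that point at this copy of C
    olink { name }    # nodes this copy of C points at
  }
}
```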

Step 2
As a first mutation, we will add a new merged node and merge the outgoing links.

# create a merged node with merged outgoing links
upsert{
  query{
    duplicates(func: eq(tag,"duplicate")){
      outs as olink
    }
    merged as var(func: eq(tag,"merged"))
  } 
  mutation{
    set{
      uid(merged) <olink> uid(outs) .
      uid(merged) <tag> "merged" .
      uid(merged) <name> "C-MERGED" .
    }
  }
}

The new merged C node now holds the outgoing links of both duplicates (to G and F).

Similarly, we can merge the incoming links using the mutation below.

# merge incoming links into newly merged nodes
upsert{
  query{
    duplicate(func: eq(tag,"duplicate")){
      ins as ~ilink
    }
    v as var(func: eq(tag,"merged"))
  } 
  mutation{
    set{
      uid(ins) <ilink> uid(v) .
    }
  }
}

Now, the graph around the merged C looks to be in the shape we need it to be.
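
One way to confirm this is to query the merged node directly. The sketch below assumes the same schema prerequisites (an index on tag and @reverse on ilink); the block name merged_c is arbitrary. With the sample data from Step 1 it should show A and B on the incoming side and G and F on the outgoing side.

```dql
{
  merged_c(func: eq(tag, "merged")) {
    uid
    name
    ~ilink { name }   # incoming: nodes whose ilink now points at the merged node
    olink { name }    # outgoing: links copied over from the duplicates
  }
}
```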

Step 3
Finally, we can delete the links around the tagged duplicate nodes “C1” and “C2”. Please take due care while doing this step.

# delete links to and from the duplicate nodes
upsert{
  query{
    duplicate as var(func: eq(tag,"duplicate")){
      ins as ~ilink
      outs as olink
    }

  } 
  mutation{
    delete{
      uid(ins) <ilink> uid(duplicate) .
      uid(duplicate) <olink> uid(outs) .
    }
  }
}

C1 and C2 nodes are now orphans and can be cleaned up if required.
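
If you do want to clean them up, one possible follow-up is to remove the remaining values on the tagged nodes. The sketch below names each predicate explicitly instead of using the S * * wildcard, because as far as I know that wildcard relies on type information (dgraph.type), which the sample nodes do not have. Run it only after the links have been removed in Step 3, since it also removes the tag used to find the nodes.

```dql
# remove the remaining scalar values from the orphaned duplicates
upsert {
  query {
    duplicate as var(func: eq(tag, "duplicate"))
  }
  mutation {
    delete {
      uid(duplicate) <name> * .
      uid(duplicate) <tag> * .
    }
  }
}
```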

As mentioned earlier, this solution does not involve any Python code. Please review.
