How to delete duplicate node

Mickey248 · October 16, 2020, 3:25am

I am trying to delete a duplicate node, but I cannot.
This is how the graph is visualized in Dgraph:

This is the equivalent graph drawn in draw.io:

I have tried to delete the duplicate, without success.

Here is the code that builds the data:

all_list = [
    ['A', 'C', ['F', 'G']],
    ['A', 'D', ['G', 'H']],
    ['A', 'E', ['F', 'G']],
    ['B', 'C', ['F', 'G']],
]

SEPARATOR = '-' * 96

list_json = []
for user, tweet, hashtags in all_list:
    list_hashtag = []
    for hashtag in hashtags:
        print(SEPARATOR)
        print(user, tweet, hashtag)
        list_hashtag.append({
            "uid": f"_:{hashtag}",  # blank node: one node per hashtag within this mutation
            "hashtag": hashtag,
        })
    print(SEPARATOR)
    list_json.append({
        "user_handle": user,
        "user_name": user,
        "uid": f"_:{user}",  # blank node: one node per user within this mutation
        "authored": [
            {
                # no "uid" here, so Dgraph creates a new tweet node for every entry
                "tweet": tweet,
                "tagged_with": list_hashtag,
            }
        ],
    })

# JSON mutation payload
p = {"set": list_json}
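The duplicate "C" most likely comes from the tweet objects having no "uid": in a Dgraph JSON mutation, every map without a uid becomes a new node, so "C" authored by A and "C" authored by B end up as two separate nodes. A minimal sketch, assuming it is acceptable to treat tweets with the same text as the same node, gives the tweets a blank-node uid as well. The `build_mutation` function and the `_:user_`, `_:tweet_` and `_:tag_` prefixes are hypothetical names chosen here so that the three kinds of labels cannot collide. Blank-node labels are only shared within a single mutation, so data sent in separate mutations would still need an upsert.

```python
def build_mutation(all_list):
    """Build a Dgraph JSON "set" payload in which repeated users, tweets and
    hashtags reuse the same blank-node label, and therefore the same node."""
    users = []
    for user, tweet, hashtags in all_list:
        tagged_with = [
            {"uid": f"_:tag_{tag}", "hashtag": tag}  # same label -> same hashtag node
            for tag in hashtags
        ]
        users.append({
            "uid": f"_:user_{user}",
            "user_handle": user,
            "user_name": user,
            "authored": [{
                # assumption: identical tweet text means the same tweet node
                "uid": f"_:tweet_{tweet}",
                "tweet": tweet,
                "tagged_with": tagged_with,
            }],
        })
    return {"set": users}


all_list = [
    ['A', 'C', ['F', 'G']],
    ['A', 'D', ['G', 'H']],
    ['A', 'E', ['F', 'G']],
    ['B', 'C', ['F', 'G']],
]

p = build_mutation(all_list)
# Both "C" tweets (by A and by B) now share one blank-node label.
print([u["authored"][0]["uid"] for u in p["set"]])
# ['_:tweet_C', '_:tweet_D', '_:tweet_E', '_:tweet_C']
```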

Mickey248 · October 16, 2020, 3:27am

This is the query:
{
tweet_graph(func: has(user_handle)) {
user_name
authored {
tweet
tagged_with {
hashtag
}
}
}
}
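To see which tweet nodes are duplicated, one option is to group tweet node uids by their text; any text carried by more than one uid is a duplicate. This sketch assumes the query is extended to also fetch `uid` inside `authored` (the query above does not), and the `sample` response is made-up data shaped like the `tweet_graph` result, not real Dgraph output. `find_duplicate_tweets` is a hypothetical helper.

```python
from collections import defaultdict


def find_duplicate_tweets(response):
    """Map each tweet text to the sorted list of distinct node uids carrying
    it, keeping only texts that appear on more than one node."""
    uids_by_text = defaultdict(set)
    for user in response["tweet_graph"]:
        for tweet in user.get("authored", []):
            uids_by_text[tweet["tweet"]].add(tweet["uid"])
    return {text: sorted(uids) for text, uids in uids_by_text.items() if len(uids) > 1}


# Hypothetical response; the uids are invented for illustration.
sample = {
    "tweet_graph": [
        {"user_name": "A", "authored": [
            {"uid": "0x2", "tweet": "C"},
            {"uid": "0x3", "tweet": "D"},
            {"uid": "0x4", "tweet": "E"},
        ]},
        {"user_name": "B", "authored": [{"uid": "0x5", "tweet": "C"}]},
    ]
}

print(find_duplicate_tweets(sample))  # {'C': ['0x2', '0x5']}
```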

anand · October 16, 2020, 5:49am

Hi @Mickey248,
It looks like we want to merge the “C” nodes. I am trying to model and solve this as a “Data Merging” problem entirely in DQL, so that the solution does not depend on any programming language.

Step 1
Let’s create a similar structure. If we know which nodes are duplicates, we should tag them accordingly, so that this information is expressed in the data and can be queried.

{
  set{
    _:a <ilink> _:c1 .
    _:a <name> "A" .
    _:b <ilink> _:c2 .
    _:b <name> "B" .
    
    _:c1 <olink>  _:g .
    _:c1 <olink>  _:f .    
    _:c2 <olink>  _:f .
    _:g <name> "G" .
    _:f <name> "F" .
    
    
    _:c1 <name> "C1" .    
    _:c1 <tag> "duplicate" .
    _:c2 <name> "C2" .
    _:c2 <tag> "duplicate" .
  }
}

At this point, the graph around the C nodes is quite similar to yours.
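To make the following steps easier to reason about, here is a tiny in-memory stand-in for the Step 1 data: a Python set of (subject, predicate, object) triples, with `eq()` mimicked by a lookup. This is only a model for illustration, not how Dgraph stores data, and the labels drop the `_:` prefix. Also note that, as far as I know, the DQL in the next steps needs an index on `tag` (a root `eq()` requires one) and `@reverse` on `ilink` (for `~ilink`), for example `tag: string @index(exact) .` and `ilink: [uid] @reverse .` in the schema.

```python
# Step 1 data as (subject, predicate, object) triples.
GRAPH = {
    ("a", "ilink", "c1"), ("a", "name", "A"),
    ("b", "ilink", "c2"), ("b", "name", "B"),
    ("c1", "olink", "g"), ("c1", "olink", "f"),
    ("c2", "olink", "f"),
    ("g", "name", "G"), ("f", "name", "F"),
    ("c1", "name", "C1"), ("c1", "tag", "duplicate"),
    ("c2", "name", "C2"), ("c2", "tag", "duplicate"),
}


def eq(graph, predicate, value):
    """Mimic DQL's func: eq(predicate, value): subjects with that exact value."""
    return {s for (s, p, o) in graph if p == predicate and o == value}


print(sorted(eq(GRAPH, "tag", "duplicate")))  # ['c1', 'c2']
```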

Step 2
As a first mutation, we will add a new merged node and merge the outgoing links.

# create a merged node with merged outgoing links
upsert{
  query{
    duplicates(func: eq(tag,"duplicate")){
      outs as olink
    }
    merged as var(func: eq(tag,"merged"))
  } 
  mutation{
    set{
      uid(merged) <olink> uid(outs) .
      uid(merged) <tag> "merged" .
      uid(merged) <name> "C-MERGED" .
    }
  }
}

The new merged C node now has the outgoing links of both C1 and C2 (to F and G).
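The outgoing-link merge can be mimicked on the same kind of triple set. In this sketch, `outs` collects every `olink` target of the tagged duplicates, and the merged node gets one `olink` to each. The label "m" is a hypothetical stand-in for the uid Dgraph allocates when the `merged` variable is empty on the first run. Because the edges form a set, F being linked from both C1 and C2 yields a single merged-to-F edge. Note that the upsert matches every node tagged "duplicate", so as written it merges all tagged nodes into one; separate duplicate groups would presumably need separate tags or runs.

```python
# Step 1 data as (subject, predicate, object) triples.
GRAPH = {
    ("a", "ilink", "c1"), ("a", "name", "A"),
    ("b", "ilink", "c2"), ("b", "name", "B"),
    ("c1", "olink", "g"), ("c1", "olink", "f"), ("c2", "olink", "f"),
    ("g", "name", "G"), ("f", "name", "F"),
    ("c1", "name", "C1"), ("c1", "tag", "duplicate"),
    ("c2", "name", "C2"), ("c2", "tag", "duplicate"),
}


def merge_outgoing(graph, merged="m"):
    """Mirror the first upsert: create `merged` and give it an olink to every
    olink target of the nodes tagged "duplicate"."""
    duplicates = {s for (s, p, o) in graph if p == "tag" and o == "duplicate"}
    outs = {o for (s, p, o) in graph if p == "olink" and s in duplicates}
    new_graph = set(graph)
    new_graph |= {(merged, "olink", o) for o in outs}
    new_graph |= {(merged, "tag", "merged"), (merged, "name", "C-MERGED")}
    return new_graph


g2 = merge_outgoing(GRAPH)
print(sorted(o for (s, p, o) in g2 if s == "m" and p == "olink"))  # ['f', 'g']
```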

Similarly, we can merge the incoming links using the mutation below.

# merge incoming links into the newly merged node
upsert{
  query{
    duplicate(func: eq(tag,"duplicate")){
      ins as ~ilink
    }
    v as var(func: eq(tag,"merged"))
  } 
  mutation{
    set{
      uid(ins) <ilink> uid(v) .
    }
  }
}

Now the graph around the merged C node is in the shape we need.
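The incoming-link merge follows the same pattern but walks `ilink` backwards, which is what `~ilink` does in DQL. In this sketch the reverse traversal is simply a scan for triples whose object is a duplicate. The graph literal is the state after the outgoing merge, with "m" again a hypothetical label for the merged node; `merge_incoming` is a hypothetical helper.

```python
# State after the outgoing-link merge, as (subject, predicate, object) triples.
GRAPH = {
    ("a", "ilink", "c1"), ("a", "name", "A"),
    ("b", "ilink", "c2"), ("b", "name", "B"),
    ("c1", "olink", "g"), ("c1", "olink", "f"), ("c2", "olink", "f"),
    ("g", "name", "G"), ("f", "name", "F"),
    ("c1", "name", "C1"), ("c1", "tag", "duplicate"),
    ("c2", "name", "C2"), ("c2", "tag", "duplicate"),
    ("m", "olink", "g"), ("m", "olink", "f"),
    ("m", "tag", "merged"), ("m", "name", "C-MERGED"),
}


def merge_incoming(graph):
    """Mirror the second upsert: link every node that points at a duplicate
    via ilink to the merged node(s) as well."""
    duplicates = {s for (s, p, o) in graph if p == "tag" and o == "duplicate"}
    merged = {s for (s, p, o) in graph if p == "tag" and o == "merged"}
    # ~ilink: the subjects of ilink edges that point at a duplicate
    ins = {s for (s, p, o) in graph if p == "ilink" and o in duplicates}
    # uid(ins) <ilink> uid(v) is applied to every (in, merged) pair
    return graph | {(i, "ilink", m) for i in ins for m in merged}


g3 = merge_incoming(GRAPH)
print(sorted(s for (s, p, o) in g3 if p == "ilink" and o == "m"))  # ['a', 'b']
```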

Step 3
Finally, we can delete the links around the tagged duplicate nodes “C1” and “C2”. Please take due care with this step: the delete acts on every node tagged "duplicate".

# delete links to and from the duplicate nodes
upsert{
  query{
    duplicate as var(func: eq(tag,"duplicate")){
      ins as ~ilink
      outs as olink
    }

  } 
  mutation{
    delete{
      uid(ins) <ilink> uid(duplicate) .
      uid(duplicate) <olink> uid(outs) .
    }
  }
}

The C1 and C2 nodes are now orphans and can be cleaned up if required.
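Step 3 and the orphan check can be sketched the same way. The delete removes every `ilink` edge from `ins` into a duplicate and every `olink` edge out of a duplicate. Since `uid(ins) <ilink> uid(duplicate)` expands to all pairs, pairs that were never linked (such as A to C2) simply change nothing. A duplicate then counts as an orphan if no `ilink` or `olink` edge touches it. The graph literal is the state after both merge upserts, "m" is a hypothetical label for the merged node, and both functions are hypothetical helpers.

```python
LINKS = {"ilink", "olink"}

# State after both merge upserts, as (subject, predicate, object) triples.
GRAPH = {
    ("a", "ilink", "c1"), ("a", "name", "A"),
    ("b", "ilink", "c2"), ("b", "name", "B"),
    ("c1", "olink", "g"), ("c1", "olink", "f"), ("c2", "olink", "f"),
    ("g", "name", "G"), ("f", "name", "F"),
    ("c1", "name", "C1"), ("c1", "tag", "duplicate"),
    ("c2", "name", "C2"), ("c2", "tag", "duplicate"),
    ("m", "olink", "g"), ("m", "olink", "f"),
    ("m", "tag", "merged"), ("m", "name", "C-MERGED"),
    ("a", "ilink", "m"), ("b", "ilink", "m"),
}


def duplicates_of(graph):
    return {s for (s, p, o) in graph if p == "tag" and o == "duplicate"}


def delete_duplicate_links(graph):
    """Mirror the Step 3 upsert: delete ilink edges into and olink edges out
    of the duplicates; non-existent pairs are no-ops."""
    dups = duplicates_of(graph)
    ins = {s for (s, p, o) in graph if p == "ilink" and o in dups}
    outs = {o for (s, p, o) in graph if p == "olink" and s in dups}
    to_delete = {(i, "ilink", d) for i in ins for d in dups}
    to_delete |= {(d, "olink", o) for d in dups for o in outs}
    return graph - to_delete


def orphans(graph):
    """Duplicates that no longer appear on either end of a link edge."""
    linked = {x for (s, p, o) in graph if p in LINKS for x in (s, o)}
    return duplicates_of(graph) - linked


after = delete_duplicate_links(GRAPH)
print(sorted(orphans(GRAPH)), sorted(orphans(after)))  # [] ['c1', 'c2']
```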

As mentioned earlier, this solution does not involve any Python code. Please review.

Mickey248 · November 10, 2020, 9:44am

Thank you, this is very useful for my work.
