# Given a set of nodes, how do I optimally deduplicate the nodes connected to them by a given edge?

Hi, I am fairly new to Dgraph and want to confirm whether I am going about this in the best way.

My current problem is that I am trying to deduplicate seasonal fans across multiple teams: for a given set of teams, I want to know how many fans are unique to each team, i.e. fans of that team and of no other team in the set. So far response times have not been ideal, and I am not sure whether the cause is how I am modeling the data or something else.
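To make the goal concrete, here is a minimal sketch in plain Python of the set arithmetic I am after. It does not talk to Dgraph; the function name `unique_fans` and the sample fan names are made up for illustration only.

```python
from typing import Dict, Set


def unique_fans(fans_by_team: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each team, return the fans who follow no other team in the input."""
    unique: Dict[str, Set[str]] = {}
    for team, fans in fans_by_team.items():
        # Union of the fans of every other team in the set
        others: Set[str] = set()
        for other_team, other_fans in fans_by_team.items():
            if other_team != team:
                others |= other_fans
        unique[team] = fans - others
    return unique


# Hypothetical sample data, not taken from my real dataset
sample = {
    "team-0": {"alice", "bob", "carol"},
    "team-1": {"bob", "dave"},
}
result = unique_fans(sample)

# Total deduplicated fans across all teams
total = set().union(*sample.values())
```

Here `result` holds the per-team unique fans and `total` the deduplicated union; the queries below are meant to return the sizes of these sets.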

The first data model I used is below (the `A` and `B` suffixes correspond to different data sources):

``````
name: string @index(term) .

fan_s2020_A: [uid] @reverse .
fan_s2020_B: [uid] @reverse .
fan_s2019_A: [uid] @reverse .
fan_s2019_B: [uid] @reverse .

type Person {
  fan_s2020_A
  fan_s2020_B
  fan_s2019_A
  fan_s2019_B
}

type Team {
  name
}
``````

Example query deduplicating fans across two teams:

``````
{
  # Fans of first team for 2020 season, from both data sources
  var(func: eq(name, "team-0")) {
    ~fan_s2020_A {
      fan_0_A as uid
    }
    ~fan_s2020_B {
      fan_0_B as uid
    }
  }

  # Merge both sources into one variable for the first team
  var(func: uid(fan_0_A, fan_0_B)) {
    fans_0 as uid
  }

  # Fans of second team for 2020 season, from both data sources
  var(func: eq(name, "team-1")) {
    ~fan_s2020_A {
      fan_1_A as uid
    }
    ~fan_s2020_B {
      fan_1_B as uid
    }
  }

  # Merge both sources into one variable for the second team
  var(func: uid(fan_1_A, fan_1_B)) {
    fans_1 as uid
  }

  # Unique fan counts of each team
  unique_0_fan(func: uid(fans_0)) @filter(NOT uid(fans_1)) {
    count(uid)
  }
  unique_1_fan(func: uid(fans_1)) @filter(NOT uid(fans_0)) {
    count(uid)
  }

  # Total fan count
  union(func: uid(fans_1, fans_0)) {
    count(uid)
  }
}
``````

As I compare more teams, I create the same `var` blocks for each additional team and add the other teams' variables to the NOT filter (i.e. `NOT uid(fans_0, fans_1, ...)`) in each unique count query.
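Since the query grows mechanically with the number of teams, I generate it rather than write it by hand. Below is a rough sketch of that in Python for the first data model. The helper name `build_unique_fans_query` and its `season` parameter are my own placeholders, not part of any Dgraph client, and team names are interpolated without escaping, so treat it as illustration only.

```python
from typing import List


def build_unique_fans_query(team_names: List[str], season: str = "s2020") -> str:
    """Build a DQL query string counting unique fans per team (first data model)."""
    if not team_names:
        raise ValueError("at least one team name is required")

    blocks = []
    for i, name in enumerate(team_names):
        # Per team: collect fans from both data sources, then merge them
        blocks.append(
            f'  var(func: eq(name, "{name}")) {{\n'
            f"    ~fan_{season}_A {{ fan_{i}_A as uid }}\n"
            f"    ~fan_{season}_B {{ fan_{i}_B as uid }}\n"
            f"  }}\n"
            f"  var(func: uid(fan_{i}_A, fan_{i}_B)) {{ fans_{i} as uid }}"
        )

    all_vars = [f"fans_{i}" for i in range(len(team_names))]
    for i in range(len(team_names)):
        # Exclude every other team's fans; a single team needs no filter
        others = ", ".join(v for j, v in enumerate(all_vars) if j != i)
        flt = f" @filter(NOT uid({others}))" if others else ""
        blocks.append(f"  unique_{i}_fan(func: uid(fans_{i})){flt} {{ count(uid) }}")

    # Total deduplicated fan count across all teams
    blocks.append(f"  union(func: uid({', '.join(all_vars)})) {{ count(uid) }}")
    return "{\n" + "\n".join(blocks) + "\n}"
```

Calling `build_unique_fans_query(["team-0", "team-1", "team-2"])` produces one `unique_N_fan` block per team, each filtering out the other two teams' variables, plus a single `union` block.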

I have also tried modeling this data similarly to what is described in this comment in another thread, with `data_provider` as the facet instead of `origin`. The schema and example query for that model are below:

``````
name: string @index(term) .

fan_s2020: [uid] @reverse .
fan_s2019: [uid] @reverse .

relates_to: [uid] @reverse .

type Person {
  fan_s2020
  fan_s2019
}

type Queue {
  relates_to
}

type Team {
  name
}
``````
``````
{
  # Fans of first team
  var(func: eq(name, "team-0")) {
    ~relates_to {
      ~fan_s2020 {
        fans_0 as uid
      }
    }
  }

  # Fans of second team
  var(func: eq(name, "team-1")) {
    ~relates_to {
      ~fan_s2020 {
        fans_1 as uid
      }
    }
  }

  # Unique fan counts of each team
  unique_0_fan(func: uid(fans_0)) @filter(NOT uid(fans_1)) {
    count(uid)
  }
  unique_1_fan(func: uid(fans_1)) @filter(NOT uid(fans_0)) {
    count(uid)
  }

  # Total fan count
  union(func: uid(fans_1, fans_0)) {
    count(uid)
  }
}
``````

I found that the first data model performed slightly better than the second for a small number of teams, but as the team count grew, both performed about the same.

Am I going about this the right way, or should one of these data models outperform the other when the team count is large? Or are my queries and data models simply not optimized for this? Any feedback would be greatly appreciated.