In Dgraph, How to Propagate Data from One Node to Other Nodes Directly or Indirectly Connected?

I’m working on clustering Bitcoin addresses using the multi-input heuristic, and I need to propagate entity labels across connected nodes. Here’s the situation:

  1. I treat each unique Bitcoin address as a node.
  2. For transactions with multiple input addresses, I have addresses A1, A2, A3, A4, and A5, which are connected with edges like this:
  • A1 is connected to A2
  • A2 is connected to A3
  • A3 is connected to A4
  • A4 is connected to A5
  These addresses form a connected cluster.
  3. Later, I learn that A2 belongs to Binance. I want to propagate the entity label “Binance” to all nodes in the cluster, so querying A5 would return “Binance.”

Questions:

  • In Dgraph, how can I propagate the entity label from A2 to all other connected nodes (A1, A3, A4, A5) efficiently?
  • Is there a way to automatically ensure that all nodes directly or indirectly connected to A2 inherit the entity label without manual traversal?
  • What’s the best approach or query structure in Dgraph for handling this kind of data propagation at scale?

Any suggestions on the best way to implement this would be very helpful. Thanks!

I have a similar query.

Not sure what you mean by ‘without manual traversal’.

Here is a way to assign labels for your cluster:
Assuming that the relationship is connect_to, you can use an upsert to copy the label of the known node to all other connected nodes in the chain.

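upsert {
  query {
    # Collect every node reachable from the known node via connect_to.
    var(func: uid(0x632ea3)) @recurse {
      others as connect_to
    }
    # Read the label of the known node (a uid -> value map with one entry).
    var(func: uid(0x632ea3)) {
      l as label
    }
    # Aggregate the map into a single value that can be used as a 'global'.
    var() {
      ml as max(val(l))
    }
  }

  mutation {
    # Assign the known node's label to every connected node.
    set {
      uid(others) <label> val(ml) .
    }
  }
}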

The first block finds all connected nodes recursively.
The second block gets the label of the initial node.
This is a Dgraph value variable, so it is a uid → value map, here with a single entry for the uid of that node.
To use this label as a ‘global’ value, I use a trick: an anonymous block that aggregates the variable with max.
In my case there is only one value, so the max is the value I need. More importantly, the result is an aggregation variable rather than a map variable, so I can use it in the subsequent block as a ‘global’.

The mutation just assigns the label to all connected nodes.

This is an efficient way of propagating labels when you start from a known node (by uid or other ID).

You may also want to follow ~connect_to to find A1 from A2 (reverse edge). This requires the @reverse directive on connect_to in the schema.
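
To see why the reverse edge matters, here is a small plain-Python sketch (no Dgraph involved) that mimics the traversal on the A1..A5 example. It assumes the edges are directed as listed in the question (A1 → A2 → A3 → A4 → A5); the function names `reachable` and `propagate` are made up for this illustration and are not part of any Dgraph API.

```python
from collections import deque

# Assumption: connect_to edges are directed as written in the question.
EDGES = [("A1", "A2"), ("A2", "A3"), ("A3", "A4"), ("A4", "A5")]


def reachable(start, edges, follow_reverse):
    """Return every node reachable from start, excluding start itself."""
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        if follow_reverse:
            # Also walk each edge backwards, like following ~connect_to.
            neighbours.setdefault(dst, set()).add(src)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


def propagate(labels, start, edges, follow_reverse):
    """Copy the label of start onto every reachable node; return a new dict."""
    updated = dict(labels)
    for node in reachable(start, edges, follow_reverse):
        updated[node] = labels[start]
    return updated


labels = {"A2": "Binance"}
forward_only = propagate(labels, "A2", EDGES, follow_reverse=False)
both_ways = propagate(labels, "A2", EDGES, follow_reverse=True)
print(sorted(forward_only))  # labelled nodes when only forward edges are followed
print(sorted(both_ways))     # labelled nodes when reverse edges are followed too
```

Under that direction assumption, starting at A2 and following only forward edges never reaches A1, so A1 stays unlabelled; following edges in both directions labels the whole cluster.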