Find connected components?

How to find connected components?

i found this closed issue, but the answer is not very helpful: Finding connected components in graph

Thanks!!!

P.S. To be precise, the following graph has two components, how to return them?:

{
    set {
     _:node_1 <code> "node_1" .
     _:node_1 <dgraph.type> "node" .     
     _:node_2 <code> "node_2" .
     _:node_2 <dgraph.type> "node" .  
     _:node_3 <code> "node_3" .
     _:node_3 <dgraph.type> "node" .  
     _:node_4 <code> "node_4" . 
     _:node_4 <dgraph.type> "node" . 
       
     _:node_1 <neighbor> _:node_2 .
     _:node_2 <neighbor> _:node_1 .
     _:node_3 <neighbor> _:node_4 .
     _:node_4 <neighbor> _:node_3 .
    }
}

What is the concept of components in this sample?

{
  q(func: type(node)) @filter(has(neighbor)){
    uid
    name 
    neighbor {
        uid
        name 
        neighbor
      }
  }
}
{
  "data": {
    "q": [
      {
        "uid": "0x2738",
        "name": "node_1",
        "neighbor": [
          {
            "uid": "0x2739",
            "name": "node_2"
          }
        ]
      },
      {
        "uid": "0x2739",
        "name": "node_2",
        "neighbor": [
          {
            "uid": "0x2738",
            "name": "node_1"
          }
        ]
      },
      {
        "uid": "0x273a",
        "name": "node_3",
        "neighbor": [
          {
            "uid": "0x273b",
            "name": "node_4"
          }
        ]
      },
      {
        "uid": "0x273b",
        "name": "node_4",
        "neighbor": [
          {
            "uid": "0x273a",
            "name": "node_3"
          }
        ]
      }
    ]
  }
}

Hi, thanks for the feedback.

In the example above, there are two connected components, one consisting of node_1 and node_2, and one consisting of node_3 and node_4 (in the style of Component (graph theory) - Wikipedia. So ideally, it would be nice to have a query that returns sth like:

[
[node_1, node_2], 
[node_3, node_4] 
]

It is possible to return the connected component that contains e.g. node_1 by running:

{
  q(func: eq(code,node_1), first: 100) @recurse @normalize{
  N as neighbor
  }

  component(func: uid(N)){
  code
  }
}

So the only way i see is to iteratively run this query for each node which has not been found in a component yet. This will result in a number of queries proportional to the number of nodes in the graph (quite infeasible sometimes). So is it possible to return all components at once? with pagination to iteratively fetch them.

P.S. just realized that name has a special meaning and is not indexed. In my local examples i am using code to label a node, so updated my examples above by replacing name by code

If I slightly change your dataset, it changes the whole meaning?

<code>: string .
<neighbor>: [uid] @reverse . //we need reverse
{
    set {
     _:node_1 <code> "node_1" .
     _:node_1 <dgraph.type> "node" .
     _:node_2 <code> "node_2" .
     _:node_2 <dgraph.type> "node" .  
     _:node_3 <code> "node_3" .
     _:node_3 <dgraph.type> "node" .
     _:node_4 <code> "node_4" .
     _:node_4 <dgraph.type> "node" .
       
     _:node_1 <neighbor> _:node_2 .
   #  _:node_2 <neighborConfirm> _:node_1 . #change this pred or remove it
     _:node_3 <neighbor> _:node_4 .
   #  _:node_4 <neighborConfirm> _:node_3 .  #change this pred or remove it
    }
}

Reach all nodes

{
  q(func: type(node)) 
    {
    uid
    code 
    neighbor {
        uid
        code 
        neighbor
      }
	~neighbor { #or use the the pred "neighborConfirm"
        uid
        code 
        neighbor
      }
  }
}

Filter the ones that is reversed

{
  q(func: type(node)) @filter(not has(~neighbor))
    {
    uid
    code 
    neighbor {
        uid
        code 
        neighbor
      }
  }
}

if you wanna use recurse

{
 q(func: type(node)) @filter(not has(~neighbor)) @recurse(depth: 2, loop: true)
    {
    uid
    code
    neighbor
  }
}

Using @recurse as in <neighbor>: [uid] @reverse . is nice syntactic sugar (good to remember), but doesnt change the problem. If i run e.g.

{
  q(func: type(node)) @recurse
    {
    name
    neighbor
    }
}

(Slight modification of your queries above), we get:

{
  "data": {
    "q": [
      {
        "name": "node_1",
        "neighbor": [
          {
            "name": "node_2"
          }
        ]
      },
      {
        "name": "node_2"
      },
      {
        "name": "node_3",
        "neighbor": [
          {
            "name": "node_4"
          }
        ]
      },
      {
        "name": "node_4"
      },
      {
        "name": "node_3",
        "neighbor": [
          {
            "name": "node_4"
          }
        ]
      },
      {
        "name": "node_4",
        "neighbor": [
          {
            "name": "node_3"
          }
        ]
      },
      {
        "name": "node_1",
        "neighbor": [
          {
            "name": "node_2"
          }
        ]
      },
      {
        "name": "node_2",
        "neighbor": [
          {
            "name": "node_1"
          }
        ]
      },
      {
        "name": "node_2",
        "neighbor": [
          {
            "name": "node_1"
          }
        ]
      },
      {
        "name": "node_3",
        "neighbor": [
          {
            "name": "node_4"
          }
        ]
      },
      {
        "name": "node_4",
        "neighbor": [
          {
            "name": "node_3"
          }
        ]
      },
      {
        "name": "node_1",
        "neighbor": [
          {
            "name": "node_2"
          }
        ]
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 10169,
      "processing_ns": 5402004,
      "encoding_ns": 42478,
      "assign_timestamp_ns": 534517
    },
    "txn": {
      "start_ts": 30370384
    }
  }
}

Note that every connected component appears many times, e.g. for the connected component [node_1,node_2]:

{
        "name": "node_1",
        "neighbor": [
          {
            "name": "node_2"
          }
        ]
      }

and

      {
        "name": "node_2",
        "neighbor": [
          {
            "name": "node_1"
          }
        ]
      }

Now imagine a connected component consisting of 100’000 nodes. Returning this component should be no problem, but if we return this component more than 100’000 times, we have a problem. So how can we ensure that each connected component is only listed once in the query result??

Sorry, i am mixing up code and name during these queries, just think about it as some label of the nodes.

Not sure what you mean, you can’t get anything treated without applying filters or some param strategy.

Which modifications?

If you filter it by reverse is the fastest one.

This query

{
  q(func: type(node)) @filter(not has(~neighbor))
    {
    uid
    code 
    neighbor {
        uid
        code 
        neighbor
      }
  }
}

Results

{
  "data": {
    "q": [
      {
        "uid": "0x2760",
        "code": "node_1",
        "neighbor": [
          {
            "uid": "0x2761",
            "code": "node_2"
          }
        ]
      },
      {
        "uid": "0x2762",
        "code": "node_3",
        "neighbor": [
          {
            "uid": "0x2763",
            "code": "node_4"
          }
        ]
      }
    ]
  }
}

Your query only works for the case that each component has size 2, what about the following case with two components of size 4 each? (and similar cases with even larger components):

{
    set {
     _:node_1 <code> "node_1" .
     _:node_1 <dgraph.type> "node" .
     _:node_2 <code> "node_2" .
     _:node_2 <dgraph.type> "node" .
     _:node_3 <code> "node_3" .
     _:node_3 <dgraph.type> "node" .
     _:node_4 <code> "node_4" .
     _:node_4 <dgraph.type> "node" .
     _:node_5 <code> "node_5" .
     _:node_5 <dgraph.type> "node" .
     _:node_6 <code> "node_6" .
     _:node_6 <dgraph.type> "node" .
     _:node_7 <code> "node_7" .
     _:node_7 <dgraph.type> "node" .
     _:node_8 <code> "node_8" .
     _:node_8 <dgraph.type> "node" .

     _:node_1 <neighbor> _:node_2 .
     _:node_2 <neighbor> _:node_1 .
     _:node_2 <neighbor> _:node_3 .
     _:node_3 <neighbor> _:node_2 .
     _:node_3 <neighbor> _:node_4 .
     _:node_4 <neighbor> _:node_3 .

     _:node_5 <neighbor> _:node_6 .
     _:node_6 <neighbor> _:node_5 .
     _:node_6 <neighbor> _:node_7 .
     _:node_7 <neighbor> _:node_6 .
     _:node_7 <neighbor> _:node_8 .
     _:node_8 <neighbor> _:node_7 .
    }
}

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.