Query to recursively find the entire subgraph/cluster that a node belongs to

I have a graph of customer_id, device_id, phone_number, and email nodes, where each customer_id node also has a label attribute. Each customer_id can have zero or more device_ids, phone_numbers, or emails, linked by has_device, has_phone_number, and has_email relationships.

I now want to query a single customer_id and return all other nodes in the graph that are connected in any way to this customer_id. In other words, I want to get the entire subgraph/cluster of nodes that a single customer_id is connected to.

Currently I am using the following query which works:

{
  cluster(func: eq(customer_id, 123456)) @recurse {
    customer_id
    label
    has_device
    ~has_device
    device_id
    has_phone_number
    ~has_phone_number
    phone_number
    has_email
    ~has_email
    email
  }
}

I am brand new to Dgraph and GraphQL, so I am not sure whether this query is the right way to do this or whether it is very inefficient. Could you please tell me whether this query is OK for what I want, or whether there are better ways?

Schema:

Hey @kevin.obrien,
Welcome to Dgraph. Can you please share the schema that you’re using?

Added it to my original question

Thank you @kevin.obrien.
You don’t need to create separate nodes for email and phone_number. You can just store them as predicates on the customer_id or device_id nodes. Then you can search for customer_id = 123456 and get all the devices connected to it, along with the values of all its predicates, such as label, email, and phone_number.

I don’t understand why emails and phone_numbers would be treated differently from device_ids. They are all different types of nodes connected to customer_ids by has_x relationships. Could you please write a sample query so I understand exactly what you mean? Thanks.

I think @kevin.obrien wants to do a graph search and needs all connected nodes, not just those connected at level 1. The current query in the question works because it runs the same query recursively on all the output nodes. But this can quickly time out, or return nothing at all, if the subgraph is very big. See here.
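As a rough mental model (not Dgraph's actual implementation), a @recurse block that lists each has_x edge together with its reverse ~has_x behaves like a breadth-first search over an undirected graph: every reachable node is expanded once, so the work grows with the size of the whole cluster. The Python sketch below assumes hypothetical toy triples; names such as cust:1 and dev:A are made up for illustration.

```python
from collections import defaultdict, deque

# Hypothetical (subject, predicate, object) triples shaped like the question's model.
EDGES = [
    ("cust:1", "has_device", "dev:A"),
    ("cust:1", "has_email", "email:x"),
    ("cust:2", "has_device", "dev:A"),          # shares a device with cust:1
    ("cust:2", "has_phone_number", "phone:9"),
    ("cust:3", "has_phone_number", "phone:9"),  # shares a phone with cust:2
    ("cust:4", "has_email", "email:y"),         # unrelated customer
]


def build_adjacency(edges):
    """Index every edge in both directions, mirroring has_x plus ~has_x."""
    adj = defaultdict(set)
    for subj, _pred, obj in edges:
        adj[subj].add(obj)  # forward edge, e.g. has_device
        adj[obj].add(subj)  # reverse edge, e.g. ~has_device
    return adj


def cluster(edges, start):
    """Breadth-first search returning every node reachable from start."""
    adj = build_adjacency(edges)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:  # each node is expanded at most once
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(cluster(EDGES, "cust:1")))
# ['cust:1', 'cust:2', 'cust:3', 'dev:A', 'email:x', 'phone:9']
```

Starting from cust:1, the search reaches cust:3 only through two shared identifiers, which is the kind of indirect connection a level-1 query would miss.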

@kevin.obrien, from the query I am trying to understand: do you want customers who share the same phone number, email, or device?

device_id is representing a device which is a different entity. Whereas email and phone_number are just values associated with the customer or the device.

But generally customers don’t share phone numbers or emails.

@Anurag, yes, I want all connected nodes, not just those at level 1. In general terms, I want to return all nodes (customers, devices, phone numbers, email addresses) that can be reached by some path from the queried node.

@Neeraj, actually in my case, there are customers sharing the same phone number or email addresses (fake, fraudulent accounts) and this is one of the problems we are trying to tackle.
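If the goal is to flag every group of accounts that share an identifier, rather than querying one customer at a time, one alternative is a batch union-find (disjoint-set) pass over an exported customer-to-identifier table. This is a swapped-in technique that runs outside Dgraph, not a Dgraph feature, and the CUSTOMERS table below is hypothetical.

```python
from collections import defaultdict

# Hypothetical export: customer -> identifiers (devices, phones, emails).
CUSTOMERS = {
    "cust:1": ["dev:A", "email:x"],
    "cust:2": ["dev:A", "phone:9"],
    "cust:3": ["phone:9"],
    "cust:4": ["email:y"],
}


class DisjointSet:
    """Minimal union-find with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def customer_clusters(customers):
    """Group customers that are linked, directly or indirectly, by shared identifiers."""
    ds = DisjointSet()
    for cust, identifiers in customers.items():
        ds.find(cust)  # register customers that have no identifiers too
        for ident in identifiers:
            ds.union(cust, ident)
    groups = defaultdict(set)
    for cust in customers:
        groups[ds.find(cust)].add(cust)
    return sorted(sorted(g) for g in groups.values())


print(customer_clusters(CUSTOMERS))
# [['cust:1', 'cust:2', 'cust:3'], ['cust:4']]
```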


Oh, in that case your query might be quite tricky. Let me tag @MichelDiz; he’ll be able to help you with this.


Hi @kevin.obrien, your query is correct and should work fine, unless the graph explodes very quickly to a very large number of nodes. In that case you might want to restrict the depth using the depth parameter of @recurse (for example, @recurse(depth: 5)).
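To illustrate what a depth limit trades away, here is a minimal Python sketch of a breadth-first search that stops expanding after a fixed number of hops. It assumes a hypothetical undirected adjacency list, and it counts hops from the start node; Dgraph's own depth counting (for example, whether the root counts as a level) may differ, so check the @recurse documentation before relying on exact numbers.

```python
from collections import deque

# Hypothetical undirected adjacency for a chain-shaped cluster.
ADJ = {
    "cust:1": ["dev:A"],
    "dev:A": ["cust:1", "cust:2"],
    "cust:2": ["dev:A", "phone:9"],
    "phone:9": ["cust:2", "cust:3"],
    "cust:3": ["phone:9"],
}


def cluster_with_depth(adj, start, max_hops):
    """Return the nodes within max_hops edges of start (BFS with a hop limit)."""
    hops_to = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        hops = hops_to[node]
        if hops == max_hops:
            continue  # do not expand beyond the limit
        for nxt in adj.get(node, []):
            if nxt not in hops_to:
                hops_to[nxt] = hops + 1
                queue.append(nxt)
    return set(hops_to)


print(sorted(cluster_with_depth(ADJ, "cust:1", 2)))
# ['cust:1', 'cust:2', 'dev:A']
```

With a limit of 2, cust:3 is missed even though it belongs to the same cluster, so a depth cap bounds the cost at the price of possibly incomplete results.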

The following query returns one of the two subgraphs in the sample data, depending on whether $name1 is "node g" or "node h":

query cluster($name1: string) {
  query(func: eq(name, $name1)) @recurse {
    name
    relation
    ~relation
  }
}

Schema:

name: string @index(exact, trigram) .
relation: [uid] @reverse .

Mutation:

{
  set {
    _:nodeA <name> "node a" .
    _:nodeB <name> "node b" .
    _:nodeC <name> "node c" .
    _:nodeD <name> "node d" .
    _:nodeE <name> "node e" .
    _:nodeF <name> "node f" .
    _:nodeG <name> "node g" .
    _:nodeH <name> "node h" .
    _:nodeI <name> "node i" .
    _:nodeJ <name> "node j" .
    _:nodeK <name> "node k" .
    _:nodeA <relation> _:nodeG .
    _:nodeA <relation> _:nodeD .
    _:nodeB <relation> _:nodeA .
    _:nodeC <relation> _:nodeB .
    _:nodeH <relation> _:nodeI .
    _:nodeI <relation> _:nodeJ .
    _:nodeH <relation> _:nodeK .
    _:nodeK <relation> _:nodeJ .
  }
}
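To see which nodes each query should return, the Python sketch below computes the connected components of this sample. It treats relation as undirected because the query follows both relation and ~relation; the edge list is copied from the mutation, while the function itself is only an illustration, not part of Dgraph.

```python
from collections import defaultdict

# Nodes a..k and the <relation> edges from the sample mutation.
NODES = "abcdefghijk"
RELATION = [("a", "g"), ("a", "d"), ("b", "a"), ("c", "b"),
            ("h", "i"), ("i", "j"), ("h", "k"), ("k", "j")]


def components(nodes, edges):
    """Connected components of an undirected graph, in order of first node seen."""
    adj = defaultdict(set)
    for s, o in edges:
        adj[s].add(o)  # relation
        adj[o].add(s)  # ~relation
    seen = set()
    result = []
    for n in nodes:
        if n in seen:
            continue
        comp, stack = set(), [n]
        while stack:  # iterative depth-first search
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adj[cur] - comp)
        seen |= comp
        result.append(comp)
    return result


for comp in components(NODES, RELATION):
    print(sorted(comp))
```

Starting from "node g" should yield a, b, c, d, and g; starting from "node h" should yield h, i, j, and k. Nodes e and f have no edges, so each forms a cluster of its own.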

Well, @kevin.obrien, the way you are doing it is the expected approach for this scenario. Another option would be to write the same query as several blocks, but the data could come back in the wrong order. Or you could write a single static query that represents the whole tree you have.

In most cases a static query would be only about 1 ms faster (that is, not worth it), but it would certainly be much faster with gigantic schemas. It is a trade-off.
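One way to picture the static alternative is to generate a fixed-depth nested query. The Python sketch below assumes the graph is bipartite (customers link only to identifiers, and identifiers only back to customers), so each level alternates between forward has_x edges and reverse ~has_x edges. The hop count is a hypothetical parameter, and the generated text has not been checked against a live Dgraph instance.

```python
# Edge/field pairs taken from the question's predicate names.
LINKS = [("has_device", "device_id"),
         ("has_phone_number", "phone_number"),
         ("has_email", "email")]


def customer_block(hops, indent):
    """Fields for a customer level, then forward edges to identifier levels."""
    pad = " " * indent
    lines = [f"{pad}customer_id", f"{pad}label"]
    if hops > 0:
        for edge, field in LINKS:
            lines.append(f"{pad}{edge} {{")
            lines.extend(identifier_block(edge, field, hops - 1, indent + 2))
            lines.append(f"{pad}}}")
    return lines


def identifier_block(edge, field, hops, indent):
    """Field for an identifier level, then the reverse edge back to customers."""
    pad = " " * indent
    lines = [f"{pad}{field}"]
    if hops > 0:
        lines.append(f"{pad}~{edge} {{")
        lines.extend(customer_block(hops - 1, indent + 2))
        lines.append(f"{pad}}}")
    return lines


def static_cluster_query(customer_id, hops):
    """Build a nested (non-recursive) DQL query string covering `hops` edges."""
    body = "\n".join(customer_block(hops, 4))
    return ("{\n"
            f"  cluster(func: eq(customer_id, {customer_id})) {{\n"
            f"{body}\n"
            "  }\n"
            "}")


print(static_cluster_query(123456, 2))
```

The number of nested blocks roughly triples every two hops, and unlike @recurse the static query does not skip nodes it has already visited (a customer reappears under its own device), which is part of why it is a trade-off.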

TIP:

To make your query smaller, you can use the type trick. Taking @Anurag’s sample, you can do this:

{
  query(func: eq(name_2, "node g")) @recurse {
    expand(a_b_c_and_d, e_f_g_and_h, rest)
    # expand(myType1, myType2, myType3)
  }
}

This uses the listed types to expand every edge they declare, so you don't have to list each predicate by hand.
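Roughly, listing several types in expand() pulls in the union of the predicates those types declare. The Python sketch below copies the type definitions from the modified sample schema into a dictionary (a hypothetical stand-in for Dgraph's schema) and computes that union, which shows the hand-written field list the type trick replaces. It does not model how Dgraph decides, per node, which of those predicates actually hold values.

```python
# Type definitions copied from the modified sample schema.
TYPES = {
    "a_b_c_and_d": ["name", "relation", "~relation"],
    "e_f_g_and_h": ["name_2", "relation", "~relation"],
    "rest": ["name_3", "relation", "~relation"],
}


def expanded_predicates(*type_names):
    """Union of the predicates declared by the given types, in first-seen order."""
    preds = []
    for type_name in type_names:
        for pred in TYPES[type_name]:
            if pred not in preds:
                preds.append(pred)
    return preds


print(expanded_predicates("a_b_c_and_d", "e_f_g_and_h", "rest"))
# ['name', 'relation', '~relation', 'name_2', 'name_3']
```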

Modified sample:

Some of the repeated predicates (name_2, name_3) are just examples.

name: string @index(exact, trigram) .
name_2: string @index(exact, trigram) .
name_3: string @index(exact, trigram) .
relation: [uid] @reverse .

type a_b_c_and_d {
 name
 relation
 <~relation>
}

type e_f_g_and_h {
 name_2
 relation
 <~relation>
}

type rest {
 name_3
 relation
 <~relation>
}

Mutation:
{
    set {
      _:nodeA <name> "node a" .
      _:nodeA <dgraph.type> "a_b_c_and_d" .

      _:nodeB <name> "node b" .
      _:nodeB <dgraph.type> "a_b_c_and_d" .

      _:nodeC <name> "node c" .
      _:nodeC <dgraph.type> "a_b_c_and_d" .

      _:nodeD <name> "node d" .
      _:nodeD <dgraph.type> "a_b_c_and_d" .

      _:nodeE <name_2> "node e" .
      _:nodeE <dgraph.type> "e_f_g_and_h" .

      _:nodeF <name_2> "node f" .
      _:nodeF <dgraph.type> "e_f_g_and_h" .

      _:nodeG <name_2> "node g" .
      _:nodeG <dgraph.type> "e_f_g_and_h" .

      _:nodeH <name_2> "node h" .
      _:nodeH <dgraph.type> "e_f_g_and_h" .

      _:nodeI <name_3> "node i" .
      _:nodeI <dgraph.type> "rest" .

      _:nodeJ <name_3> "node j" .
      _:nodeJ <dgraph.type> "rest" .

      _:nodeK <name_3> "node k" .
      _:nodeK <dgraph.type> "rest" .

      _:nodeA <relation> _:nodeG .
      _:nodeA <relation> _:nodeD .
      _:nodeB <relation> _:nodeA .
      _:nodeC <relation> _:nodeB .
      _:nodeH <relation> _:nodeI . 
      _:nodeI <relation> _:nodeJ .
      _:nodeH <relation> _:nodeK .
      _:nodeK <relation> _:nodeJ .
    }
}