Performance: Index vs. Label

As far as I know, labels are about creating nodes that relate to other nodes in order to “label” them and build a kind of “group hash”. I think graph databases that implement Cypher handle this natively, but as far as I can see, this has to be implemented manually in Dgraph (am I wrong?).

On the other hand, there’s another option: creating a schema like this:

mutation {
    schema {
        group: string @index(exact,term) .
        name: string @index(exact,term) .
        edge: uid @count .
        r_edge: uid @count .
        other: string @index(exact,term) .
    }
}

and then setting the group on each node. However, I think this implies a full scan of the database’s nodes to find a node that belongs to a group and has a particular property (for example, eq(name, “Rafa”)).

The questions are:

  • Which one is better?
  • If the first option is better, what is the correct way to implement it in Dgraph?
  • Is there a fast query to fetch a given label node such as “Person” (without iterating over the whole database), and then fetch a node among that label’s relations (which define the group) that has a particular property (like the example eq(name, “Rafa”))?

Thanks!

as far as I see this will need to be manually implemented in DGraph

You are correct. In Dgraph, you have to handle this yourself.

There are two different options for grouping nodes.

  1. As you suggest, have a group predicate in your schema and set it to the name of the group. With appropriate indexing, you can then search for all members of a group, filter groups in or out during normal queries, etc.
  2. Alternatively, you could have a different predicate for each group. For example, all nodes in group A could have a group_a predicate, and all members of group B could have a group_b predicate. You can then use the has function to filter and search in queries.

We recommend the second approach, because it scales better: it produces many small predicates rather than one large one. Dgraph shards data by predicate, so a single large group predicate has to be served in its entirety by one group of servers.
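As a rough illustration, the two options might be queried as below. This is a sketch that has not been checked against a specific Dgraph version: option 1 assumes the group predicate (with its exact index) and the indexed name predicate from the schema in the question, option 2 assumes the group_a predicate described above, and the block names option1 and option2 are arbitrary.

```graphql
{
  # Option 1: one shared "group" predicate whose value is the group name.
  # eq() at the root is answered from the exact index on "group";
  # the filter then keeps only the members named "Rafa".
  option1(func: eq(group, "Person")) @filter(eq(name, "Rafa")) {
    name
  }

  # Option 2: one predicate per group (group_a, as described above).
  # has() needs no index; it matches every node that has the predicate.
  option2(func: has(group_a)) @filter(eq(name, "Rafa")) {
    name
  }
}
```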

Is there a fast query to extract a certain label node “Person” (without iterating all the Graph DB) and then extract a node within the relations of that label (that defines a group) with a special feature? (like the example: eq(name, “Rafa”)

As long as you have the appropriate indexing (which you do in your example), the lookup will be fast. It doesn’t have to scan all nodes; it can go straight to the “Rafa” node.
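A minimal sketch of such a lookup, assuming the exact index on name from the schema in the question; label1 is borrowed from the later example in this thread and stands in for whatever per-label predicate you use:

```graphql
{
  # Root function: resolved through the exact index on "name",
  # so Dgraph jumps straight to the matching node(s).
  rafa(func: eq(name, "Rafa")) @filter(has(label1)) {
    # The filter is checked only against the nodes the root lookup returned.
    name
  }
}
```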

EDIT: Sorry, I misread your question. Once you’re at the “Rafa” node, if you have a different predicate for each label (a uid edge pointing at a label node), you can follow that edge to the label node and then follow the reverse label edge back out to find all other nodes with the same label. Reverse edge traversal has to be enabled with the @reverse directive on that predicate.
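For instance, assuming a per-label predicate declared as label1: uid @reverse . (as in the schema later in this thread) and the indexed name predicate, the traversal could be sketched as:

```graphql
{
  sameLabel(func: eq(name, "Rafa")) {
    name
    # Forward edge: from "Rafa" to its label node.
    label1 {
      name
      # Reverse edge (requires @reverse): every node whose label1
      # edge points at this label node.
      ~label1 {
        name
      }
    }
  }
}
```

The ~label1 block lists every node that points at the label node, so “Rafa” itself appears in it as well.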

I don’t fully get it yet.

I understand that a separate predicate per label is the best way to implement labels, so which of these is better?

  • Have a different edge predicate for each label (the set of predicates changes at runtime as users add new labels), with an edge from each labeled node to a terminal “label” node. For example:
    SET SCHEMA
mutation {
    schema {
        pure_label: bool .
        name: string @index(exact,term) .
        label1: uid @count @reverse .
        label2: uid @count @reverse .
        label3: uid @count @reverse .
    }
}

SET NODES

mutation {
  set {
    _:label1 <name> "label1" .
    _:label1 <pure_label> "true" .
    _:node1_1 <name> "node1_1" .
    _:node1_1 <label1> _:label1 .
    _:node2_1 <name> "node2_1" .
    _:node2_1 <label1> _:label1 .
    
    _:label2 <name> "label2" .
    _:label2 <pure_label> "true" .
    _:node1_2 <name> "node1_2" .
    _:node1_2 <label2> _:label2 .
    _:node2_2 <name> "node2_2" .
    _:node2_2 <label2> _:label2 .
    
    _:label3 <name> "label3" .
    _:label3 <pure_label> "true" .
    _:node1_3 <name> "node1_3" .
    _:node1_3 <label3> _:label3 .
    _:node2_3 <name> "node2_3" .
    _:node2_3 <label3> _:label3 .
  }
}
  • Have a different boolean predicate for each label (again changing at runtime as users add new labels), set on each labeled node. For example:
    SET SCHEMA
mutation {
    schema {
        name: string @index(exact,term) .
        label1: bool @count .
        label2: bool @count .
        label3: bool @count .
    }
}

SET NODES

mutation {
  set {
    _:node1_1 <name> "node1_1" .
    _:node1_1 <label1> "true" .
    _:node2_1 <name> "node2_1" .
    _:node2_1 <label1> "true" .
    
    _:node1_2 <name> "node1_2" .
    _:node1_2 <label2> "true" .
    _:node2_2 <name> "node2_2" .
    _:node2_2 <label2> "true" .
    
    _:node1_3 <name> "node1_3" .
    _:node1_3 <label3> "true" .
    _:node2_3 <name> "node2_3" .
    _:node2_3 <label3> "true" .
  }
}

I would recommend the first approach: edge predicates. The main advantage is that you can find all nodes with a particular label by following the (reverse) edges out of the label node. Apart from that, both solutions should perform about the same.
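A sketch of that lookup under the edge-predicate schema above (label1: uid @count @reverse), starting from the label node via its indexed name and using has(pure_label) to make sure the match is a label node; the block name members is arbitrary:

```graphql
{
  members(func: eq(name, "label1")) @filter(has(pure_label)) {
    name
    # Number of nodes carrying this label.
    count(~label1)
    # The labeled nodes themselves, reached through the reverse edge.
    ~label1 {
      name
    }
  }
}
```

With the boolean-predicate schema, the closest equivalent is a root query on has(label1), which returns every node that has a label1 value.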