Filter out node based on connected node values

mbn18 · May 22, 2020, 12:42pm

Hey,

I have two node types, let's call them Person and Company.

The Person has an edge named Worker pointing to Company.
The company has a bool predicate isDeleted.

I want to retrieve a Person by its uid and return nothing if the Company's isDeleted predicate is set to true.
Preferably, the root func should select the Person, rather than selecting the Company and then recursing through all its descendants.

mbn18 · May 22, 2020, 1:26pm

This worked for me, but is there a better way?

query {
  var(func: eq(user.id, "8cbe5522-6adf-4f2b-8801-8497736c9342")) {
    user.tenant @filter(eq(tnnt.isDeleted, f)) {
      u1 as uid
    }
  }
  get(func: eq(user.id, "8cbe5522-6adf-4f2b-8801-8497736c9342")) @filter(eq(user.isDeleted, f) and eq(len(u1), 1)) {
    user.id
    user.isEnabled
  }
}

Naman · June 14, 2020, 2:32pm

Hi @mbn18
With the @cascade directive, nodes that don’t have all the predicates specified in the query are removed from the result. Let’s work through your use case.

Data:

{
  set {
    _:company1 <name> "CompanyABC" .
    _:company1 <dgraph.type> "Company" .
    _:company1 <isDeleted> "true" .
    
    _:naman <worker> _:company1 .
    _:naman <dgraph.type> "Person" .
    _:naman <name> "naman" .
  }
}

Query:

{
  # UID of naman
  foo(func: uid(0xfffd8d67d83ccb99)) @cascade {
    name
    worker @filter(eq(isDeleted, false)) {
      name
      uid
    }
  }
}

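With the data above, the only company is deleted, so the worker block is empty and @cascade removes naman from the result. If there were another company whose isDeleted is false and whose name predicate is set, you would get naman back with that single matching company. If any predicate requested in the query is missing along the traversal, the whole node is removed from the result.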
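To make the @cascade rule above concrete, here is a small Python sketch that mimics it on plain dictionaries. It is only a toy model of the semantics described in this post, not Dgraph code: the helper name `cascade`, the dictionary layout, the second company "CompanyXYZ" and the uids `0x1` and `0x2` are all made up for illustration.

```python
def cascade(node, shape):
    """Return `node` pruned to `shape`, or None if a requested predicate is missing.

    `shape` maps predicate -> None for scalar predicates, or
    predicate -> (child_filter, child_shape) for edges.
    """
    out = {}
    for pred, spec in shape.items():
        if spec is None:
            # Scalar predicate: the node must have it, otherwise drop the node.
            if pred not in node:
                return None
            out[pred] = node[pred]
        else:
            # Edge: keep only children that pass the filter and cascade themselves.
            child_filter, child_shape = spec
            children = []
            for child in node.get(pred, []):
                if not child_filter(child):
                    continue
                pruned = cascade(child, child_shape)
                if pruned is not None:
                    children.append(pruned)
            if not children:
                return None
            out[pred] = children
    return out


# Hypothetical data: one deleted company (as in the post) and one active company.
company_abc = {"uid": "0x1", "name": "CompanyABC", "isDeleted": True}
company_xyz = {"uid": "0x2", "name": "CompanyXYZ", "isDeleted": False}

# Mirrors the query: name, worker @filter(eq(isDeleted, false)) { name uid }
shape = {
    "name": None,
    "worker": (lambda c: c.get("isDeleted") is False, {"name": None, "uid": None}),
}

only_deleted = cascade({"name": "naman", "worker": [company_abc]}, shape)
with_active = cascade({"name": "naman", "worker": [company_abc, company_xyz]}, shape)
print(only_deleted)
print(with_active)
```

In this sketch the first call yields nothing because the only company is filtered out, while the second keeps naman with the one non-deleted company.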

Neeraj · June 15, 2020, 5:14am

Both solutions will work. However, they will be slow on a large dataset if the isDeleted predicate is indexed (reference), so it is better to keep isDeleted unindexed.

mbn18 · June 19, 2020, 2:07pm

@Neeraj, interesting, so I should remove the index from the isDeleted predicate if that predicate is used only in @filter?

Neeraj · June 20, 2020, 6:02am

Well, it has more to do with the number of different values a predicate can have than with whether it is used in the filter. For example, in your case isDeleted can have only two values, true or false.

Let’s say you have 10 million nodes in your dataset. If the values are split about evenly, the posting list of (<isDeleted>, true) will have almost 5 million nodes, and the same goes for (<isDeleted>, false). Iterating over them is therefore a very expensive operation.

If you remove the index from isDeleted, it’ll be stored as <predicate, uid> => [value1, value2, ...], in your case <isDeleted, 0x123> => [True], and accessing this value is very efficient.

In the reference I provided above, the query takes around 300 ms when the sex predicate is indexed, whether sex is used in the @filter or in the root function. It takes around 3 ms when sex is not indexed (in that case sex can only be used in the @filter, because the predicate in the root function must be indexed).
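A toy Python model of the two layouts described above may help show where the cost difference is supposed to come from. This is a sketch under simplifying assumptions, not how Dgraph is implemented: the dataset is shrunk to 1,000 nodes, the variable names are hypothetical, and the indexed path is modelled as the worst case of walking the whole posting list.

```python
# Toy dataset: 1,000 nodes instead of 10 million, half of them deleted.
N = 1_000
# Data layout: <isDeleted, uid> => value (even uids are deleted in this toy data).
values = {uid: (uid % 2 == 0) for uid in range(1, N + 1)}

# Index layout: (<isDeleted>, value) => posting list of uids.
index = {True: [], False: []}
for uid, deleted in values.items():
    index[deleted].append(uid)

# uids reached by the traversal, for example the companies of one person.
candidates = [3, 4, 5]

# With the index: intersect the candidates with the posting list for `false`.
# Modelled here as touching every entry of that posting list.
active = set(index[False])
touched_with_index = len(index[False])
kept_with_index = [u for u in candidates if u in active]

# Without the index: read the value of each candidate directly.
touched_without_index = len(candidates)
kept_without_index = [u for u in candidates if values[u] is False]

print(kept_with_index, touched_with_index)
print(kept_without_index, touched_without_index)
```

Both paths keep the same uids. In this model the unindexed cost grows with the number of candidates rather than with the size of the posting list, so which side is cheaper depends on how many nodes the filter has to check.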

mbn18 · June 21, 2020, 1:49pm

Hi @Neeraj,

I removed the bool index and the query actually got slower.

Here is a simple example:

{
  q(func:type(Measurement)) @filter(eq(meas.isDeleted, f)) {
    count(uid)
  }
}

Without bool index:

{
  "data": {
    "q": [
      {
        "count": 99307
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 163375,
      "processing_ns": 203839418,
      "encoding_ns": 9079376,
      "total_ns": 213195197
    },
    "txn": {
      "start_ts": 29146
    },
    "metrics": {
      "num_uids": {
        "_total": 99307,
        "dgraph.type": 0,
        "meas.isDeleted": 99307
      }
    }
  }
}

With bool index:

{
  "data": {
    "q": [
      {
        "count": 99307
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 163287,
      "processing_ns": 8838663,
      "encoding_ns": 8884165,
      "total_ns": 17969863
    },
    "txn": {
      "start_ts": 29163
    },
    "metrics": {
      "num_uids": {
        "_total": 99307,
        "dgraph.type": 0,
        "meas.isDeleted": 99307
      }
    }
  }
}

So with the index, the query took roughly 1/10 of the time it took without it (about 18 ms versus 213 ms in total_ns).

Thx
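For reference, the ratio can be checked directly from the two server_latency blocks above. The short Python sketch below only redoes that arithmetic on the numbers posted; the variable names are illustrative.

```python
# Latency figures copied from the two responses above (nanoseconds).
without_index = {"processing_ns": 203_839_418, "total_ns": 213_195_197}
with_index = {"processing_ns": 8_838_663, "total_ns": 17_969_863}

ratios = {}
for key in ("processing_ns", "total_ns"):
    ratios[key] = without_index[key] / with_index[key]
    print(
        f"{key}: {without_index[key] / 1e6:.1f} ms vs "
        f"{with_index[key] / 1e6:.1f} ms, {ratios[key]:.1f}x"
    )
```

The gap is larger on processing_ns (about 23x) than on total_ns (about 12x), because encoding_ns is roughly 9 ms in both runs.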

Neeraj · June 23, 2020, 8:53am

Hey @mbn18,

Can you please share a drive link for your data? I’ll look into it.

mbn18 · June 25, 2020, 1:49pm

@Neeraj, I will upload the data (only part of it) to the Dgraph customer support channel and open a ticket about this.

system · July 25, 2020, 1:49pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
