Filter out node based on connected node values

Hey,

I have two nodes, let's call them Person & Company.

The Person has an edge named Worker pointing to Company.
The company has a bool predicate isDeleted.

I wish to retrieve a Person by its uid and return nothing if the company's isDeleted predicate is set to true.
Preferably by using the root func to get the Person, rather than getting the Company and then recursing through all its descendants.

This worked for me, but is there a better way?

query {
  var(func: eq(user.id, "8cbe5522-6adf-4f2b-8801-8497736c9342")) {
    user.tenant @filter(eq(tnnt.isDeleted, f)) {
      u1 as uid
    }
  }
  get(func: eq(user.id, "8cbe5522-6adf-4f2b-8801-8497736c9342")) @filter(eq(user.isDeleted, f) and eq(len(u1), 1)) {
    user.id
    user.isEnabled
  }
}

Hi @mbn18
With the @cascade directive, nodes that don’t have all the predicates specified in the query are removed from the result. Let’s work through it for your use case.

Data:

{
  set {
    _:company1 <name> "CompanyABC" .
    _:company1 <dgraph.type> "Company" .
    _:company1 <isDeleted> "true" .
    
    _:naman <worker> _:company1 .
    _:naman <dgraph.type> "Person" .
    _:naman <name> "naman" .
  }
}

Query:

{
  # UID of naman
  foo(func: uid(0xfffd8d67d83ccb99)) @cascade {
    name
    worker @filter(eq(isDeleted, false)) {
      name
      uid
    }
  }
}

If there were another company whose isDeleted is false and whose name predicate is set, you would get that single matching company. If any predicate requested in the query is missing along the traversal, the whole node is removed from the result.
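
Applied to the predicates from your original query, a sketch could look like the following. It assumes user.tenant is the edge to the company node and tnnt.isDeleted is its flag, as in your query. Keep in mind that with @cascade the user is also dropped if user.id or user.isEnabled is not set.

```
{
  get(func: eq(user.id, "8cbe5522-6adf-4f2b-8801-8497736c9342")) @filter(eq(user.isDeleted, false)) @cascade {
    user.id
    user.isEnabled
    # With @cascade, the user is dropped when no tenant passes this filter.
    user.tenant @filter(eq(tnnt.isDeleted, false)) {
      uid
    }
  }
}
```

This avoids the separate var block, at the cost of also returning the tenant's uid in the result.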

Both solutions will work fine. However, they will be slow on a large amount of data if the isDeleted predicate is indexed (reference), so it is better to keep isDeleted unindexed.

@Neeraj, interesting. So I should remove the index from the isDeleted predicate if that predicate is used only in @filter?

Well, it has more to do with the number of different values a predicate can have than with whether it is used in the filter. For example, in your case isDeleted can only have two different values, true or false.

Let’s say you have 10 million nodes in your dataset. If the values are split roughly evenly, the posting list of (<isDeleted>, true) will hold almost 5 million nodes, and the same goes for (<isDeleted>, false). Hence iterating over them will be a very expensive operation.

If you remove the index from isDeleted, it will be stored as <predicate, uid> => [value1, value2 ....], which in your case is <isDeleted, 0x123> => [True], and accessing this value will be very efficient.
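
As a rough sketch, the two schema alternatives would look like this. Apply one or the other, not both. The predicate name isDeleted follows the example above; yours may differ (for example tnnt.isDeleted), and the comments only restate the storage layout described here.

```
# Alternative 1: indexed. Dgraph also keeps (isDeleted, value) => [uid, ...] posting lists.
isDeleted: bool @index(bool) .

# Alternative 2: unindexed. Only the per-node (isDeleted, uid) => [value] entry is kept.
isDeleted: bool .
```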

In the reference I provided above, a query with an index on the sex predicate takes around 300 ms, whether sex is used in @filter or in the main function. Without the index it takes around 3 ms. In that case sex can only be used in @filter, because the predicate used in the main function of a query must be indexed.

Hi @Neeraj,

I removed the bool index and the query actually got slower.

Here is a simple example:

{
  q(func:type(Measurement)) @filter(eq(meas.isDeleted, f)) {
    count(uid)
  }
}

Without bool index:

{
  "data": {
    "q": [
      {
        "count": 99307
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 163375,
      "processing_ns": 203839418,
      "encoding_ns": 9079376,
      "total_ns": 213195197
    },
    "txn": {
      "start_ts": 29146
    },
    "metrics": {
      "num_uids": {
        "_total": 99307,
        "dgraph.type": 0,
        "meas.isDeleted": 99307
      }
    }
  }
}

With bool index:

{
  "data": {
    "q": [
      {
        "count": 99307
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 163287,
      "processing_ns": 8838663,
      "encoding_ns": 8884165,
      "total_ns": 17969863
    },
    "txn": {
      "start_ts": 29163
    },
    "metrics": {
      "num_uids": {
        "_total": 99307,
        "dgraph.type": 0,
        "meas.isDeleted": 99307
      }
    }
  }
}

So, with the index the query took roughly 1/12 of the previous total time (about 18 ms versus 213 ms).

Thx

Hey @mbn18,

Can you please share a drive link for your data? I’ll look into it.

@Neeraj, I will upload the data (only part of it) to the Dgraph customer support channel and open a ticket about this.
