Does order in a DQL query matter?

rahst12 · February 11, 2023, 2:35am

Hello,
I can’t seem to get a count working when I’m recursing through the graph and applying a filter on the node I want to issue the count on. I have a depth of 10 set on the recurse.

Test 1:
I started with:

count(myNode)
myNode @filter(NOT eq(name, "Bob"))

Result: I found that the query was counting every “myNode”, including the ones with name Bob.

Test 2:
I changed the count to include the filter:

count(myNode @filter(NOT eq(name, "Bob")))
myNode @filter(NOT eq(name, "Bob"))

Result: The counts now appear correct, however, I’m missing large portions of the result I’m expecting.

Test 3:
I reversed the order - I put count below the myNode in the query. I thought maybe the count would only count the myNodes that had been filtered above it.

myNode @filter(NOT eq(name, "Bob"))
count(myNode)

Result: Back to the first problem, it’s counting all the myNodes, however all the data is present.

Test 4:
I kept the count below the myNode and added in the filter again.

myNode @filter(NOT eq(name, "Bob"))
count(myNode @filter(NOT eq(name, "Bob")))

Result: The query is returning the proper data and is correctly counting with the filter.

Looking at the count, filter, and recurse documentation, I can’t find anything that would explain this behavior.

I’m still really new to Dgraph querying…

Why does the order/placement of the count in the query (above vs. below the node you want to count) make a difference in what is returned? Is it expected to always need the filter too, even though the myNode was filtered in the query already?

(Or if this seems like I’ve got something else really messed up, I’d appreciate that feedback too!)

Thanks,
Ryan

amanmangal · February 11, 2023, 2:40am

I think you should consider doing the filter at the root of the query instead. I do not think the order matters here; you need to specify the filter twice, as you have in the final version of the query. If you share the full query, maybe it is possible to simplify it.

Hope that helps.

MichelDiz · February 11, 2023, 3:40am

This is expected. This is an edge count, so lines 1 and 2 are independent.

This is not the recommended way. We don’t have any reference to this usage in the docs, so anything not documented is either a bug or an adventure.

The recommended way would be:

{
  var(func: eq(isRootNode, true)) {
    MN as myNode
  }
  myNodeTotal(func: uid(MN)) {
    count(uid)
  }
  myNodeNoBob(func: uid(MN)) @filter(NOT eq(name, "Bob")) {
    count(uid)
  }
}

This shouldn’t happen, because they are independent. They could be correlated if you use variables.

rahst12 · February 11, 2023, 4:48am

Hi @amanmangal and @MichelDiz,

Thanks! Clearly the filter is needed to get the right counts here, so just focusing on the order - above vs. below the node.

I can’t share my query, but I can replicate it. I followed the first steps 1-6 in the DGraph Tour: Graphs | Intro | Dgraph Tour

Here’s the equivalent query I’m running:

{
  michael(func: eq(name, "Michael")) @recurse(depth:10){
    name
    age
    owns_pet
    friend~
    friend @filter(NOT eq(name, "Sarah")) 
  }
}

The full graph appears (except Sarah).

Equivalent to Test 2 above:
The count is above the friend node.

{
  michael(func: eq(name, "Michael")) @recurse(depth:10){
    name
    age
    owns_pet
    friend~
    count(friend @filter(NOT eq(name, "Sarah")))
    friend @filter(NOT eq(name, "Sarah")) 
  }
}

Adding the count with the filter causes a ton of the graph to not be returned.

Equivalent to Test 4 above:
The count is below the friend node.

{
  michael(func: eq(name, "Michael")) @recurse(depth:10){
    name
    age
    owns_pet
    friend~
    friend @filter(NOT eq(name, "Sarah"))
    count(friend @filter(NOT eq(name, "Sarah")))
  }
}

The full graph comes back here.

I also tried putting the count inside the friend node:

{
  michael(func: eq(name, "Michael")) @recurse(depth:10){
    name
    age
    owns_pet
    friend~
    friend @filter(NOT eq(name, "Sarah")) {
      count(uid)
    }
  }
}

But I get this error: “recurse queries require that all predicates are specified in one level”.

And I tried putting the filter at the function declaration:

{
  michael(func: eq(name, "Michael")) @recurse(depth:10) @filter(NOT eq(name, "Sarah")){
    name
    age
    owns_pet
    friend~
    friend 
    count(friend)
  }
}

But the filter didn’t take effect.

@MichelDiz I haven’t used variables before, I’m still playing around with converting what I’ve got here to something more like what you recommended.

Any thoughts on the before vs. after oddness or recommendations on a better way to do this?

Thanks,
Ryan

MichelDiz · February 11, 2023, 6:53pm

So your problem is exclusive to recurse queries. Without @recurse, order doesn’t matter.

Understand that recurse is a kind of automation: it keeps expanding according to the query body (the list of predicates) given. So order can have an influence in this case, and the results can be unexpected. As this is a recursive query, there is no way to predict exactly what will be returned (you have to explore each case), only that it will keep expanding as long as there are linked edges.

By the way, a recurse query doesn’t accept nested blocks. You can only set the main body, and it will keep expanding recursively. It doesn’t make sense to nest blocks in a recursive query unless you want to recurse in the nested block, and Dgraph doesn’t support that. You would have to make multiple blocks, each one a recurse.
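One possible reading of “multiple blocks, each one a recurse” is sketched below. This is an assumption about the intent, not something confirmed in the thread. The block names friends and pets are hypothetical, and the predicates are the ones from the Dgraph Tour queries above.

```dql
{
  # Hypothetical block name: recurses only along the friend edge.
  friends(func: eq(name, "Michael")) @recurse(depth: 10) {
    name
    friend
  }

  # Hypothetical block name: a second, independent recurse along owns_pet.
  pets(func: eq(name, "Michael")) @recurse(depth: 10) {
    name
    owns_pet
  }
}
```

Each block keeps a flat body, so neither one triggers the “all predicates are specified in one level” error quoted earlier.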

rahst12 · February 13, 2023, 4:47pm

Hi @MichelDiz,
Thanks for the confirmation that order does matter in DQL queries, when @recurse is being used. I’d love to understand a little more why that’s the case. This seems like a pretty big bug - it was at least quite surprising to us. I haven’t scoured all the DGraph documentation, but I haven’t seen anything about order making a difference on either the @recurse or the newer Documentation Revamp Graph Traversal Section.

Could you explain a little more why order matters in this case?

Also, why doesn’t the @filter work when it’s applied at the top of the function?

I appreciate the help!
Thanks,
Ryan

MichelDiz · February 13, 2023, 5:07pm

I can’t get to the level of detail because I didn’t create that code and its algorithm. But I know that it recursively applies everything in the body and can end up influencing other edges that have some relationship.

But this is normal; recurse is a type of query that can generate unexpected results. You shouldn’t use recurse as a source of truth, only to explore your objects. Don’t do fancy things with it. Always create predictable queries, or you will have problems.

An example of problems with recursion in the real world would be “rm” on Unix-like systems. It can be problematic if you use “-r”; for example, “rm -fr *” can be catastrophic. So we never use recursion on sensitive things.

Always make predictable patterns. And use non-predictable patterns just for exploration. IMHO.

It does. Not sure why it doesn’t for you.

Run this query

{
  q(func: eq(dgraph.type, "Person"), first: 100) @recurse(depth: 10)
    @filter(NOT (
      eq(name, "Alice")
      or eq(name, "Bob")
      or eq(name, "Charlie")
      or eq(name, "Dave")
      or eq(name, "Raj")
    )) {
    name
    dgraph.type
    age
    owns_pet
    friend~
    friend
    count(friend)
  }
}

In the Dgraph Ratel dashboard, you will see that the filter used on the root query (root params) is applied.

rahst12 · February 13, 2023, 5:48pm

We’re fully exploring a node’s graph. Thus far, we’ve had reliable results, short of the count issue. We’re navigating to a node using an eq string match on an indexed predicate value and then following specific relationships (not all relationships) out of that node until the graph is fully explored. Is there a better practice for this that’s more reliable than @recurse?

We need the count with the @filter to have the same filter as the filter we’re applying on the relationships – the query would be cleaner if this were at the top of it.

Outside of the count and @filter nothing really fancy going on… Being pretty new to DGraph, what would you consider fancy that we should stay away from?

@filter at the top of the query
For the @filter at the top of the query… It looks like the queries we’re running are different. You’re doing an equals match on dgraph.type, "Person", and I’m doing an equals match on name. I’d imagine that changes the starting point for the @recurse query. Could this difference be why @filter doesn’t work at the root? This is repeatable in the Dgraph Tour: Graphs | Intro | Dgraph Tour

{
  michael(func: eq(name, "Michael")) @recurse(depth:10) @filter(NOT eq(name, "Sarah")){
    name
    age
    owns_pet
    friend~
    friend 
    count(friend)
  }
}

vs.

{
  q(func: eq(dgraph.type, "Person"), first: 100) @recurse(depth: 10)
    @filter(NOT (
      eq(name, "Alice")
      or eq(name, "Bob")
      or eq(name, "Charlie")
      or eq(name, "Dave")
      or eq(name, "Raj")
    )) {
    name
    dgraph.type
    age
    owns_pet
    friend~
    friend
    count(friend)
  }
}

If you were to run this query, does your @filter still work?

{
  q(func: eq(name, "Alice")) @recurse(depth: 10)
    @filter(NOT (
      eq(name, "Bob")
      or eq(name, "Charlie")
      or eq(name, "Dave")
      or eq(name, "Raj")
    )) {
    name
    dgraph.type
    age
    owns_pet
    friend~
    friend
    count(friend)
  }
}

As an aside, coming out of this conversation, is GitHub the means to report potential bugs and request documentation improvements for the team to take a look at and triage, or have ‘on file’ for others and later?

Thanks,
Ryan

MichelDiz · February 13, 2023, 6:17pm

Use normal queries. Well-defined queries. No recurse, I mean.

Avoid using filters in a count, or even in a recurse. You can use them, but you should check how likely you are to not get the expected results.

We call it the “root query”. It is the params we give at the “top” of each query block, as distinct from the “query body” and “nested blocks”.

It works; the problem is that your query is wrong. I’m going to explain.

The difference between those queries is that I’m querying for all users and applying the filter accordingly. When you do q(func: eq(name, "Alice")) you are querying only for Alice, not all users. So you are selecting Alice and thinking that it would somehow find the other users.

q(func: eq(name, "Alice") )

The above is the query root. And below is the filter to be applied against the query root.
The recurse has not yet been initialized. It is after these parameters are executed that the Recurse starts.

@filter(NOT 
    ( eq(name, "Bob") 
    or eq(name, "Charlie") 
    or eq(name, "Dave")
    or eq(name, "Raj")
  ))

And this

{
      name
      dgraph.type
      age
      owns_pet
      friend~
      friend 
      count(friend)
  }

is the query body which, in the case of a recurse, acts like a “template to follow”.

The query root params are like “SELECT * FROM users”.
In your example it would be like “SELECT * FROM users WHERE name = 'Alice'”. Did you get it?
And the query body is more like selecting what you want to return.
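The three parts described above can be labeled on the Dgraph Tour query from earlier in the thread. This is only an annotated sketch of that explanation, with the body shortened to two predicates; the comments restate what was said here rather than documented guarantees.

```dql
{
  # (1) Query root: eq(name, "Michael") selects only the nodes named "Michael".
  # (2) Root filter: NOT eq(name, "Sarah") is checked against those root nodes only.
  # (3) @recurse starts only after (1) and (2) have been executed.
  michael(func: eq(name, "Michael")) @recurse(depth: 10) @filter(NOT eq(name, "Sarah")) {
    # (4) Query body: the template that is followed again at every level.
    name
    friend
  }
}
```

Read this way, the root filter never sees Sarah, because Sarah is reached through the friend edge in the body and not selected by the root function.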

Yes. For docs, you can reach out via the docs repo for any kind of request.
But it is better to ask here first, before creating any issue there, and to create the issue once you are sure.

rahst12 · February 14, 2023, 12:02am

Hey @MichelDiz,
I’m starting to learn the vocab a bit better now… query root, etc. Thanks. I created the example queries above to closely mimic the queries we’re performing on a much larger DGraph. I needed them in an easily repeatable form to illustrate the challenges. A couple of questions/comments related to scale are below, and I have a few GitHub issues I’d like to recommend at the bottom.

Are there query performance considerations when querying all nodes of a particular type and then applying the filter? That essentially sounds like a table scan. We have 110GB worth of nodes (tens of millions) we’d be issuing that style of query on. Our thought process led us to believe if we directly queried the node of interest by an indexed predicate, that the query would be significantly more performant. The result only contains the recursed nodes/edges of that node of interest and it never had to traverse irrelevant nodes in the graph from a SELECT ALL style query.

I’m not looking for all users, just the users who have some connection to Michael, less anyone named Sarah. The DGraph Tour query is able to find everyone (including pets) performing an @recurse query following the relationships. What doesn’t work in this query is the @filter. I see it does work in your query when you are querying for all nodes vs. a specific node. This is odd behavior.
I listed this as GitHub Ticket #4 below with repeatable examples in DGraph Tour above.

My workaround for this is to put the @filter in the query body on each of the nodes I want it applied to.

To expand a little on why we don’t think it’s advisable to change the query to a SELECT ALL NODES style… The query which searches on a specific node’s value returns a tiny sub-set of the entire data in our production DGraph. There’s many millions of other nodes it never had to touch. If we were to change the query structure to query for all nodes first, would it not have to iterate through every node first to complete the query? Maybe we’re off-base here with our experience with Mongo, Elastic, SQL, etc.

This tidbit is super helpful. It made me start to think about the query lifecycle when it’s inside Dgraph. Is there a query lifecycle that’s shared anywhere? Or general notes/discussions on it?

Proposed GitHub Tickets

1. Docs on using @count with @recurse. The @count documentation page should reflect the changed behavior when used in conjunction with @recurse… Something like: if PredicateA is returned in a query AND PredicateA is also counted, the count must be ordered in the query AFTER the initial predicate when using @recurse.
2. Docs on using @count with an @filter. The @count documentation page should say it can be combined with an @filter and mention the issues with @recurse.
3. Feature: Request that the root-level @filter be recursively applied to nodes/predicates when @recurse is used. (Example above.)
4. Bug: The @filter with @recurse only works when performing a full SELECT ALL / table scan approach of DGraph vs. navigating directly to a node (see the examples above from the DGraph tutorial where it doesn’t work, and @MichelDiz’s example where it works with a different style of query). Odd that it works in one case and not the other.

Thanks again for all the help and information,
Best,
Ryan

MichelDiz · February 14, 2023, 2:33am

Yes. If the index tablet for the type is huge, this could potentially increase latency. There are ways to mitigate this.

When you talk about the query root, you are not traversing anything. The query root param is applied only to the root target, nothing more. Traversing is done through nesting.

But depending on the relationship you will need to query for all to all. Many to many.

Stay with me, it works. But not the way you think it does.

It is expected behavior.

This is the way.

I’m not advising, I’m just giving examples of how it works. In some parts I advised, but that’s just my opinion. It’s preferable to have predictable queries than use recurse. But it is your call if you feel okay and understand how the query works.

Yes. But you can have techniques like naming. For example “product.name”, “user.name”, “object.name”. With this you can isolate this side effect in a big dataset. You can also create a structure on top of your data and use edges to expand your data. But this is a bit complicated.
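A minimal sketch of the naming idea, assuming a hypothetical type-specific predicate user.name declared in the schema as `user.name: string @index(exact) .` (neither the predicate nor the index choice comes from this thread):

```dql
{
  # user.name is a hypothetical predicate used only by user nodes,
  # so this lookup does not involve product or object names.
  michael(func: eq(user.name, "Michael")) {
    user.name
  }
}
```

The point of the prefix is that a lookup on user.name only has to consult that predicate's own data, instead of a generic name predicate shared by every kind of node.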

Not sure, can you share an example of what you seek? I get it, but not sure about the form of it.

1 and 2 are fine. Something for the docs team to take a look at.

Number 3 is off the table. This is how Dgraph is designed.
And about number 4, that’s not an issue at all. I think I explained why this happens.
The root query will never be applied to the query body or nested blocks. It is used to find the target. If you want to use filters with recurse, you have to put them in the query body itself.
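A sketch of what filtering in the query body looks like, based on the variant reported earlier in the thread as returning the full graph with correct counts. It assumes the Dgraph Tour data set, and the unknown friend~ line from the original queries is left out.

```dql
{
  michael(func: eq(name, "Michael")) @recurse(depth: 10) {
    name
    age
    owns_pet
    # Filter on the edge in the body, so it is part of the template
    # that the recurse follows at each level.
    friend @filter(NOT eq(name, "Sarah"))
    # count() is independent of the line above, so the filter is repeated.
    # It is placed after the edge, the order reported in this thread as working.
    count(friend @filter(NOT eq(name, "Sarah")))
  }
}
```

Whether the ordering requirement holds in every Dgraph version is not established here; it is only what was observed in the tests above.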

rahst12 · February 14, 2023, 6:29pm

Here’s two articles that talk about the Elastic Query then Fetch phase. It helps understand where aggregations, filters, etc happen. I’ve come across more detailed diagrams too, but couldn’t find them.

I submitted the first two to GitHub for some improved usage docs:

[Documentation]: DQL requires a specific order when using @count with @recurse · Issue #461 · dgraph-io/dgraph-docs · GitHub
[Documentation]: Provide an example of how to use count a predicate that with a filter · Issue #462 · dgraph-io/dgraph-docs · GitHub

Thanks for the help!

MichelDiz · February 14, 2023, 7:18pm

I’m quite sure we don’t have anything that covers query execution part by part, but we have this

Which helps you understand how the Query is processed in the Cluster.

You can also check the paper Dgraph Whitepapers: In-Depth Insights and Analysis

You can also learn a lot from Jaeger tracing https://dgraph.io/docs/deploy/tracing/
If you set up Jaeger with your Dgraph cluster and capture something like 99% of the calls, you can see the query execution. It won’t teach you about syntax, but you may understand something about the process.

MichelDiz · February 14, 2023, 7:24pm

BTW check this out

Topic	Replies	Views	Activity
DQL: Order of filter execution vs pagination [Dgraph, kind:question]	4	417	March 27, 2021
Adding sorting to DQL query reduces the number of results [Dgraph, kind:bug, dql]	8	1363	September 21, 2021
Sorting and counting issue [Dgraph]	16	1940	November 22, 2022
Switching the order of the `func` and `@filter` provides different results (v1.1.1) [Dgraph, kind:bug]	4	1066	June 11, 2020
Count total before pagination [Dgraph]	8	512	September 21, 2021
