Unexpected output when adding `first` filter on nested graph

amaster507 · July 20, 2020, 5:17pm

Version: v2.0.0-rc1-518-g3c710183a

Schema:

type Task {
  id: ID
  name: String
  occurrences: [TaskOccurrence]
}
type TaskOccurrence {
  id: ID
  due: DateTime
  comp: DateTime
}

Right now almost every Task has one TaskOccurrence. What I want is the first X Tasks, each with its first occurrence. So my query:

{
  queryTask(first: 24) {
    id
    name
    occurrences(first: 1) {
      due
      comp
    }
  }
}

This returns my data, but only one occurrence comes back in the whole result set instead of one occurrence for each Task. What appears to be happening underneath is that TaskOccurrences are queried with a limit of 1 in total and then mapped back to the Tasks they belong to. If I remove the filter (first: 1), or increase it to (first: 24) or anything higher than my Task count, I get mostly correct results. That could still go wrong, though: some tasks will eventually have more than one occurrence, and all of the returned occurrences could then belong to a single Task instead of being spread across the Tasks they belong to.
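
To make the difference concrete, here is a minimal sketch of the two behaviors described above. It is a toy in-memory model, not Dgraph code: the `tasks` dict, the ids, and both function names are made up for illustration.

```python
# Hypothetical stand-in for the graph: task id -> occurrence ids, in stored order.
tasks = {
    "t1": ["o1", "o2"],
    "t2": ["o3"],
    "t3": ["o4", "o5"],
}

def per_task_first(tasks, n):
    """Expected semantics: apply the limit separately under each task."""
    return {tid: occs[:n] for tid, occs in tasks.items()}

def global_first(tasks, n):
    """Suspected behavior: keep only n occurrences in total, then map them back."""
    all_occs = [o for occs in tasks.values() for o in occs]
    allowed = set(all_occs[:n])
    return {tid: [o for o in occs if o in allowed] for tid, occs in tasks.items()}

expected = per_task_first(tasks, 1)
suspected = global_first(tasks, 1)
print(expected)   # {'t1': ['o1'], 't2': ['o3'], 't3': ['o4']}
print(suspected)  # {'t1': ['o1'], 't2': [], 't3': []}
```

With the global limit, only the first task keeps an occurrence, which matches the symptom of a single occurrence in the whole dataset.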

pawan · July 21, 2020, 6:52am

That’s a bit weird. @JatinDevDG can you try and reproduce this with a smaller data set which has 5 tasks and a couple of task occurrences for each task?

JatinDevDG · July 21, 2020, 6:53am

sure.

JatinDevDG · July 21, 2020, 9:38am

I don’t know how you added your data. I tried it with the schema, mutations, and query below, and it works. Maybe I am missing something; please check.

1. Added below schema with inverse edge going from Task Occurrence to Task.

type Task {
  id: ID!
  name: String
  occurrences: [TaskOccurrence] @hasInverse(field: task)
}
type TaskOccurrence {
  id: ID!
  due: DateTime
  comp: DateTime
  task:Task!
}

2. Added multiple tasks with empty occurrences.

mutation {
  addTask(input: [{
    name: "bug in server"
  }]) {
    task {
      id
      name
    }
  }
}

3. Finally, added occurrences of tasks using the ids of the already added tasks.

mutation {
  addTaskOccurrence(input: [{
    due: "2006-01-13",
    comp: "2008-01-16",
    task: {id: "0x5"}
  }]) {
    taskOccurrence {
      id
      due
      comp
      task {
        id
        name
      }
    }
  }
}

4. This query works perfectly; I tried changing first to different values.

query
{
  queryTask(first:3){
    id
    name
    occurrences(first:2){
      id 
      due 
      comp
      
    }
  }
}

Maybe you are adding task occurrences while adding the task, like below.

mutation {
  addTask(input: [{
    name: "question",
    occurrences: [{due: "2006-01-13", comp: "2008-01-17"}]
  }]) {
    task {
      id
      name
      occurrences {
        id
        due
        comp
      }
    }
  }
}

With this approach, a new task is added each time, even if you reuse the same task name and change only its occurrence, so every task ends up with at most one occurrence.

@pawan, please check and let me know if I missed something.

amaster507 · July 21, 2020, 11:03am

Let me work up a small example dataset and reproduce this.

I gotta dig down into my 8 million rows of data to see why this is happening in my dataset but I can’t duplicate it in a smaller dataset.

amaster507 · July 21, 2020, 8:24pm

Ok, now I can duplicate it with a smaller data set.

Build this schema:

type Task @auth(
  query: { rule: "{$USERROLE: { eq: \"USER\"}}" }
) {
  id: ID!
  name: String!
  occurrences: [TaskOccurrence] @hasInverse(field: task)
}

type TaskOccurrence @auth(
  query: { rule: "query { queryTaskOccurrence { task { id } } }" }
) {
  id: ID!
  due: DateTime
  comp: DateTime
  task: Task @hasInverse(field: occurrences)
}
# Dgraph.Authorization {"VerificationKey":"super-secret","Header":"auth","Namespace":"https://mydomain.com/jwt/claims","Algo":"HS256"}

Here is a valid header to use:

{ "auth": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpYXQiOjE1MTYyMzkwMjIsImV4cCI6MTcyNjQzOTAyMiwiaHR0cHM6Ly9teWRvbWFpbi5jb20vand0L2NsYWltcyI6eyJVU0VSUk9MRSI6IlVTRVIifX0.x7lbPNSKU9SqsXyGuFjwRpYnNZs2t1Wfsb8MT70UBZY" }

Run this mutation to insert data:

mutation {
  addTask(input:[{
    name:"First Task four occrrences"
    occurrences: [{
      due:"2020-07-19T08:00:00"
      comp:"2020-07-19T08:04:32"
    },{
      due:"2020-07-20T08:00:00"
      comp:"2020-07-20T07:58:17"
    },{
      due:"2020-07-21T08:00:00"
    },{
      due:"2020-07-22T08:00:00"
    }]
  },{
    name:"Second Task single occurrence"
    occurrences: [{
      due:"2020-07-30T18:00:00"
    }]
  },{
    name:"Third Task no occurrence"
  },{
    name:"Fourth Task two occurrences"
    occurrences: [{
      due:"2020-07-01T12:00:00"
      comp:"2020-06-30T17:43:14"
    },{
      due:"2020-08-01T12:00:00"
    }]
  }]) {
    numUids
    task {
      id
      name
      occurrences {
        id
        due
        comp
      }
    }
  }
}

Run this query to see all of your data (same as seen when inserted):

query {
  queryTask {
    id
    name
    occurrences {
      id
      due
      comp
    }
  }
}

You can also see it reversed:

query {
  queryTaskOccurrence {
    id
    due
    comp
    task {
      id
      name
    }
  }
}

Now here is the bug:

{
  queryTask(first:4){
    id
    name
    occurrences(first:2){
      id
      due
      comp
    }
  }
}

Produced for me (will be different based upon uids):

{
  "data": {
    "queryTask": [
      {
        "id": "0x2a",
        "name": "Third Task no occurrence",
        "occurrences": []
      },
      {
        "id": "0x2d",
        "name": "First Task four occrrences",
        "occurrences": [
          {
            "id": "0x27",
            "due": "2020-07-19T08:00:00Z",
            "comp": "2020-07-19T08:04:32Z"
          },
          {
            "id": "0x28",
            "due": "2020-07-21T08:00:00Z",
            "comp": null
          }
        ]
      },
      {
        "id": "0x30",
        "name": "Second Task single occurrence",
        "occurrences": []
      },
      {
        "id": "0x31",
        "name": "Fourth Task two occurrences",
        "occurrences": []
      }
    ]
  },
  "extensions": {
    "touched_uids": 53
  }
}

Task "0x30" is missing its occurrence
Task "0x31" is missing both of its occurrences
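
For a larger dataset, a check like this can be scripted by comparing the unrestricted response with the `first`-limited one. The following is a rough sketch, not part of any Dgraph tooling: `short_lists` is a made-up helper, the task ids and occurrence counts come from the output above, and the occurrence ids are placeholders because only the counts matter.

```python
def short_lists(full, limited, first):
    """Map task id -> (expected count, actual count) where the nested list is too short."""
    limited_by_id = {task["id"]: task for task in limited}
    problems = {}
    for task in full:
        expected = min(first, len(task["occurrences"]))
        actual = len(limited_by_id[task["id"]]["occurrences"])
        if actual < expected:
            problems[task["id"]] = (expected, actual)
    return problems

# Response without nested `first` (placeholder occurrence ids).
full = [
    {"id": "0x2a", "occurrences": []},
    {"id": "0x2d", "occurrences": [{"id": "a"}, {"id": "b"}, {"id": "c"}, {"id": "d"}]},
    {"id": "0x30", "occurrences": [{"id": "e"}]},
    {"id": "0x31", "occurrences": [{"id": "f"}, {"id": "g"}]},
]
# Response with occurrences(first: 2), shaped like the buggy output above.
limited = [
    {"id": "0x2a", "occurrences": []},
    {"id": "0x2d", "occurrences": [{"id": "a"}, {"id": "b"}]},
    {"id": "0x30", "occurrences": []},
    {"id": "0x31", "occurrences": []},
]

problems = short_lists(full, limited, first=2)
print(problems)  # {'0x30': (1, 0), '0x31': (2, 0)}
```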

While trying to reproduce this, I found an indicator that may help lead to the bug. I believe it is in the @auth directive. Change the rule on TaskOccurrence to anything other than a query-based rule and it will work.

type TaskOccurrence @auth(
  query: { rule: "{$USERROLE: { eq: \"USER\"}}" }
) ...

This will produce the correct results. As a matter of fact, even if I take off the (first: 4) from the top-level query, the original schema and rules still produce the wrong results.

I even changed the auth rule to a very simple query rule that did not depend on any other linked uid but only itself like:

type TaskOccurrence @auth(
  query: { rule: "query { queryTaskOccurrence { id } }" }
) ...

and it still reproduces the error, FYI.

IMO, Here is what I think is going on behind the scenes:

The filters are being passed to the part of the Dgraph/Badger script that gets the available uids based upon auth rules. It makes sense to pass some of the other filter logic ahead, but the first parameter should not be passed down the chain. When the auth query rule runs, it gets the available TaskOccurrences limited to the first X and then sends that back up the chain to get reattached as available uids in the graph. This limits the nested data to a total of X for the entire graph instead of X nested under each parent, as happens without the auth query rule.

I have not looked in the source code to try to find this logic, and I really don’t have the time to do that. If this is the case, though, then this could be an easy fix: strip the first filter so it is not passed down to the auth query rule.
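
The hypothesis above can be modeled in a few lines. This is only a sketch of the suspected flow, not the actual rewriter logic: `auth_allowed`, `resolve`, the `leak_first_into_auth` flag, and the integer uids are all hypothetical, and the task shapes mirror the small dataset from this post.

```python
# Hypothetical uids; the real ones depend on the cluster.
occurrences_by_task = {
    "First Task": [1, 2, 3, 4],
    "Second Task": [5],
    "Third Task": [],
    "Fourth Task": [6, 7],
}

def auth_allowed(all_uids, first=None):
    """Stand-in for the auth query rule; `first` models the leaked limit."""
    uids = sorted(all_uids)
    return set(uids if first is None else uids[:first])

def resolve(occurrences_by_task, first, leak_first_into_auth):
    """Filter each task's occurrences by the auth set, then apply `first` per task."""
    all_uids = [u for occs in occurrences_by_task.values() for u in occs]
    allowed = auth_allowed(all_uids, first if leak_first_into_auth else None)
    result = {}
    for task, occs in occurrences_by_task.items():
        visible = [u for u in occs if u in allowed]
        result[task] = visible[:first]
    return result

buggy = resolve(occurrences_by_task, 2, leak_first_into_auth=True)
fixed = resolve(occurrences_by_task, 2, leak_first_into_auth=False)
print(buggy)  # {'First Task': [1, 2], 'Second Task': [], 'Third Task': [], 'Fourth Task': []}
print(fixed)  # {'First Task': [1, 2], 'Second Task': [5], 'Third Task': [], 'Fourth Task': [6, 7]}
```

When the limit leaks into the auth step, the model gives the same shape as the output I posted: two occurrences under one task and empty lists everywhere else. Keeping the limit out of the auth step gives the per-task result.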

Looks like I am right:

https://github.com/dgraph-io/dgraph/blob/master/graphql/resolve/query_rewriter.go#L435


			},
		}},
	}

	if mainQuery.Filter != nil {
		ft.Child = append(ft.Child, mainQuery.Filter)
	}

	mainQuery.Filter = ft

	return append(dgQuery, op), nil
}

func intersection(a, b []uint64) []uint64 {
	m := make(map[uint64]bool)
	var c []uint64

	for _, item := range a {
		m[item] = true
	}

pawan · July 22, 2020, 7:24am

Thanks for such a detailed response @amaster507. This would help us find the issue and fix it quickly. @JatinDevDG is going to have a look into this.

arijit · July 22, 2020, 8:12am

@amaster507 The example was really helpful. I am able to reproduce the issue. We will look into it and have a fix soon. I think the bug is mostly related to applying the first filter at an incorrect place like you mentioned above.

arijit · August 18, 2020, 8:04am

@amaster507 Raised a PR to fix this.
https://github.com/dgraph-io/dgraph/pull/6221
