Multiple Variable Consistency

BlakeMScurr · November 6, 2017, 5:07am

Hi all

I have an interesting case that I can’t figure out how to express in GraphQL+/-.

Let’s suppose we have the family trees of several inventors, and every person has a list of inventions, and dislikes. I would like to find everyone who dislikes something invented by their ancestors.

Here’s some example data:

_:1 <name> "Otto" .
_:1 <occupation> "Inventor" .
_:1 <invented> "Sliced Bread" .
_:1 <child> _:2 .
_:1 <child> _:3 .

_:2 <name> "Patrick" .
_:2 <dislikes> "Sliced Bread" .
_:2 <dislikes> "Sliced Bread" .

_:3 <name> "Quentin" .

_:4 <name> "John" .
_:4 <occupation> "Inventor" .
_:4 <invented> "Ballpoint Pen" .
_:4 <child> _:5 .

_:5 <name> "Keith" .
_:5 <child> _:6 .
_:5 <dislikes> "Sliced Bread" .

_:6 <name> "Louis" .
_:6 <dislikes> "Ballpoint Pen" .

mutation {
  schema {
    occupation: string @index(exact) .
    dislikes: string @index(exact) .
  }
}

My approach was to find store the inventors and their inventions as separate variables then filter their descendants on whether they dislike their inventions.

{
  inventors as inv(func: eq(occupation, "Inventor")) {
    inventions as invented
  }

  recurse(func: uid(inventors)) {
    descendants as child
  }

  haters(func: uid(descendants)) {
    @filter(eq(dislikes, val(inventions)))
    expand(_all_)
  }
}

Here are the results:

"haters": [
  {
    "name": "Patrick",
    "dislikes": "Sliced Bread"
  },
  {
    "name": "Keith",
    "dislikes": "Sliced Bread"
  },
  {
    "name": "Louis",
    "dislikes": "Ballpoint Pen"
  }
]

The problem is that we get the false positive Keith. The issue, I think, is that the inventions variable holds the values {"Sliced Bread", "Ballpoint Pen"}, and compares the nodes in descendants who are children of John to "Sliced Bread", even though Sliced Bread was not invented by John.

Is it possible get consistency between descendants and inventions?

Thanks!

pawan · November 6, 2017, 6:53am

Hi @BlakeMScurr

That’s right, inventions has both the values and we just compare the dislikes to these values.

You would have to do this final check in your application. Basically, fetch all the haters and their ancestors (you would need a reverse index on child).

  recurse(func: uid(descendants)) @filter(eq(dislikes, val(inventions))) {
    dislikes
    ~child
    inventions
  }

Then you walk down the graph and check that the dislike for every hater should have been invented by some ancestor.

BlakeMScurr · November 6, 2017, 9:15pm

Great, thanks!

Would you mind showing me how to write this? It seems much like what I tried to write, and I think the devil is in the detail on this one.

I basically don’t know how to compare a descendant to its ancestor without also comparing it to non-ancestors that are stored in the same variable (because they’re ancestors of other nodes).

BlakeMScurr · November 6, 2017, 10:30pm

Ah I misread your comment sorry. I see that you mean I should do the final check in my application outside the query.

Thanks for your help

BlakeMScurr · November 7, 2017, 5:02am

Hi @pawan

I tried to implementing your suggestion which has ended up being a bit of extra logic in my application which I’ll post tomorrow when I polish it off. The final query I ended up using was:

{
  inventors as inv(func: eq(occupation, "Inventor")) {
    inventions as invented
  }

  recurse(func: uid(inventors)) {
    descendants as child
  }

  recurse(func: uid(descendants)) @filter(eq(dislikes, val(inventions))) {
    dislikes
    ~child
    name
  }
}

And there’s just a tiny annoyance in the results from the second recurse block. Each result doesn’t show all its ancestors, for example, L (Louis) has parent K (Keith) who has parent J (John). But the result for L only shows the direct parent K, J is already shown in the L result:

{
      {
        "dislikes": "Sliced Bread",
        "~child": [
          {
            "name": "John"
          }
        ],
        "name": "Keith"
      },
      {
        "dislikes": "Ballpoint Pen",
        "~child": [
          {
            "dislikes": "Sliced Bread",
            "name": "Keith"
          }
        ],
        "name": "Louis"
      }

It’s nice not to have repeated information, but it would be nice if I could add some directive to add it back in and make all results fully explicit, is that possible?

pawan · November 7, 2017, 8:09am

This seems like a bug. Recurse should ideally give you all the ancestors. I will investigate this tomorrow and create an issue if required.

BlakeMScurr · November 8, 2017, 4:54am

OK thanks! Can you link the issue here if it is a bug?

If we assume that gets solved then this post processing logic should give me the right values:

func main () {
	// using 0.8.3 client api
	c := newClient()

	req := client.Req{}
	req.SetQuery(myQueryAsShownAbove)
	resp, err := c.Run(context.Background(), &req)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	var familyTree FamilyTreeResponse
	client.Unmarshal(resp.N, &familyTree)

	haters := []Person{}
	for _, descendant := range familyTree.Descendants {
		ancestor := descendant.Parent
		for ancestor != nil {
			if ancestor.Invented == descendant.Dislike {
				haters = append(haters, descendant)
				break
			}

			if ancestor.Parent == nil && 
			ancestor = ancestor.Parent
		}
	}

	fmt.Printf("%+v\n", haters)
}

type Person struct {
	Dislike  string  `json:"dislikes"`
	Name     string  `json:"name"`
	Parent   *Person `json:"~child"`
	Invented string  `json:"invented"`
	Uid      uint64  `json:"_uid_"`
}

type FamilyTreeResponse struct {
	Descendants []Person `json:"recurse"`
}

Which really isn’t anything to complain about. The trouble is that this is just one member of a class of queries that we want to run.

I’ll write a comment explaining the generalisation tomorrow

pawan · November 9, 2017, 6:27am

Here you go Recurse query doesn't return all nodes. · Issue #1753 · dgraph-io/dgraph · GitHub.

Looking forward to this.

BlakeMScurr · November 9, 2017, 10:43pm

OK, so it actually needs to generalize in a several of ways (I hope this post is concise enough ).

Broadly speaking, it seems possible it seems possible to generalise the postprocessing solution, but it becomes cumbersome, and may be inefficient.

Relationship Complexity

The original query compares an ancestors to a descendant, let’s call them A and B. A and B have a simple recursive child relationship. I’ll express this as though it were one GraphQL+/- block for brevity, even though we know it would have to be divided up into multiple blocks:

// Mock graphql
A {
    invented == $inv
    recurse child B {
        dislikes == $inv
    }
}

In the general case A and B could have an arbitrarily complex compound relationship. For a simple example, you may be interested inventors who have a descendant called Bob, where Bob has a descendant who dislikes the invention:

// Mock Gql
A {
    invented == $inv
    recurse child B {
        name == "bob"
        recurse child C {
            dislikes
        }
    }
}

If we use the same approach as the original solution we need a query like:

{
  inventors as inv(func: eq(occupation, "Inventor")) {
    inventions as invented
  }

  recurse(func: uid(inventors)) {
    descendants as child
  }

  bob(func: uid(descendants)) {
    @filter(eq(name, "Bob"))
    bobDescendants as child
  }

  dis(func: uid(bobDescendants)) {
    @filter(dislikes, val(inventions))
  }
}

I don’t think we can do a final block that links the dislikers to the inventors like in the basic case. However, if we return the uids of all the intermediary nodes there will be enough information to do the post processing even if it is more complex.

The problem is that since the relationship between A and B is any arbitrary relationship that can be expressed in GraphQL+/-, the postprocessing gets arbitrarly complex, and would be better suited to being part of GraphQL+/- itself.

Multiple Use Consistency

Perhaps you are interested in family lines where the sons have done the same job for 4 generations:

// Mock Gql
A {
    gender == male
    job == $j
    recurse child B {
        gender == male
        job == $j
        recurse child C {
            gender == male
            job == $j
            recurse child D {
                gender == male
                job == $j
            }
        }
    }
}

This will complicate post processing as we have to traverse multiple result sets and check their values.

Cross Branch Matching

The previous examples have been linear. What if we wanted to find someone with name X, whose first child has a descendant of name X, and whose second child has a descendant of name X.

// Mock Gql
A {
    name == $X
    firstChild B {
        recurse child C {
            name == $X
        }
    }
    secondChild D {
        recurse child E {
            name == $X
        }
    }
}

The actual query in this particular case is relatively simple:

{
  ancestor as anc(func: eq(is, "Person")) {
    X as name
    fc as firstChild
    sc as secondChild
  }

  recurse(func: uid(fc)) {
    fDescendants as child
  }

  recurse(func: uid(sc)) {
    sDescendants as child
  }

   recurse(func: uid(fDescendants)) @filter(eq(name, val(X))) {
    ~child
    name
  }

   recurse(func: uid(sDescendants)) @filter(eq(name, val(X))) {
    ~child
    name
  }
}

This is similar to the original solution, except we’re traversing up and down a tree. In post processing you would have to compare the value name between the ancestor and the two children. And you can imagine a more complex tree where you would compare children at each node in the tree.

Multiple Consistency

You may want to find descendants with the same name and job as their ancestor. Or, you may want someone with a descendant with the same name, and another descendant with the same job:

{
  ancestor as anc(func: eq(is, "Person")) {
    X as name
    J as job
  }

  recurse(func: uid(ancestor)) {
    descendant as child
  }


   recurse(func: uid(descendants)) @filter(eq(name, val(X))) {
    ~child
    name
  }

   recurse(func: uid(descendants)) @filter(eq(job, val(J))) {
    ~child
    name
  }
}

Alternative Solution

Rather than doing the post processing thing, we could do an initial query to find the possible values we’re dealing with, and run a query for each possible value. This would be expensive, but possibly no more expensive (N+1 queries for N possible values) than the post processing solution.

pawan · November 15, 2017, 3:55am

Hey @BlakeMScurr

Sorry, we were busy with releasing v0.9.0. I am going to have a look at your requirement this week and see how can we generalize it and do it within Dgraph.

BlakeMScurr · November 21, 2017, 4:27am

@pawan Awesome thanks for considering it. Can you keep me informed about this? I am happy to help, as this is an important feature for us.

system · December 21, 2017, 4:28am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Family tree schema hasInverse issues Dgraph Cloud	3	722	March 2, 2021
Mutation fails because of error Some variables are defined twice GraphQL status:accepted , kind:bug , ticket:created	2	744	March 11, 2021
Compare predicates from two different nodes Users	5	1341	June 3, 2018
How to do proper variable propagation? Dgraph	2	371	June 13, 2021
Upset using variable -DQL Issues	5	352	October 7, 2021