Inconsistent Handling of Value Variables

According to Value Variable Doc:

… It therefore only makes sense to use the values from a value variable in a context that matches the same UIDs - if used in a block matching different UIDs the value variable is undefined.

However, in one case where the enclosing block matches a subset of the UIDs, I am able to use val() to extract values from value variables. In another case, it just gives a “Division by zero” error.

The following are the mutation, schema, and queries I used to trigger the inconsistency:

Mutation:

{
  set {
    _:m1 <name> "Alice Movie" .
    _:m2 <name> "Movie II" .
    _:m3 <name> "Movie 3" .
    _:m1 <genre> _:g1 .
    _:m1 <genre> _:g2 .
    _:m2 <genre> _:g2 .
    _:m3 <genre> _:g1 .
    _:m1 <style> _:s1 .
    _:m2 <style> _:s1 .
    _:g1 <name> "Genre1" .
    _:g2 <name> "Genre2" .
    _:s1 <name> "Style1" .
  }
}

Schema:

genre: uid @reverse .
style: uid @reverse .
name: string @index(hash) . 

Queries:

query{
  var(func: eq(name, "Alice Movie")) {
    n_genres as count(genre)
    n_styles as count(style)
  }
  
  dist(func: eq(name, "Alice Movie")) @normalize {
    m1_name: name
    m1_g as genre {
      ~genre {
        n_common_genres as count(genre @filter(uid(m1_g)))
      }
    }
      
    m1_s as style {
      ~style {
        n_common_styles as count(style @filter(uid(m1_s)))
      }
    }
  	
    n_genres: val(n_genres)  # 2
    n_styles: val(n_styles)  # 1
    
    # subset of UIDs for the following two
    n_common_genres: val(n_common_genres)  # 2
    n_common_styles: val(n_common_styles)  # 1
    
    # all four variables were extracted by val()
    # but the following gives "Division by zero" error
    # score as math(
    #   (n_common_genres + n_common_styles) /
    #   (n_genres + n_styles)
    # )
    # score: val(score)
  }  
	
  m2(func: eq(name, "Movie II")) {
    name
    val(n_common_genres)  # 1
    val(n_common_styles)  # 1
    
    # I am trying to get 'val(score)' here
  }
  
	
  m3(func: eq(name, "Movie 3")) {
    name
    val(n_common_genres)  # 1
    val(n_common_styles)  # no output

    # I am trying to get 'val(score)' here
  }
}

Dgraph version: v1.0.11

It would be great if Dgraph could handle these cases consistently, or at least give errors more relevant than ‘Division by Zero’.


A related question regarding the above queries: is there a way in GraphQL+- to calculate Jaccard distance using multiple UID predicates? That is what I was trying to do with the commented-out lines, which use a formula similar to the Jaccard distance.
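For reference, here is a minimal client-side sketch of the Jaccard similarity and distance themselves, computed in Python rather than in GraphQL+-. The `features` sets are transcribed by hand from the mutation above (blank-node labels stand in for real UIDs), and the function names are my own; treating two empty sets as identical is just a convention I picked.

```python
from typing import Set


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Return |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)


# Genre and style edges per movie, copied from the mutation above.
features = {
    "Alice Movie": {"g1", "g2", "s1"},
    "Movie II": {"g2", "s1"},
    "Movie 3": {"g1"},
}

for other in ("Movie II", "Movie 3"):
    d = jaccard_distance(features["Alice Movie"], features[other])
    print(f"Alice Movie vs {other}: distance {d:.3f}")
```

Note that the denominator here is the size of the union, which differs from the `n_genres + n_styles` denominator in the commented-out math() expression.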


Even better, could Dgraph properly allow a variable declared with value_var as count(...) to be used in a block matching a subset of the original UIDs?

Did you try changing the order of the values?
In math the position matters (especially for division).

But in this case I believe that “n_genres” and “n_styles” were computed afterwards (not sure, but it makes sense), and their values were not retained when this (“dist”) block was executed. You need to order the variables by the order in which they are computed.

(...)
    score as math(
      (n_genres + n_styles) / (n_common_genres + n_common_styles)  # 3/3 = 1
    )
    score: val(score)
(...)
{
  "dist": [
    {
      "m1_name": "Alice Movie",
      "n_genres": 2,
      "n_styles": 1,
      "n_common_genres": 2,
      "n_common_styles": 1,
      "score": 1
    }
  ]
}

All you need to know about Vars propagation is in Get started with Dgraph.

As for the “Jaccard distance” question, I’ll ask for more details about it.

Thanks for your reply!

I was not too sure what you meant by ‘not retained’. If the values were not retained, how could I get them with the val() function but not with the math() function?

As for the fact that math() is able to compute the inverse of the fraction, I guess you found another case where the math() function is able to extract values. But it still doesn’t explain why math((a+b)/(c+d)) can give a ‘Division by Zero’ error when a, b, c, and d are all positive numbers.

As an attempt to “order by the computed ordering”, I’ve also just tried swapping the declarations of n_common_genres and n_genres, but it still gave the same error.

...
  var(func: eq(name, "Alice Movie")) {
    m1_g as genre {
      ~genre {
        n_common_genres as count(genre @filter(uid(m1_g)))
      }
    }
      
    m1_s as style {
      ~style {
        n_common_styles as count(style @filter(uid(m1_s)))
      }
    }
  }
  
  dist(func: eq(name, "Alice Movie")) @normalize {
    m1_name: name
    n_genres as count(genre)
    n_styles as count(style)
  	
    n_genres: val(n_genres)
    n_styles: val(n_styles)
    n_common_genres: val(n_common_genres)
    n_common_styles: val(n_common_styles)
    
    ### the following gives "Division by zero" error ###
    # score as math(
    #   (n_common_genres + n_common_styles) /
    #   (n_genres + n_styles)
    # )
    # score: val(score)
  }  
...

If I am understanding Variable Propagation correctly, it is similar to this case but not exactly the same.

I figured that typical var propagation happens when we declare the var in a parent block and extract its value in a nested block.

But what I was trying to achieve is the opposite: declare the var inside the nested block and extract its value in the parent block (or, more generally, in any block that only matches a subset of the nested block’s UIDs). Does the rule of variable propagation still apply here?

First, read this design concept: Get started with Dgraph
It’s a bit out of date, but it can still help.

Keep in mind that query blocks run concurrently. Each block within a query can make further network calls at different times, and “count” can be computed at a different time too.

This could be a bug, but that needs to be proven. “math” should wait for everything to be computed.
@gus can you check this out?

Well, I may be completely wrong here. I’m guessing how these functions (val, count, and math) work in terms of execution order (which one runs first). I did not write them, so I’m assuming their behavior because I observed it and it solved the case (?).

Do this test:

Run these queries:

{
  test1(func: uid(0x1)) {
    score as math(0/2)
    score: val(score)  # Will be 0: in regular math you can divide zero by any nonzero number.
  }
}
{
  test2(func: uid(0x1)) {
    score as math(1/0)  # or (1/0 + 0 + 0) or ((1 + 1) / (0 + 0))
    score: val(score)  # You will get the error,
                       # because in regular math you can't divide a number by zero.
  }
}

So the tests above show that you’ll get this error only if the second value (the denominator) is 0.
That means that in your first query, “n_genres” and “n_styles” are empty.

In my last comment I said it was “n_common_genres” and “n_common_styles”. In fact I got confused; I meant “n_genres” and “n_styles” in that statement.

Anyway, didn’t my last reply solve the case?
Did you find any other issue related to swapping the vars’ positions in math()?

Cheers.

I am still looking at this but to start, I will clarify what happens inside math blocks.

Inside math blocks, vars are not treated as constants but rather as maps from a uid to a value. So to evaluate a math expression, Dgraph loops over the maps, gets the values of each variable in the expression for a particular uid, and then performs the math. If one of the var maps does not have a value for a given uid, zero is used instead. This causes division to fail if the denominator evaluates to zero.
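To make the uid-map behaviour concrete, here is a toy Python model of the description above. It is not Dgraph's actual implementation: `eval_math`, the variable names `a` and `b`, and the uids `0x1` and `0x2` are all made up for illustration, and I am assuming the loop runs over the union of the uids in every map.

```python
from typing import Callable, Dict

UidMap = Dict[str, float]


def eval_math(
    expr: Callable[[Dict[str, float]], float],
    variables: Dict[str, UidMap],
) -> UidMap:
    """Evaluate expr once per uid, substituting 0 for missing values."""
    # Assumption: iterate over every uid that appears in any variable map.
    uids = set()
    for uid_map in variables.values():
        uids.update(uid_map)

    result: UidMap = {}
    for uid in sorted(uids):
        # A variable with no entry for this uid contributes zero.
        env = {name: uid_map.get(uid, 0) for name, uid_map in variables.items()}
        result[uid] = expr(env)
    return result


# Both variables are defined on the same uid: the division works.
same_uids = {"a": {"0x1": 3}, "b": {"0x1": 3}}
print(eval_math(lambda v: v["a"] / v["b"], same_uids))

# The variables are defined on different uids: at uid 0x2 the
# denominator falls back to zero, even though every stored value is positive.
different_uids = {"a": {"0x2": 3}, "b": {"0x1": 3}}
try:
    eval_math(lambda v: v["a"] / v["b"], different_uids)
except ZeroDivisionError:
    print("division by zero at a uid where 'b' is undefined")
```

In this model, val() on each variable still returns a value wherever that variable is defined, which would be consistent with val() appearing to work while math() fails.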

I think this is what’s going on:

  • n_genres and n_styles are only defined for one uid (the ID of the blank node _:m1).
  • n_common_genres and n_common_styles are defined for the IDs of the blank nodes _:g1, _:g2, and _:s1 (because the query takes the reverse edges of genre and style and the variables are defined inside those blocks).

So the math expression cannot be computed the way you want because the variables are not defined for the same set of uids. To fix this you would either have to rewrite the query in a way that allows n_common_genres and n_common_styles to be defined for node _:m1 (not sure if that’s possible), or only fetch the variables in the query and perform the math in your application after you get the response. Dgraph can only perform math on variable values as long as the uids are the same.
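A rough sketch of the second option (doing the math in the application) might look like the following. The response text is hand-written to mirror the flattened `dist` output shown earlier in the thread, minus the score, so its exact shape is an assumption; `score` is a hypothetical helper, not part of any Dgraph client.

```python
import json
from typing import Dict, Optional

# Hand-written stand-in for the query response (assumed shape).
response_text = """
{
  "dist": [
    {
      "m1_name": "Alice Movie",
      "n_genres": 2,
      "n_styles": 1,
      "n_common_genres": 2,
      "n_common_styles": 1
    }
  ]
}
"""


def score(row: Dict[str, float]) -> Optional[float]:
    """(n_common_genres + n_common_styles) / (n_genres + n_styles)."""
    denominator = row.get("n_genres", 0) + row.get("n_styles", 0)
    if denominator == 0:
        # No genres or styles at all: report "undefined" instead of raising.
        return None
    numerator = row.get("n_common_genres", 0) + row.get("n_common_styles", 0)
    return numerator / denominator


rows = json.loads(response_text)["dist"]
for row in rows:
    print(row["m1_name"], score(row))
```

Because the application sees plain numbers rather than per-uid maps, the mismatch between uid sets no longer matters here.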

Thanks for the clarification!

There was just one point that I didn’t quite understand, and I wondered if you could explain a bit more:

And my questions are commented below in the initial query:

Or did I completely get both 1) and 2) wrong?

Thanks!