K Shortest Path query returns path that doesn't have the `to` node

daver208 · August 31, 2020, 2:22pm

Dgraph v20.03.4
Running in a Docker container on Mac OS 10.14.6. I’ve got 24 GB of RAM for Docker.

So I can’t share my data which makes replication difficult, but I’ve come across an issue with the K-shortest path query, where the ‘shortest’ path doesn’t contain the to UID. It appears that somehow there is a path, and that the weight of the path is below the threshold, but the path itself is more ‘hops’ away than the depth allows.

0x68 is a highly connected node and this seems common to this issue. It’s like the weights for some of the edges are 0 and so the total weight of the path is lower than any other path, but the result doesn’t contain the to node. Since the order of the paths aren’t guaranteed this result seems to happen maybe 20% of the time, but one would expect this result to always show up in the results as it’s weight is lower than the other paths. All of the other paths in the result have a weight of 4, but they do include the from and to UID. The first image is the bug, since I can only upload one image, the second case would also have a weight of 2, but it would only have 3 UIDs and the to and from uid would be in the result.

daver208 · August 31, 2020, 2:28pm

This is the ‘normal’ result I get 80%+ of the time. Note when the ‘bug’ occurs, this path is NOT in the results.

Anurag · August 31, 2020, 4:42pm

Is it possible to give the structure of graph in someway without telling us the node values etc. ?

daver208 · August 31, 2020, 6:30pm

So Alice 0xae812 and Bob 0xad5e0 say are employed by the Organzation A 0xa. This should always be the shortest path, which is the second result. Sometimes when I run the query, I get another organization that Alice worked for, that is connected to Eve (for example) who is connected to another Organization B that presumably has a path to the target (Bob). But as you see the innermost uid is Organization B , not Bob. We are not altering weights so every edge should have a weight of 1, but in the bad result you see three edges and a _weight_ of 2. Mutations on the graph are being done to makes sure that every instance of an organization is the same organization. As much as possible, we are merging nodes so that Alice and Bob should always be the same nodes in the graph. All edges have forward and reverse relationships so we can traverse the path.

I have reloaded my data with Dgraph 20.07.0, and still have the same behavior. Regardless of the specifics, should it even be possible to get a path that doesn’t have the to node.

Anurag · September 1, 2020, 1:38pm

Agree that this seems to be a bug, which is weird since we have good test coverage around this piece of code. I have added the status:accepted tag and we will have someone investigate this further. CC: @LGalatin @vvbalaji

daver208 · September 2, 2020, 12:22pm

So FYI, it appears this result happens when I leave maxweight unconstrained or set to a much higher number than the depth. I haven’t spent an exhaustive amount of time testing this, but it appears if I set maxweight to anything reasonable for a depth of 4 that I don’t get un-expected results.

The following is an example of setting maxweight to 7, NOTE: When testing with 6 or below I didn’t see this behavior. NOTE: the weight returned for this path is still 2. Hopefully this helps you track this down.

The UIDs in the query have changed because I reloaded the data in v20.07.0, but the entities represented are the same in the graph as in my original report.

ahsan · September 3, 2020, 1:41pm

@daver208,

Based on the graph that you have described, I have created this:

{
  set{
    _:alice <name> "alice" .
    _:bob <name> "bob" .
    _:eve <name> "eve" .
    _:orgA <name> "orgA" .
    _:orgB <name> "orgB" .
    _:orgC <name> "orgC" .
    
    _:alice <empBy> _:orgA .
    _:bob <empBy> _:orgA .
    
    _:alice <empBy> _:orgB .
    _:eve <empBy> _:orgB .
    
    _:eve <empBy> _:orgC .
    
  }
}

and I am making this query

{
  q0 as shortest(from: 0x2, to: 0x3, numpaths:4, depth:4,
  minweight:1){
    empBy ~empBy
  }
    path(func: uid(q0)){
      uid 
    }
}

0x2 is the uid of Alice and 0x3 is that of Bob. I couldn’t reproduce the issue. It would be really very helpful if you could let me know if I have missed something, or your graph is different than the one I have created.

Thanks

daver208 · September 3, 2020, 2:32pm

I’ll see if I can generate something that’s closer to what we’ve got for a test, but we’ve got about 40,000 nodes in our small test graph. In general, our organization nodes have order the of hundred(s) of connections to employees.

This is an example of two of our approximately 2500 organizations with ~EMPLOYED_BY relationships.

daver208 · September 3, 2020, 2:53pm

Not to belabor the point…

EDIT: Maybe this is the source of the weight of 2? Even though the path returned might follow common employees back and forth between orgs, there would still only be two hops for the org on the right to the org on on the left.

daver208 · September 3, 2020, 5:33pm

@ahsan
Replicated with the following. Remember EMPLOYED_BY MUST be defined to have a reverse relationship:

NOTE: It usually takes several requests before the bugged result comes back, it generally doesn’t.

EDIT: This may take automated tests to find, as I can’t get that result again with the same data.

{
  set{
    _:alice <Name> "alice" .
    _:bob <Name> "bob" .
    _:charles <Name> "charles" .
    _:dan <Name> "dan" .
    _:ed <Name> "ed" .
    _:fran <Name> "fran" .
    _:greg <Name> "greg" .
    _:helen <Name> "helen" .
    _:irene <Name> "irene" .
    _:julia <Name> "julia" .
    _:karen <Name> "karen" .
    _:louis <Name> "louis" .

    _:orgA <Name> "orgA" .
    _:orgB <Name> "orgB" .
    _:orgC <Name> "orgC" .
    _:orgD <Name> "orgD" .

    _:alice <EMPLOYED_BY> _:orgA .
    _:bob <EMPLOYED_BY> _:orgA .
    _:fran <EMPLOYED_BY> _:orgA .
    _:greg <EMPLOYED_BY> _:orgA .


    _:charles <EMPLOYED_BY> _:orgB .
    _:dan <EMPLOYED_BY> _:orgB .
    _:ed <EMPLOYED_BY> _:orgB .
    _:fran <EMPLOYED_BY> _:orgB .
    _:greg <EMPLOYED_BY> _:orgB .
    _:helen <EMPLOYED_BY> _:orgB .
    _:irene <EMPLOYED_BY> _:orgB  .
    _:karen <EMPLOYED_BY> _:orgB .
    _:louis <EMPLOYED_BY> _:orgB .

    _:charles <EMPLOYED_BY> _:orgC .
    _:dan <EMPLOYED_BY> _:orgC .
    _:ed <EMPLOYED_BY> _:orgC .
    _:helen <EMPLOYED_BY> _:orgC .
    _:irene <EMPLOYED_BY> _:orgC  .

    _:bob <EMPLOYED_BY> _:orgD .
    _:louis <EMPLOYED_BY> _:orgD .
    _:helen <EMPLOYED_BY> _:orgD .
    _:irene <EMPLOYED_BY> _:orgD  .
    _:karen <EMPLOYED_BY> _:orgD .
  }
}

ahsan · September 3, 2020, 6:00pm

Thanks for providing us with the data to replicate the issue. We are working to fix it, we’ll let you know once we get this fixed.

daver208 · September 3, 2020, 6:02pm

Yeah, I’m guessing this might be an issue where the more connected an org is the more likely this is to happen. Thank you all for your help!

For reference this was the result of the mutation to create that.

    "uids": {
      "alice": "0x77",
      "bob": "0x73",
      "charles": "0x7b",
      "dan": "0x70",
      "ed": "0x7c",
      "fran": "0x7d",
      "greg": "0x78",
      "helen": "0x79",
      "irene": "0x74",
      "julia": "0x7e",
      "karen": "0x71",
      "louis": "0x75",
      "orgA": "0x72",
      "orgB": "0x7a",
      "orgC": "0x76",
      "orgD": "0x7f"
    }

ahsan · September 8, 2020, 6:26am

This issue comes up rarely, I wrote an script and was able to replicate it. The issue is apparently because we are messing up with some pointers while we use a memory from sync.Pool to store current path. In this line we are extending the length of curPath if the requirement of its length is within its capacity. But somehow kroutes is getting modified by this operation (may be because of some overlap in addresses).

I am attaching thescript.py (3.0 KB) and the diff (4.3 KB) with changes in shortest.go to log the details.

cc @Anurag

daver208 · September 8, 2020, 1:50pm

FWIW this comes up fairly regularly in our test graph. Maybe 30-40% of the time, but with the data set I provided it was pretty uncommon.

ahsan · September 8, 2020, 3:08pm

Yep. It depends on the data set. The more number of times it uses the memory from sync.pool, the more this issue is seen. So, larger dataset will see this issue more frequently.

ahsan · September 10, 2020, 11:36am

@daver208 The issue is fixed by this PR: fix(Query): Fix wrong path response for k-shortest paths by ahsanbarkati · Pull Request #6437 · dgraph-io/dgraph · GitHub

Thanks again for reporting this and helping us to replicate this issue.

Topic		Replies	Views
Shortest path query problem Dgraph kind:question	5	1219	October 12, 2020
Finding path to node Users	3	736	January 26, 2020
Performance is not good when finding the paths between 2 nodes with k-shortest Dgraph	5	842	December 2, 2018
K-shortest path query becomes slow with k>3 Dgraph	0	504	October 31, 2018
Summing up weights of k shortest paths Dgraph	1	462	July 20, 2020

K Shortest Path query returns path that doesn't have the `to` node

Related topics