Running in a Docker container on Mac OS 10.14.6. I’ve got 24 GB of RAM for Docker.
So I can’t share my data which makes replication difficult, but I’ve come across an issue with the K-shortest path query, where the ‘shortest’ path doesn’t contain the
to UID. It appears that somehow there is a path, and that the weight of the path is below the threshold, but the path itself is more ‘hops’ away than the depth allows.
0x68 is a highly connected node and this seems common to this issue. It’s like the weights for some of the edges are 0 and so the total weight of the path is lower than any other path, but the result doesn’t contain the
to node. Since the order of the paths aren’t guaranteed this result seems to happen maybe 20% of the time, but one would expect this result to always show up in the results as it’s weight is lower than the other paths. All of the other paths in the result have a weight of 4, but they do include the
to UID. The first image is the
bug, since I can only upload one image, the second case would also have a weight of 2, but it would only have 3 UIDs and the
from uid would be in the result.
This is the ‘normal’ result I get 80%+ of the time. Note when the ‘bug’ occurs, this path is NOT in the results.
Is it possible to give the structure of graph in someway without telling us the node values etc. ?
So Alice 0xae812 and Bob 0xad5e0 say are employed by the Organzation A 0xa. This should always be the shortest path, which is the second result. Sometimes when I run the query, I get another organization that Alice worked for, that is connected to Eve (for example) who is connected to another Organization B that presumably has a path to the target (Bob). But as you see the innermost uid is Organization B , not Bob. We are not altering weights so every edge should have a weight of 1, but in the bad result you see three edges and a
_weight_ of 2. Mutations on the graph are being done to makes sure that every instance of an organization is the same organization. As much as possible, we are merging nodes so that Alice and Bob should always be the same nodes in the graph. All edges have forward and reverse relationships so we can traverse the path.
I have reloaded my data with Dgraph 20.07.0, and still have the same behavior. Regardless of the specifics, should it even be possible to get a path that doesn’t have the
Agree that this seems to be a bug, which is weird since we have good test coverage around this piece of code. I have added the status:accepted tag and we will have someone investigate this further. CC: @LGalatin @vvbalaji
So FYI, it appears this result happens when I leave
maxweight unconstrained or set to a much higher number than the depth. I haven’t spent an exhaustive amount of time testing this, but it appears if I set
maxweight to anything reasonable for a depth of 4 that I don’t get un-expected results.
The following is an example of setting
7, NOTE: When testing with 6 or below I didn’t see this behavior. NOTE: the
weight returned for this path is still
2. Hopefully this helps you track this down.
The UIDs in the query have changed because I reloaded the data in v20.07.0, but the entities represented are the same in the graph as in my original report.
Based on the graph that you have described, I have created this:
_:alice <name> "alice" .
_:bob <name> "bob" .
_:eve <name> "eve" .
_:orgA <name> "orgA" .
_:orgB <name> "orgB" .
_:orgC <name> "orgC" .
_:alice <empBy> _:orgA .
_:bob <empBy> _:orgA .
_:alice <empBy> _:orgB .
_:eve <empBy> _:orgB .
_:eve <empBy> _:orgC .
and I am making this query
q0 as shortest(from: 0x2, to: 0x3, numpaths:4, depth:4,
0x2 is the uid of Alice and
0x3 is that of Bob. I couldn’t reproduce the issue. It would be really very helpful if you could let me know if I have missed something, or your graph is different than the one I have created.
I’ll see if I can generate something that’s closer to what we’ve got for a test, but we’ve got about 40,000 nodes in our small test graph. In general, our organization nodes have order the of hundred(s) of connections to employees.
This is an example of two of our approximately 2500 organizations with ~EMPLOYED_BY relationships.
Not to belabor the point…
EDIT: Maybe this is the source of the weight of 2? Even though the path returned might follow common employees back and forth between orgs, there would still only be two hops for the org on the right to the org on on the left.
Replicated with the following. Remember EMPLOYED_BY MUST be defined to have a reverse relationship:
NOTE: It usually takes several requests before the bugged result comes back, it generally doesn’t.
EDIT: This may take automated tests to find, as I can’t get that result again with the same data.
_:alice <Name> "alice" .
_:bob <Name> "bob" .
_:charles <Name> "charles" .
_:dan <Name> "dan" .
_:ed <Name> "ed" .
_:fran <Name> "fran" .
_:greg <Name> "greg" .
_:helen <Name> "helen" .
_:irene <Name> "irene" .
_:julia <Name> "julia" .
_:karen <Name> "karen" .
_:louis <Name> "louis" .
_:orgA <Name> "orgA" .
_:orgB <Name> "orgB" .
_:orgC <Name> "orgC" .
_:orgD <Name> "orgD" .
_:alice <EMPLOYED_BY> _:orgA .
_:bob <EMPLOYED_BY> _:orgA .
_:fran <EMPLOYED_BY> _:orgA .
_:greg <EMPLOYED_BY> _:orgA .
_:charles <EMPLOYED_BY> _:orgB .
_:dan <EMPLOYED_BY> _:orgB .
_:ed <EMPLOYED_BY> _:orgB .
_:fran <EMPLOYED_BY> _:orgB .
_:greg <EMPLOYED_BY> _:orgB .
_:helen <EMPLOYED_BY> _:orgB .
_:irene <EMPLOYED_BY> _:orgB .
_:karen <EMPLOYED_BY> _:orgB .
_:louis <EMPLOYED_BY> _:orgB .
_:charles <EMPLOYED_BY> _:orgC .
_:dan <EMPLOYED_BY> _:orgC .
_:ed <EMPLOYED_BY> _:orgC .
_:helen <EMPLOYED_BY> _:orgC .
_:irene <EMPLOYED_BY> _:orgC .
_:bob <EMPLOYED_BY> _:orgD .
_:louis <EMPLOYED_BY> _:orgD .
_:helen <EMPLOYED_BY> _:orgD .
_:irene <EMPLOYED_BY> _:orgD .
_:karen <EMPLOYED_BY> _:orgD .
Thanks for providing us with the data to replicate the issue. We are working to fix it, we’ll let you know once we get this fixed.
Yeah, I’m guessing this might be an issue where the more connected an org is the more likely this is to happen. Thank you all for your help!
For reference this was the result of the mutation to create that.
This issue comes up rarely, I wrote an script and was able to replicate it. The issue is apparently because we are messing up with some pointers while we use a memory from
sync.Pool to store current path. In this line we are extending the length of curPath if the requirement of its length is within its capacity. But somehow kroutes is getting modified by this operation (may be because of some overlap in addresses).
I am attaching thescript.py (3.0 KB) and the diff (4.3 KB) with changes in
shortest.go to log the details.
FWIW this comes up fairly regularly in our test graph. Maybe 30-40% of the time, but with the data set I provided it was pretty uncommon.
Yep. It depends on the data set. The more number of times it uses the memory from sync.pool, the more this issue is seen. So, larger dataset will see this issue more frequently.