Shortest path does not respect `first` ... sometimes

Moved from GitHub dgraph/3663

Posted by campoy:

If you suspect this could be a bug, follow the template.

  • What version of Dgraph are you using?

play.dgraph.io

  • Have you tried reproducing the issue with latest release?

Yes, it still fails

  • What is the hardware spec (RAM, OS)?

MacOS but also play.dgraph.io, so probably irrelevant

  • Steps to reproduce the issue (command/config used to run Dgraph).

Run a shortest path query and use first or offset to get a specific part of the path.

For instance, in order to find movies directed by Michael Bay where Megan Fox appears,
we can run this query on play.dgraph.io:

{
  path as shortest(from: 0x5526c6, to: 0x652d3f, depth: 2) {
    <director.film>
    <actor.film>
    <starring>
    <performance.actor>
  }
  
  names(func: uid(path)) {
    uid
    name@.
  }
}

This wils return one of the movies matching the description:

{
  "extensions": {...},
  "data": {
    "names": [
      {
        "uid": "0x5526c6",
        "name@.": "Michael Bay"
      },
      {
        "uid": "0x40b2f3",
        "name@.": "Transformers: Revenge of the Fallen"
      },
      {
        "uid": "0x4d6d44"
      },
      {
        "uid": "0x652d3f",
        "name@.": "Megan Fox"
      }
    ],
    "_path_": [
      {
        "director.film": [
          {
            "starring": [
              {
                "performance.actor": [
                  {
                    "uid": "0x652d3f"
                  }
                ],
                "uid": "0x4d6d44"
              }
            ],
            "uid": "0x40b2f3"
          }
        ],
        "uid": "0x5526c6"
      }
    ]
  }
}

Now, if I want to get only the name of the movie, I know that this is the second element of the path, so first:1, offset: 1 should work, but it doesn’t.

Depending on the combination you might get different results.

Only offset

If I set offset: 1 I expect to see the name of the movie first (we skipped the director) followed by an anonymous node (the performance) and finally Megan Fox.

{
  path as shortest(from: 0x5526c6, to: 0x652d3f, depth: 2) {
    <director.film>
    <actor.film>
    <starring>
    <performance.actor>
  }
  
  names(func: uid(path),  offset: 1) {
    uid
    name@.
  }
}

Randomly, I get either the name of the movie and the performance but no Megan Fox:

[...]
  "data": {
    "names": [
      {
        "uid": "0x7111ab",
        "name@.": "Bad Boys II"
      },
      {
        "uid": "0x733703"
      }
    ],
    "_path_": [
      {
        "director.film": [
          {
            "starring": [
              {
                "performance.actor": [
                  {
                    "uid": "0x652d3f"
                  }
                ],
                "uid": "0x733703"
              }
            ],
            "uid": "0x7111ab"
          }
        ],
        "uid": "0x5526c6"
      }
    ]
  }
}

But also, sometimes, I’ll get just Megan Fox

[...]
  "data": {
    "names": [
      {
        "uid": "0x652d3f",
        "name@.": "Megan Fox"
      }
    ],
    "_path_": [
      {
        "director.film": [
          {
            "starring": [
              {
                "performance.actor": [
                  {
                    "uid": "0x652d3f"
                  }
                ],
                "uid": "0x466767"
              }
            ],
            "uid": "0x1053f0"
          }
        ],
        "uid": "0x5526c6"
      }
    ]
  }
}

Only first

If I set the request to use first: 1 the query works correctly and only returns Michael Bay.

But … if I set it to two, randomness comes back and sometimes I’ll get the right result:

{
  path as shortest(from: 0x5526c6, to: 0x652d3f, depth: 2) {
    <director.film>
    <actor.film>
    <starring>
    <performance.actor>
  }
  
  names(func: uid(path),  first: 2) {
    uid
    name@.
  }
}
[...]
  "data": {
    "names": [
      {
        "uid": "0x5526c6",
        "name@.": "Michael Bay"
      },
      {
        "uid": "0x7111ab",
        "name@.": "Bad Boys II"
      }
    ],
    "_path_": [
      {
        "director.film": [
          {
            "starring": [
              {
                "performance.actor": [
                  {
                    "uid": "0x652d3f"
                  }
                ],
                "uid": "0x733703"
              }
            ],
            "uid": "0x7111ab"
          }
        ],
        "uid": "0x5526c6"
      }
    ]
  }
}

And sometimes I’ll get Michael Bay and Megan Fox, skipping both the movie and the performance!

[...]
  "data": {
    "names": [
      {
        "uid": "0x5526c6",
        "name@.": "Michael Bay"
      },
      {
        "uid": "0x652d3f",
        "name@.": "Megan Fox"
      }
    ],
    "_path_": [
      {
        "director.film": [
          {
            "starring": [
              {
                "performance.actor": [
                  {
                    "uid": "0x652d3f"
                  }
                ],
                "uid": "0x466767"
              }
            ],
            "uid": "0x1053f0"
          }
        ],
        "uid": "0x5526c6"
      }
    ]
  }
}

Lastly, just for fun let’s use first and offset together!

The query I wanted to perform to get the second element was first: 1, offset: 1.

{
  path as shortest(from: 0x5526c6, to: 0x652d3f, depth: 2) {
    <director.film>
    <actor.film>
    <starring>
    <performance.actor>
  }
  
  names(func: uid(path), offset: 1, first: 1) {
    uid
    name@.
  }
}

This will again sometimes work:

[...]
  "data": {
    "names": [
      {
        "uid": "0x7111ab",
        "name@.": "Bad Boys II"
      }
    ],
    "_path_": [
      {
        "director.film": [
          {
            "starring": [
              {
                "performance.actor": [
                  {
                    "uid": "0x652d3f"
                  }
                ],
                "uid": "0x733703"
              }
            ],
            "uid": "0x7111ab"
          }
        ],
        "uid": "0x5526c6"
      }
    ]
  }
}

But sometimes it won’t:

[...]
  "data": {
    "names": [
      {
        "uid": "0x652d3f",
        "name@.": "Megan Fox"
      }
    ],
    "_path_": [
      {
        "director.film": [
          {
            "starring": [
              {
                "performance.actor": [
                  {
                    "uid": "0x652d3f"
                  }
                ],
                "uid": "0x4d6d44"
              }
            ],
            "uid": "0x40b2f3"
          }
        ],
        "uid": "0x5526c6"
      }
    ]
  }
}

pawanrawal commented :

I had a deeper look into this and the problem here is that we don’t preserve the order of UIDs in the uidMatrix when populating UIDs from a variable that was the result of the shortest path query. We sort the list of UIDs before doing pagination. In general all uid variables are supposed to contain sorted lists whereas in the case of the shortest path, they are not guaranteed to and the order matters. We’ll have to add a way to find out the cases in which we should preserve the order.

This is the step where the order of elements in the uidMatrix is changed

campoy commented :

No need to work on this right now, but it’s good to keep track of what we need to improve regarding k-shortest paths.

manishrjain commented :

Seems tricky. We’d need to make variables switch from Go maps to some sorted map implementation, which would be slower.

campoy commented :

I do understand this is not easy to fix, but something needs to be done.

Maybe we should simply disable the usage of first and other pagination on variables coming from shortest path?
Or maybe the variables coming from shortest path should simply be different?