Issues with k shortest path handling

Moved from GitHub dgraph/3504

Posted by campoy:

There are three different issues that could be improved on calls to shortest.

Variables should be accepted (#1243)

A quick example of why this should be accepted.

{
  sp as var (func: eq(name@en, "Steven Spielberg")) @filter(has(director.film))
  jg as var (func: eq(name@en, "Jeff Goldblum")) @filter(has(director.film))
    
  p as shortest(from: uid(sp), to: uid(jq)) {
     ...predicates...
  }
    
  path(func: uid(p)) {
    uid
    name@.
  }     	     
}

Currently we need to specify which predicates to consider

This is most probably due for performance issues, but it makes it really hard to find random connections from A to B.

For instance, consider the query above and try to guess which predicates should go where ...predicates....

After a while, and with @danielmai’s help, we ended up figuring out it was:

    director.film
    starring
    performance.actor

Multiple shortest queries overwrite the resulting path

Calls to shortest produce an output under _path_. Unfortunately, this means that if you have multiple calls to shortest the results will overwrite each other and only one will eventually be sent back in the response.

See this example:

{
  p as shortest(from: 0x22f1e0, to: 0x82ec65) {
    director.film
    starring
    performance.actor
  }

  q as shortest(from: 0x22f1e0, to: 0x5092b) {
    director.film
    starring
    performance.actor
  }
        
  path1(func: uid(p)) {
    uid
    name@.
  }    
 	     
  path2(func: uid(q)) {
    uid
    name@.
  }    
}

The output will contain a single _path_ corresponding to the last call to shortest.

{
  "extensions": {
    "server_latency": {
      "parsing_ns": 45165,
      "processing_ns": 114340665,
      "encoding_ns": 935863
    },
    "txn": {
      "start_ts": 22728120
    }
  },
  "data": {
    "path1": [
      {
        "uid": "0x22f1e0",
        "name@.": "Steven Spielberg"
      },
      {
        "uid": "0x22ca64",
        "name@.": "The Lost World: Jurassic Park"
      },
      {
        "uid": "0xf50c"
      },
      {
        "uid": "0x82ec65",
        "name@.": "Jeff Goldblum"
      }
    ],
    "path2": [
      {
        "uid": "0x22f1e0",
        "name@.": "Steven Spielberg"
      },
      {
        "uid": "0x51cd20",
        "name@.": "Bridge of Spies"
      },
      {
        "uid": "0x398f09"
      },
      {
        "uid": "0x5092b",
        "name@.": "Tom Hanks"
      }
    ],
    "_path_": [
      {
        "director.film": [
          {
            "starring": [
              {
                "performance.actor": [
                  {
                    "uid": "0x5092b"
                  }
                ],
                "uid": "0x398f09"
              }
            ],
            "uid": "0x51cd20"
          }
        ],
        "uid": "0x22f1e0"
      }
    ]
  }
}

campoy commented :

One more issue we found is how when using a variable to store the result of calling shortest, only the first path will be stored in that variable!

So for instance, if I want to find 2 the movies directed by Steven Spielberg where Jeff Goldblum appears as an actor I could write the following:

{
  p as shortest(from: 0x22f1e0, to: 0x82ec65, numpaths:2) {
    director.film
    starring
    performance.actor
  }

  path(func: uid(p)) {
    uid
    name@.
  }    
}

While the _path_ data element will show all the paths we found, the path data will only show the first one.

{
  "extensions": {
    "server_latency": {
      "parsing_ns": 18057,
      "processing_ns": 63379742,
      "encoding_ns": 733893
    },
    "txn": {
      "start_ts": 22790001
    }
  },
  "data": {
    "path": [
      {
        "uid": "0x22f1e0",
        "name@.": "Steven Spielberg"
      },
      {
        "uid": "0x414ee0",
        "name@.": "Jurassic Park"
      },
      {
        "uid": "0x14f22a"
      },
      {
        "uid": "0x82ec65",
        "name@.": "Jeff Goldblum"
      }
    ],
    "_path_": [
      {
        "director.film": [
          {
            "starring": [
              {
                "performance.actor": [
                  {
                    "uid": "0x82ec65"
                  }
                ],
                "uid": "0x14f22a"
              }
            ],
            "uid": "0x414ee0"
          }
        ],
        "uid": "0x22f1e0"
      },
      {
        "director.film": [
          {
            "starring": [
              {
                "performance.actor": [
                  {
                    "uid": "0x82ec65"
                  }
                ],
                "uid": "0xf50c"
              }
            ],
            "uid": "0x22ca64"
          }
        ],
        "uid": "0x22f1e0"
      }
    ]
  }
}

The second movie with UID 0x22ca64 is “The Lost World: Jurassic Park”