I have a particularly difficult issue that I wonder if DGraph can solve easily. I understand that it supports lookup on a path between two nodes (How to find relationship between two nodes?). But I’m interested in knowing if there is a path between one particular node and a GROUP of nodes. This group will be selected based on a filter, which could be part of the query or done before the query.
Is this something the database supports, or am I forced to run multiple node-to-node queries?
I understand that this is a very hard problem to solve, and requires much power if it’s purely algorithmic in nature. I’m hoping that DGraph has some modifications done to speed up queries like this. Even if it doesn’t do you have any suggestions on ways to store or link the data to support quicker query speed for tasks like this?
Hi @johan-l Is it possible for you to link group of nodes to another dummy node which you create in the data? Then you would be able to run shortest path queries out-of-the box from node-1 to dummy-node. The path would exist only via the group of nodes which you are interested in.
I’ve had that as a possible solution, although it would only solve part of the issue unless I pollute the graph with tons of these nodes. And at that point I’m not sure a graph database is the correct way to go.
Imagine my graph is a collection of package shipments, with each node being either a processing or position of the package. At any point the package can be split up and repackaged (processed) and branch out to multiple paths. Then I basically want to ask “has this package or parts of it been delivered to any of the stations in Paris, if so which ones?”
Doing the group thing makes sense, since I would be able to have a graph per city to quickly find out an estimate. But I would still have to run a query to find out which places in particular. If I’m thinking correctly.
I tested below snippet where P2 expands to more than one node, but it throws variable(P2) should only expand to 1 uid therefore the path to a group of nodes is not supported currently.
path as shortest(from: uid(P1), to: uid(P2)) {
friend
}
You would not have to run more queries to find which places, because the places will be in your path. You would need to parse the path extracted to get the relevant stations to which your package has been delivered.
For example if your dummy node is 0x17 then it would have two paths to it via 0x13 and 0x15`.
Sounds great, but if I understood it correctly it would still just map out a single (shortest) path between two nodes since a group of nodes in P2 isn’t supported yet.
I really need to have more than one “end node” since the packages in question can be split up along the way (imagine a shipment of 1000 screws that gets packaged into batches of 10 and resent to multiple customers). Your example is great, but I would have to either run it multiple times for each end node, or somehow abstract the path into something that’s easier to query.
Unless DGraph supports another query style against multiple nodes? For my example I wouldn’t need any PATH in return, a simple “yes/no” would be enough for each node in P2.
Looking forward to your input. Thanks again for the swift response time!
Suppose your group of nodes have nodes A-H. If you create a dummy node and link it to each node in P2 and do a query
path as shortest(from: uid(src_node), to: uid(dummy_node), numpaths: 8 ) {
link
}
It will return 4 paths (src - A - dummy_node; src - C - dummy_node; src - E - dummy_node; src - F - dummy_node).
And you can find out which nodes (from the group P2) it actually went to via _path_ key in the response.
An important point is to give numpath=size(P2) so that it continues to find all the paths, and you can retrieve all the nodes visited from the response.
If your P2 nodes are inter-connected then you can give an even higher number to numpaths to account for paths like src - A - B - dummy node. This is a hacky way to make it visit all nodes even if it has found the shortest path already. Your performance would suffer but you would be making only one query.
Interesting hack! It would be interesting to compare the performance of a solution like this to a hacky solution in a traditional relational database.
The number would only grow with time and the numpath could quickly reach into the 100k. Do you have any idea how performance heavy that query is? I haven’t used DGraph before so I have a hard time estimating it. I could probably improve the performance a bit by filtering out some nodes (eg on date)
I also assume there is no pagination possible here? If there is, one solution would be to make use of that to save some performance
I’ve explored this further by setting Dgraph up locally and filling it up with some data, so I got some more information and knowledge now.
I’ve encountered a problem. Below you’ll see an image of an example query, let me explain the issue.
To get this Graph I’ve queried for the shortest path between the nodes inside of the Green Circles.
The issue is that in the real world I’d want to query for the paths between the nodes inside of the Red Circles. But when I do that (even with numpath: 2) all the nodes striked with a purple line is removed from the query, since it’s a circular path. But without those nodes I’m unable to determine if the removed node is in the path or not.
For reference here is the query that I’ve been using:
{
path as shortest(from: <SOURCE_UID>, to: <TARGET_UID>, numpaths: 2) {
~origins
~assembledFrom
~processedFrom
company
}
path(func: uid(path)) {
name
number
assembledFrom {
name: number
}
processedFrom {
name: number
}
origins {
name: number
}
company {
name
}
}
}
One solution would be to fetch the entire tree for a specific node as a base. From what I’ve gathered this should be possible with the recurse query. But I’m not getting it to work. I’m either just getting a single node as a response (QUERY #1) or an error recurse queries require that all predicates are specified in one level (QUERY #2). The documentation isn’t of much help here unfortunately. It looks like you’ve added support for it from this thread, maybe I’m not using it correctly?
QUERY #1:
{
tree(func: eq(number, "3429d692")) @recurse(depth: 12992203, loop: true) {
name: number
company
processedFrom
assembledFrom
origins
}
}
QUERY 2:
{
tree(func: eq(number, "3429d692")) @recurse(depth: 12992203, loop: true) {
name: number
company {
name
}
processedFrom {
number
}
assembledFrom {
number
}
origins {
number
}
}
}
In the picture above I’ve placed the nodes and edges according to the previous picture I posted. This query seems to satisfy what I’m looking for. Although it would return a lot of extra data since it’s returning the entire tree, when I’m actually just interested in the entire tree leading to a specific end node. I’m still very happy to get some help in this issue.
I have a problem with the query still, it seems like 4 nodes are missing for 2 different nodes. They are marked as red circles in the image. These seem to be missing because of the way I’m storing the directions of the edges and querying for it. I’m going to look over my dataset and make sure it’s all correct before going further.
EDIT: Got the Recursive query working. It would still be great if it’s possible to get DGraph filter out all paths that’s not of interest, worst case I’ll have to fetch the entire subtree for a node and filter out that myself
Hi @johan-l,
Thanks for your questions. I will try to answer them below.
The query which I suggested will be computation intensive. Internally it will try to either find the number off paths you asked for or continue until it visits all the nodes in the process. See runKshortestPaths.
There are two shortest path functions implemented internally: shortestPath and runKshortestPaths. The former returns one path and the latter returns (or tries to return) total numPaths in ascending order of the distance . I do know understand how would pagination work here.
It is not clear to me from the diagram and the explanation as to what is failing. Is it possible for you to give a sample schema/data for me to work through it?
Looking at the diagram if you do query between two red circled nodes with numPaths=2 it should return two paths of length 7 and 9. Circular walks are removed because they are not paths in strictest terms of the word. (See definitions on Wikipedia)
Happy to work with you on this if you could provide more information on schema and data.
Also what would you have liked to see more of in the documentation? Tagging our technical writers @damian and @acarey who can help us get our documentation upgraded.
There are two shortest path functions implemented internally: shortestPath and runKshortestPaths. The former returns one path and the latter returns (or tries to return) total numPaths in ascending order of the distance .
Thanks, thats excellent!
Happy to work with you on this if you could provide more information on schema and data.
Sorry my posts became quite messy since I couldn’t edit the original one (time ran out). I sorted this issue out myself, it was an error in my data as mentioned in the post above.
Also what would you have liked to see more of in the documentation? Tagging our technical writers @damian and @acarey who can help us get our documentation upgraded.
I’m pretty sure the issue was that I were querying the data in the wrong order, and wherefore the returned results were empty. I originally assumed that is must be somethings wrong with the Query and since the documentation only had one example and not too much explanation on my issue I felt stuck.
I ended up re-seeding the database and make sure all edged were in the right direction, which got it working.