Recursive Queries

I have several million pairs of part numbers. In each pair, the first part number is the “parent” part and the second is the “child” and is, of course, one to many. By loading these into a graph, I can take a top level parent and recursively process to produce a tree, representing the assembly structure of the product.

So in S-P-O, it would be something like “Part Has Part”. I haven’t seen any recursion query capability in the docs. In Gremlin I’d use “repeat()”. In an SQL database: a WITH RECURSIVE CTE.

Do DGraph support such queries?

Hey @mandolyte

We don’t support it yet. So, Just to understand the use case, You’d like to expand a predicate N times given the starting node right?

Something like

{
 me(id: <start>) {
  has @repeat(10) # Traverse "has" 10 levels deep.
 }
}
2 Likes

While repeat() can take a max depth, it should be able to traverse until all leaf nodes are found.

In Gremlin, here is one I had used:
g.V().has('name','Q1').repeat(out().simplePath()).emit(__.not(out().simplePath())).path().by('name') ==>[Q1, Q2, Q3, Q4]

In this case it produces the “path” (starting node to leaf node).

Also needed are cycle detection, in order to either avoid or to detect endless loops so they can be fixed.

While Oracle’s implementation is non-standard SQL, it is quite complete and easy to understand. For example, this page shows what can be done [here]

Thanks for asking for requirements.

2 Likes

@ashwin95r let’s add support for this.

Thanks for the suggestions @mandolyte.

How does this format look? @mrjn

# To traverse the friends and relatives edges 
# completely and get all their names.
{
  recurse(id: <start>) {
    friend
    relative
    name # This is a leaf node.
  }
} 

# To traverse the predicates 10 times 
# and get their names.
{
  recurse(id: <start>, depth: 10) {
    has
    name 
  }
}

We will avoid visiting the same node twice. 
And will return the entire subtree as the result.

An implementation: https://github.com/dgraph-io/dgraph/pull/647

Will this syntax be better ?

{  recurse(id: <start>) {   
       friend { name }    
       relative { name, age }  
   } 
} 

Only one more level is allowed in query that will help in knowing which attributes to fetch.
This will be easier to implement, as we know which are leaf attributes by looking at query,
rather then in previous query case.

As recursion traversal proceeds, wouldn’t be simpler to permit only a single edge type (predicate)? If multiples were needed I think a union of result sets would suffice, which I think you already have.

Since an edge type could easily have multiple node types on the to-side (object-side?), then you’d want a filter to restrict the traversal to just the node types desired.

I liked the depth constraint.

While you’d want to avoid visiting the same node twice to avoid infinite loops, you also need a way to detect such loops. So perhaps an option to flag such a node. Sometimes this is a data quality problem. But other times (like a airplane flight round-trip), it is “normal”. In the latter case, it is the desired result and the terminating condition for the traversal.

Once the traversal is done, the results need a “projection” so that the results have the attributes from nodes/edges in the returned result. For example, on the nodes, you might need the flight number or part number. On the edges, you’d want predicted flight time or quantity respectively.

Thanks, looking nice! I have a nice set of data I’d like try with this when its ready!

This can be returned as a list separately. Would that work? We can have a option saying, flagdup which can return these nodes if requested. Let me know what you think.[quote=“mandolyte, post:7, topic:1306”]
Since an edge type could easily have multiple node types on the to-side (object-side?), then you’d want a filter to restrict the traversal to just the node types desired.
[/quote]

You can use different types of filters to achieve this effect I think. Please have a look at: https://wiki.dgraph.io/Query_Language#Functions
https://wiki.dgraph.io/Query_Language#Facets_:_Edge_attributes

This projection is part of the recursion. In dgraph the values are also stored as nodes. So, for example you want the flight number, you’d specify the corresponding predicate within the recurse query.

{
 recurse(id: <flight-id>) {
  destination
  connecting-flight
  name  # value
  flight-number  # value
 }
}
1 Like

@mandolyte Support has been added in master. Any feedback on it with your data would be great! we’d love to help you load the dataset and run some queries on it if its alright with you!

The flagdup idea sounds good. Thanks.

I have a small dataset that I used with Gremlin/Titan. I’ll find it next week. Sigh… wish my day job gave me more time to explore this… I’m excited to see a graph database implemented in Go. I’ve always thought such an implementation could evolve faster than JVM based ones… So far, I think I’m right!:relaxed:

2 Likes

I found a tiny toy self-contained SQL that can be used with SQLite3. It illustrates the primary requirements for recursion: level (or depth) number, path generation, and cycle detection. Here is the output:

level|parent|node|path      |cyclic_flag
0    |     A|   B|/A/B      |0
1    |     B|   C|/A/B/C    |0
2    |     C|   D|/A/B/C/D  |0
3    |     D|   A|/A/B/C/D/A|1

You can just paste this SQL into SQLite3 to get above output:

with dataset as
(
select 'A' as parent, 'B' as node union all
select 'B' as parent, 'C' as node union all
select 'C' as parent, 'D' as node union all 
select 'D' as parent, 'A' as node
),
hierarchy( level, parent, node, path, cyclic_flag )
as
(
select 0 as level,
dataset.parent,
dataset.node,
'/' || dataset.parent || '/' || dataset.node as path,
0 as cyclic_flag
from dataset
where dataset.parent = 'A'

union all
select
hierarchy.level + 1 as level,
dataset.parent,
dataset.node,
hierarchy.path || '/' || dataset.node as path,
case
when
(length(path||dataset.node) - length(replace(path|| dataset.node,dataset.node,'')))/length(dataset.node) = 1
then 0
else 1
end as cyclic_flag
from hierarchy
inner join dataset
on dataset.parent = hierarchy.node
where (length(path) - length(replace(path,hierarchy.node,'')))/length(hierarchy.node) < 2
)
select *
from hierarchy

order by path
;

Hope to spend some time next on testing this with your tip changes. This toy example can be a functionality test. I’ll work up a larger dataset for a performance test.

Since I don’t know much about DGraph yet, I need to spend some serious time with your tutorials and documentation before I can get far on my own.

Thanks for your responsiveness!
Cecil

So from my rough understanding (of the complex query :stuck_out_tongue: ). The similar query on dgraph would be:

#  Load the data
mutation{
 set{
  <A> <child> <B> .
  <B> <child> <C> .
  <C> <child> <D> .
  <D> <child> <A> .
  <A> <name> "A" .
  <B> <name> "B" .
  <C> <name> "C" .
  <D> <name> "D" .
 }
}

# Now a query
{
 recurse(id:A) {
  child
  name
 }
}

This should return

{
  "recurse": [
    {
      "child": [
        {
          "child": [
            {
              "child": [
                {
                  "child": [
                    {
                      "name": "A"
                    }
                  ],
                  "name": "D"
                }
              ],
              "name": "C"
            }
          ],
          "name": "B"
        }
      ],
      "name": "A"
    }
  ]
}

This is the current output. We’re working on tagging the node which created the cycle with a flag (So node A would be marked). I hope this is close to the example you have specified with SQL.

That’s getting there. Thanks.

The following omissions at present all would require recursive parsing of the JSON returned value - which sort of defeats the purpose of having the graph engine add its value and smarts. Thoughts on these?

  • depth
  • cycle detection
  • paths: paths are needed ANSI SQL in order to keep the parent/child rows together; Oracle’s CONNECT BY method doesn’t need this. As a graph engine, DGraph may also work more like Oracle’s implementation, keeping the parent and children together. But even so, generation of the path is very useful in many reporting outputs. In Gremlin this is done by the “simplePath()” step.

Over the weekend, I did some reading and have begun to see that the differences between a normal directed property graph and an RDF-kind of graph are substantial. I’ll ask on another thread about this so that discussion is buried here.

Thanks for the sample!

Without the recursive reply, you’ll lose the parent-child connection. There is a @normalize directive though, which you can typically use to flatten out the graph response that Dgraph returns.

Finally have some datasets I am ready to use. I’m tempted to wait until DGraph is “go gettable” because our firewalls prevent me from doing a normal install at work.

I have a smaller dataset I can use at home - small enough to mail.

There is a bulk loader right? Can you point me to it? I plan to work thru the process at home; then take on the larger dataset at work when I solve the install issues. Hopefully, sometime next week will have some feedback for you.

Thanks,
Cecil

That’s great! Here is an example of how to use the bulk loader: https://wiki.dgraph.io/Get_Started#Load_dataset

It’s in cmd/dgraphloader if you’re building from source.

And feel free to mail the dataset to ashwin at dgraph.io or you can post it here if its alright to be public.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.