Hello, I’m trying to use facets to query the structure of the graph at a certain point in time.
I have a schema like so:
type Node {
  name
  parent
}

parent: [uid] @reverse .
The parent of a node can change over time, so each time it changes I add a new edge to that list with a facet named date of type dateTime. I’d like to query the structure at a certain point in time by keeping only the edges whose date facet is earlier than the given date and taking the latest remaining parent as the result.
I see that I can use @facets(lt(date, <some date>)) on the parent edge, which gets me halfway there. But how do I then take only the latest edge in a single query?
I’m somewhat new to GQL and having a hard time expressing this. Is it possible to do in a single query? If so, could anyone explain how or point me to some documentation?
Now that I’ve tried this on a large dataset, it doesn’t work as I expected. The first: 1 on the facet doesn’t return only the most recent reverse edge and ignore the others.
What I’m trying to achieve: parents can change over time, and the query should return a child only under its most recent parent, while the older edges are kept so I can go back and forward in time.
I’ve gotten to this point with the query, but can’t work out how to return only the most recent edge. The first: 1 doesn’t seem to do this; I can still see the previous edges pointing to the node.
So say I have 3 nodes, A, B and C, where A is the parent of both B and C. I want to be able to make B the parent of C while still keeping the edge from C to A, with a dateTime facet on each edge that I can filter on, so that I only see the most recent parent-child relationship for a given date.
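To pin down the rule being asked for, here is a minimal in-memory sketch in Python rather than DQL. It is not a Dgraph query; the edge list, the dates, and the function name `parent_at` are made up for illustration. It only shows the intended semantics: keep edges dated before the cutoff, then take the latest one.

```python
from datetime import datetime, timezone

# Hypothetical edge history for node C: (parent, value of the `date` facet).
# C started under A and was later moved under B.
EDGES_C = [
    ("A", datetime(2020, 1, 1, tzinfo=timezone.utc)),
    ("B", datetime(2020, 6, 1, tzinfo=timezone.utc)),
]


def parent_at(edges, as_of):
    """Return the parent on the latest edge dated strictly before `as_of`.

    Mirrors the idea of an lt(date, ...) facet filter followed by
    "take only the newest edge". Returns None if no edge existed yet.
    """
    candidates = [(date, parent) for parent, date in edges if date < as_of]
    if not candidates:
        return None
    # max() compares the date first, so this picks the most recent edge.
    return max(candidates)[1]


march = datetime(2020, 3, 1, tzinfo=timezone.utc)
september = datetime(2020, 9, 1, tzinfo=timezone.utc)
print(parent_at(EDGES_C, march))      # parent while only the first edge exists
print(parent_at(EDGES_C, september))  # parent after the move
```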
Sorry nobody from the core team has responded yet. We promise to do better.
I’d like to correct one tiny quibble in your top post:
“I’m somewhat new to GQL…”
The query language that you have put in the example is actually DQL, not GQL (GraphQL). DQL is a far more powerful version of GraphQL.
Now, I’d like to replicate your problem. If you could share a sample dataset so I can work out the concrete example of what you are doing, it’d be great.
Also, why do you have two @facets() directives in your latest query?
Hi @chewxy, no problem. Thanks for the correction, I’ll call it DQL from now on.
I’ve been renaming things in my examples because this is a work problem I’m trying to solve. Is there a way I could send you sample data without posting it publicly?
I get an error when I try to merge all parts into a single @facets, for example:
Aggregation must be done in a second block in this case.
{
  var(func: uid(0x6)) @recurse {
    uid
    name # Ordering facets doesn't make sense now
    ~parent @facets(lt(date, "2020-09-14T11:15:22.938Z")) @facets(DT as date)
  }
  me() { # you can use var to hide this
    TheGuy as max(val(DT))
  }
  node(func: uid(TheGuy)) {
    uid
    name
  }
}
I’m not sure this query would work, as I’m only imagining the structure.
I’ve got a sample dataset from @davidsbond. The query above doesn’t work on it. In fact, it runs into a weird edge case where uid(TheGuy) = 0x0, which then leads to errors like:
while reading posting list with key [[0 0 4 110 97 109 101 0 0 0 0 0 0 0 0 0]]: Invalid UID with value 0 for key: [0 0 4 110 97 109 101 0 0 0 0 0 0 0 0 0]
What I tried was to have multiple blocks in which the facets are filtered, but that clashes with the @recurse. The notion of scope is weak in GraphQL (and by extension, DQL). We should probably formalize that.
The blue node is the input to the query. Notice that there are TWO s edges from the blue node to the green node. Each of these edges has a facet called date.
What is wanted is to select the node with the edge that has the largest date, and then recurse from there.
In the query I want to start from the parent and recurse downwards through all children, only choosing the edges with the largest date.
So in this sample, I’d like to see blue to green, but if a green to green edge has a larger date than the blue to green edge, the query should use that instead. The result is a tree structure where each child node has only a single parent.
This way, I can choose a date and get the structure at that point in time, since all parent-child relationships are stored as edges with a date.
So in the image, 645979 is the parent of N63659, N61532 and 963231. But at some point the relationships were changed so that N63659’s parent became N61532, so there are two edges there. I only want to see whichever one was the latest at a given date.
I’m also storing these as reverse edges, so the relationship goes child to parent when I insert the edges.
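The top-down behaviour described above can be sketched outside Dgraph as follows. This is plain Python over an in-memory edge list, not DQL; the node names, the integer dates, and the function `tree_at` are illustrative assumptions. It picks each child's latest edge before the cutoff and then walks down from the root, assuming the chosen edges form a tree with no cycles.

```python
from collections import defaultdict

# Hypothetical history: (child, parent, date), with dates as plain integers.
# green2 starts under blue and later moves under green1.
EDGES = [
    ("green1", "blue", 100),
    ("green2", "blue", 100),
    ("green2", "green1", 200),
]


def tree_at(edges, root, as_of):
    """Build the nested child structure below `root` as it stood before `as_of`."""
    # Step 1: for every child keep only its latest edge dated before `as_of`.
    latest = {}  # child -> (date, parent)
    for child, parent, date in edges:
        if date < as_of and (child not in latest or date > latest[child][0]):
            latest[child] = (date, parent)

    # Step 2: invert to parent -> children so we can recurse downwards.
    children = defaultdict(list)
    for child, (_, parent) in latest.items():
        children[parent].append(child)

    # Step 3: recurse from the root (assumes the selected edges are acyclic).
    def build(node):
        return {c: build(c) for c in sorted(children[node])}

    return {root: build(root)}


print(tree_at(EDGES, "blue", 150))  # both greens directly under blue
print(tree_at(EDGES, "blue", 250))  # green2 now nested under green1
```

Because every child contributes exactly one edge, each node appears under a single parent in the result, which is the property the query is meant to have.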
I put this image together to try and explain what I’m after:
The query should choose one of the two edges highlighted in red, and one of the two highlighted in blue, based on which one is more recent.
Hey @chewxy, that works fine for going upwards, but I’d like to be able to query from 371717 downwards. Using a reverse edge and first: 1 in this way stops the recursion.
Where a child has multiple edges pointing to a parent node, I’d like to only use the latest one based on the date facet. And then be able to apply a lt filter to query at a desired point in time, filtering out any edges to a parent node after that date.
At different points in time both 371717 and N69579 have been parents of N69542. I want to only show whichever of those is the latest one, whilst still maintaining the rest of the structure below 371717.
Thought I’d write an update as I think I’ve come up with a solution for my problem after a bit of research.
Each edge to a parent now has 2 facets: date and expired. Both are UNIX timestamps. date is when the edge was created, and expired is when the child moved away from that parent. On the newest edge, expired is set to the maximum int64 value.
When a child moves, the previous edge’s expired facet is set to the same value as the date facet of the new edge. I can then perform a query like this:
This appears to give me a top-down view of the graph structure at a specific point in time. I actually got the idea from a blog post about doing the same thing in ArangoDB.
This appears to work exactly as I want. Thanks for all the assistance, @chewxy and @MichelDiz.
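For anyone following along, the date/expired bookkeeping can be modelled in a few lines of Python. This is only an in-memory sketch of the idea, not the DQL query or any Dgraph API; the class `ParentHistory` and its methods are invented for illustration, and treating an edge as valid when date <= t < expired is my assumption about the boundary handling.

```python
INT64_MAX = 2**63 - 1  # sentinel for "not expired yet"


class ParentHistory:
    """Keeps every child -> parent edge with `date` and `expired` timestamps."""

    def __init__(self):
        self.edges = []  # each edge: {"child", "parent", "date", "expired"}

    def move(self, child, parent, when):
        """Attach `child` to `parent` at time `when`, expiring its current edge."""
        for edge in self.edges:
            if edge["child"] == child and edge["expired"] == INT64_MAX:
                # Previous edge expires exactly when the new one starts.
                edge["expired"] = when
        self.edges.append(
            {"child": child, "parent": parent, "date": when, "expired": INT64_MAX}
        )

    def children_at(self, parent, when):
        """Children of `parent` whose edge is valid at `when` (date <= when < expired)."""
        return sorted(
            e["child"]
            for e in self.edges
            if e["parent"] == parent and e["date"] <= when < e["expired"]
        )


history = ParentHistory()
history.move("B", "A", 100)
history.move("C", "A", 100)
history.move("C", "B", 200)  # C moves from A to B at t=200

print(history.children_at("A", 150))  # before the move
print(history.children_at("A", 250))  # after the move
print(history.children_at("B", 250))
```

The nice property of this layout is that a point-in-time lookup is a pure filter on each edge, with no need to compare edges against each other to find the newest one.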
We have a similar use case, but it differs in that we need to track multiple time periods in which something is valid. For instance, A -> B is valid from t1-t4 as well as t8-t10.
Our solution was to create a “temporal” node that holds the time ranges, A -> TEMPORAL -> B, and to list the time ranges in the TEMPORAL node. If facets were more powerful we would have preferred them, since this solution necessitates 50% more nodes and 100% more edges in our graph.
However, storing a list of startTimes and a list of endTimes on the TEMPORAL node isn’t enough to do an INTERSECT query, since you need to know which starts and ends pair up to form a “range”. We ended up writing a custom tokenizer for Dgraph that emits time ranges as startTime:endTime strings.
It would be great if Dgraph supported a “range” type. I could represent my times as ints and use that.
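To illustrate why the start and end need to stay paired, here is a small Python sketch of the startTime:endTime encoding. It is not the Dgraph custom tokenizer itself; the helper names and the choice of closed integer ranges are assumptions made for the example.

```python
def encode_ranges(ranges):
    """Turn (start, end) pairs into "start:end" strings, keeping each pair together."""
    return [f"{start}:{end}" for start, end in ranges]


def decode_range(token):
    """Parse a "start:end" string back into an (int, int) pair."""
    start, end = token.split(":")
    return int(start), int(end)


def intersects(tokens, query_start, query_end):
    """True if any stored range overlaps [query_start, query_end] (closed ranges)."""
    return any(
        start <= query_end and query_start <= end
        for start, end in map(decode_range, tokens)
    )


# A -> B is valid during t1-t4 and again during t8-t10.
tokens = encode_ranges([(1, 4), (8, 10)])
print(tokens)
print(intersects(tokens, 5, 7))   # falls in the gap between the two ranges
print(intersects(tokens, 9, 12))  # overlaps the second range
```

With two separate lists, [1, 8] and [4, 10], nothing says that 1 pairs with 4 rather than 10, so a query for t5-t7 could not be rejected reliably. Keeping each pair in one value removes that ambiguity.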