Querying graph structure at a certain point in time

Hello, I’m trying to use facets to query the structure of the graph at a certain point in time.

I have a schema like so:

type Node {
	name
	parent
}

parent: [uid] @reverse .

The parent of a node can change over time, so each time it changes I’m adding a new edge to that list, with a facet named date of type dateTime. I’d like to be able to query the structure at a certain point in time by keeping only the edges whose date facet is earlier than the given date, and then taking the latest remaining parent as the result.
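The selection described here (drop edges dated on or after the cutoff, then keep the most recent one) can be modelled outside Dgraph to pin down the intended result. The following is a minimal Python sketch, assuming each child keeps a list of its parent edges with a `date` facet; `ParentEdge`, `ts` and `parent_at` are hypothetical names, and this only illustrates the semantics, not how Dgraph evaluates a query.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class ParentEdge(NamedTuple):
    """One child -> parent edge together with its `date` facet."""
    parent: str
    date: datetime


def ts(iso: str) -> datetime:
    """Parse an RFC 3339 string such as "2020-09-14T23:59:59Z"."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalise it.
    return datetime.fromisoformat(iso.replace("Z", "+00:00"))


def parent_at(edges: List[ParentEdge], cutoff: datetime) -> Optional[str]:
    """Parent according to the latest edge dated strictly before `cutoff`.

    Step 1 mirrors lt(date, cutoff); step 2 keeps only the max date.
    Returns None if the child had no parent yet at `cutoff`.
    """
    earlier = [e for e in edges if e.date < cutoff]
    if not earlier:
        return None
    return max(earlier, key=lambda e: e.date).parent


# Hypothetical history for a child C: under A first, moved under B later.
c_edges = [
    ParentEdge("A", ts("2020-09-01T00:00:00Z")),
    ParentEdge("B", ts("2020-09-10T00:00:00Z")),
]
print(parent_at(c_edges, ts("2020-09-05T00:00:00Z")))  # A
print(parent_at(c_edges, ts("2020-09-14T23:59:59Z")))  # B
```

Because the old edge is never deleted, moving the cutoff backwards still recovers the earlier parent.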

I see that I can use @facets(lt(date, <some date>)) on the parent edge, which gets me halfway there. But how do I then take only the latest edge in the same query?

I’m somewhat new to GQL and having a bit of a hard time expressing this in a single query. Is it possible to do this in a single query? If so could anyone explain how or provide some documentation I can look through?

I’ve just tried this query:

{
  node(func: uid(0x6)) @recurse {
    uid
    name
    ~parent @facets(orderdesc: date) @facets(max(date) AND lt(date,"2020-09-14T11:15:22.938Z"))
  }
}

But I get this error:

One argument expected in max, but got 0

I’ve just found that I can do @facets(orderdesc: date, first: 1), which I think solves the second half of my problem.


Now that I’ve tried this on a large dataset, it doesn’t seem to work as I expected. The first: 1 on the facet doesn’t return only the most recent reverse edge while ignoring the others.

What I’m trying to achieve: parents can change over time, and for a given date I want each child returned only under its most recent parent, while keeping the older edges so I can go back and forward in time.

I’ve gotten to this point with the query, but can’t seem to get the part about only returning the most recent edge. The first: 1 doesn’t seem to do this. Instead I can still see the previous edges pointing to it.

query Structure($id: string, $date: string)
{
  structure(func: eq(name, $id)) @recurse {
    name
    children: ~parent @facets(orderdesc: date, first: 1) @facets(lt(date,"2020-09-14T23:59:59Z"))
  }
}

So say I have 3 nodes, A, B and C, where A is the parent of both B and C. I want to be able to make B the parent of C while still keeping C’s edge to A, with a dateTime facet on each edge that I can filter on, so that I only see the most recent parent-child relationship as of a given date.

Hey there @davidsbond

Sorry nobody from the core team has responded yet. We promise to do better.

I’d like to correct one tiny quibble in your top post:

I’m somewhat new to GQL…

The query language in your example is actually DQL, not GQL (GraphQL). DQL is a far more powerful version of GraphQL.

Now, I’d like to replicate your problem. If you could share a sample dataset so I can work through a concrete example of what you are doing, that would be great.

Also, why do you have 2 @facets() in your latest query?

Hi @chewxy, no problem. Thanks for the correction, I’ll call it DQL from now on 🙂

I’ve been renaming things in my examples because this is a work problem I’m trying to solve. Is there a way I could send you sample data without posting it publicly?

I get an error when I try to merge all parts into a single @facets, for example:

@facets(orderdesc: date, first: 1, lt(date,"2020-09-14T23:59:59Z")) 

line 5 column 62: Expected ',' or ')' in facet list: (

Yeah, I just discovered this myself.

Sure, PM me. I don’t need a big dataset, just one small enough to reproduce the issue.

Guys, the issue with facets here is that you have to put the filter function and the facet parameters in separate @facets directives. Perhaps this should be documented explicitly.

e.g.:

@facets(lt(date,"2020-09-14T23:59:59Z")) @facets(orderdesc: date)

BTW, as far as I know we don’t support the first param in facets. Pagination only exists for first-class citizens.

Aggregation must be done in a second block in this case.

{
  var(func: uid(0x6)) @recurse {
    uid
    name #Ordering facets doesn't make sense now
    ~parent @facets(lt(date,"2020-09-14T11:15:22.938Z")) @facets(DT as date)
  }

  me() { #you can use var to hide this
    TheGuy as max(val(DT))
  }

  node(func: uid(TheGuy)) {
    uid
    name
  }
}

Not sure this query works, as I’m only imagining the structure of your data.

I’ve got a sample dataset from @davidsbond, and that query doesn’t work. In fact, it runs into a weird edge case where uid(TheGuy) = 0x0, which then leads to errors like:

while reading posting list with key [[0 0 4 110 97 109 101 0 0 0 0 0 0 0 0 0]]: Invalid UID with value 0 for key: [0 0 4 110 97 109 101 0 0 0 0 0 0 0 0 0]

What I tried was to have multiple blocks in which the facets are filtered, but that clashes with the @recurse. The notion of scope is weak in GraphQL (and by extension, DQL). We should probably formalize that.

Here’s the sample data:

[Image: sample data graph]

The blue node is the input query. Notice that there are TWO s edges from the blue node to the green node. Each of these edges has a facet called date.

What is wanted is to select the node reached through the edge with the largest date, and then recurse from there.

In the query I want to start from the parent and recurse downwards through all children, only choosing the edges with the largest date.

So in this sample, I’d like to see blue to green, but if a green-to-green edge has a larger date than the blue-to-green edge, it should be used instead, resulting in a tree structure where each child node has only a single parent.

This way, I can choose a date and get the structure at that point in time, since all parent-child relationships are stored as edges with a date.

So in the image, 645979 is the parent of N63659, N61532 and 963231. But at some point the relationships were changed so that N63659’s parent became N61532, so there are two edges there. I only want to see whichever was the latest as of a given date.

I’m also storing these as reverse edges, so when I insert them the edge goes from child to parent.
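The top-down view being asked for can be described in two steps: resolve every child to its single latest parent as of the chosen date, then walk downwards from the root over those resolved edges. A minimal Python sketch of that model, using the node names from the example above with made-up integer dates (100 and 200); `latest_parent` and `tree_at` are hypothetical helpers, not Dgraph features.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# child -> [(parent, date), ...]; dates are plain ints here (e.g. UNIX seconds).
ChildEdges = Dict[str, List[Tuple[str, int]]]


def latest_parent(edges: List[Tuple[str, int]], cutoff: int) -> Optional[str]:
    """Parent of the most recent edge dated strictly before `cutoff`."""
    earlier = [(p, d) for p, d in edges if d < cutoff]
    return max(earlier, key=lambda pd: pd[1])[0] if earlier else None


def tree_at(root: str, child_edges: ChildEdges, cutoff: int) -> dict:
    """Nested {node: {child: {...}}} view of the hierarchy as of `cutoff`."""
    # Resolve every child to exactly one parent, then invert to parent -> children.
    children: Dict[str, List[str]] = {}
    for child, edges in child_edges.items():
        parent = latest_parent(edges, cutoff)
        if parent is not None:
            children.setdefault(parent, []).append(child)

    def build(node: str, path: FrozenSet[str]) -> dict:
        # `path` guards against cycles in malformed data.
        return {
            c: build(c, path | {c})
            for c in sorted(children.get(node, []))
            if c not in path
        }

    return {root: build(root, frozenset({root}))}


child_edges: ChildEdges = {
    "N63659": [("645979", 100), ("N61532", 200)],
    "N61532": [("645979", 100)],
    "963231": [("645979", 100)],
}
print(tree_at("645979", child_edges, 150))
# {'645979': {'963231': {}, 'N61532': {}, 'N63659': {}}}
print(tree_at("645979", child_edges, 250))
# {'645979': {'963231': {}, 'N61532': {'N63659': {}}}}
```

The key design point is that the "latest edge" decision is made per child, before recursing, so a child can never appear under two parents at once.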

I put this image together to try to explain what I’m after:

[Image: diagram highlighting the competing edges in red and blue]

The query should choose one of the two red-highlighted edges and one of the two blue-highlighted edges, based on which is more recent.

I should add that we can change the schema if I’ve defined it in a way that makes this hard.

With a query like this (h/t @dmai):

query Structure($id: string)
{
  q(func: eq(name, $id)) @recurse {
    uid
    name
    s @facets(orderdesc: date) (first: 1)
  }
}

I got a result like this:
[Image: query result]

Hey @chewxy, that works fine for going upwards, but I’d like to be able to query from 371717 downwards. Using a reverse edge and first: 1 in this way stops the recursion.

Where a child has multiple edges pointing to a parent node, I’d like to use only the latest one based on the date facet, and then be able to apply an lt filter to query at a desired point in time, filtering out any edges to a parent node created after that date.

query Structure($id: string)
{
  q(func: eq(name, $id)) @recurse {
    uid
    name
    ~s @facets(orderdesc: date) (first: 1)
  }
}

begets this:
[Image: query result]

I think the dataset I provided might not have been sufficient to demonstrate the problem; that’s my bad. I can provide a proper export if that helps.

If you create a new child that has an edge to 371717, you’ll only get the latest child.

For example, without (first: 1) in the query I have something that looks like this:

query Structure($id: string)
{
  q(func: eq(name, $id)) @recurse(depth: 5) {
      uid
      name
      s: ~sponsor @facets(orderdesc:date)
  }
}

[Screenshot: query result without (first: 1)]

At different points in time both 371717 and N69579 have been parents of N69542. I want to only show whichever of those is the latest one, whilst still maintaining the rest of the structure below 371717.

If I then update the query to include (first: 1):

query Structure($id: string)
{
  q(func: eq(name, $id)) @recurse(depth: 5) {
      uid
      name
      sponsoring: ~sponsor @facets(orderdesc:date) (first: 1)
  }
}

[Screenshot: query result with (first: 1)]

I’m now only getting the very latest child.
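The behaviour reported above follows from where the limit is applied: ordering the reverse edge by date and keeping the first one truncates the parent's whole child list, whereas the wanted result keeps every child whose own latest edge points at that parent. A minimal Python sketch contrasting the two; 371717, N69542 and N69579 come from the thread, while the child "X1" and all dates are hypothetical, and `first_one` only approximates the observed effect rather than reproducing Dgraph internals.

```python
from typing import Dict, List, Tuple

# child -> [(parent, date), ...], stored child-to-parent like the ~sponsor reverse edge.
ChildEdges = Dict[str, List[Tuple[str, int]]]


def incoming(parent: str, child_edges: ChildEdges) -> List[Tuple[str, int]]:
    """Every (child, date) edge that points at `parent`, i.e. the reverse edge list."""
    return [(c, d) for c, es in child_edges.items() for p, d in es if p == parent]


def first_one(parent: str, child_edges: ChildEdges) -> List[str]:
    """Order the parent's incoming edges by date desc and keep one: this limits the
    parent's whole child list, not each child's history."""
    ordered = sorted(incoming(parent, child_edges), key=lambda e: e[1], reverse=True)
    return [c for c, _ in ordered[:1]]


def current_children(parent: str, child_edges: ChildEdges) -> List[str]:
    """Children whose own most recent edge points at `parent` (the wanted result)."""
    return sorted(
        c for c, es in child_edges.items()
        if es and max(es, key=lambda e: e[1])[0] == parent
    )


edges: ChildEdges = {
    "N69542": [("371717", 10), ("N69579", 30)],
    "N69579": [("371717", 5)],
    "X1": [("371717", 20)],
}
print(first_one("371717", edges))         # ['X1']
print(current_children("371717", edges))  # ['N69579', 'X1']
print(current_children("N69579", edges))  # ['N69542']
```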

Thought I’d write an update as I think I’ve come up with a solution for my problem after a bit of research.

Each edge to a parent now has 2 facets, date and expired. I’m using UNIX timestamps: date is when the edge was created, and expired is when the move away from that parent occurred. On the newest edge, expired is set to the maximum int64 value.

When a child moves, the previous edge’s expired facet is set to the new edge’s date facet. I can then perform a query like this:

query Structure($id: string)
{
  q(func: eq(name, $id)) @recurse {
      uid
      name
      sponsoring: ~sponsor @facets(orderdesc: date, date, expired) @facets(lt(date, 10000000000) AND gt(expired, 10000000000))
  }
}

This appears to give me a top-down view of the graph structure at a specific point in time. I actually got the idea from a blog post about doing the same thing in ArangoDB.

This appears to work exactly as I want. Thanks for all the assistance @chewxy and @MichelDiz
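The date/expired scheme turns every parent edge into a validity interval, so a point-in-time query only has to test each edge independently. A minimal Python sketch of the bookkeeping on a move and of the `lt(date, t) AND gt(expired, t)` test; `Edge`, `move` and `parent_at` are hypothetical names, the node ids come from the thread, and the timestamps are made up.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_INT64 = 2**63 - 1  # `expired` value on the currently active edge


@dataclass
class Edge:
    parent: str
    date: int     # UNIX timestamp when this parent relationship started
    expired: int  # UNIX timestamp when the child moved away, or MAX_INT64


def move(history: List[Edge], new_parent: str, at: int) -> None:
    """Re-parent a child: close the active edge at `at`, then add the new edge."""
    for e in history:
        if e.expired == MAX_INT64:
            e.expired = at
    history.append(Edge(new_parent, at, MAX_INT64))


def parent_at(history: List[Edge], t: int) -> Optional[str]:
    """Same test as lt(date, t) AND gt(expired, t); at most one edge can pass."""
    live = [e.parent for e in history if e.date < t < e.expired]
    return live[0] if live else None


history = [Edge("371717", 100, MAX_INT64)]
move(history, "N69579", 200)
print(parent_at(history, 150))  # 371717
print(parent_at(history, 250))  # N69579
print(parent_at(history, 200))  # None: strict lt/gt leaves a gap at the exact move time
```

One consequence worth checking in the real data: with both comparisons strict, a child has no parent at the exact instant it moved. Making the date comparison inclusive (date <= t) turns each edge into a half-open interval and should close that gap.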


We have a similar use case, but it differs in that we need to track multiple time periods in which something is valid, for instance A -> B from t1-t4 as well as t8-t10.

Our solution was creating a “temporal” node that holds the time ranges, A -> TEMPORAL -> B, and we list the time ranges in the TEMPORAL node. If facets were more powerful, we would have preferred them, since this solution necessitates 50% more nodes and 100% more edges in our graph.

However, this isn’t sufficient, since a list of startTimes and a list of endTimes isn’t enough information to do an INTERSECT query: you need to know which starts and ends pair up to form a “range”. We ended up making a custom tokenizer for Dgraph that encodes time ranges as startTime:endTime strings.
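The pairing problem can be seen with the t1-t4 and t8-t10 example: filtering a start list and an end list independently reports an overlap with [t5, t7] even though neither range covers it, while keeping each range as one "start:end" token gives the right answer. A minimal Python sketch of that idea; `naive_intersects`, `encode` and `paired_intersects` are hypothetical helpers and say nothing about Dgraph's actual tokenizer plugin interface.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (start, end)


def naive_intersects(starts: List[int], ends: List[int], q_start: int, q_end: int) -> bool:
    """Filter the two lists independently: some start <= q_end AND some end >= q_start.
    Because the pairing is lost, this can report overlaps that do not exist."""
    return any(s <= q_end for s in starts) and any(e >= q_start for e in ends)


def encode(ranges: List[Range]) -> List[str]:
    """Keep each pair together as a "start:end" string token."""
    return [f"{s}:{e}" for s, e in ranges]


def paired_intersects(tokens: List[str], q_start: int, q_end: int) -> bool:
    """True if any stored range overlaps the query range [q_start, q_end]."""
    for tok in tokens:
        s, e = (int(x) for x in tok.split(":"))
        if s <= q_end and e >= q_start:
            return True
    return False


# A -> B is valid from t1 to t4 and from t8 to t10.
ranges = [(1, 4), (8, 10)]
starts, ends = [s for s, _ in ranges], [e for _, e in ranges]
print(naive_intersects(starts, ends, 5, 7))     # True (false positive)
print(paired_intersects(encode(ranges), 5, 7))  # False
print(paired_intersects(encode(ranges), 3, 5))  # True
```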

It would be great if Dgraph supported a “range” type. I could then represent my times as ints and use that.