Storing and querying historical data

Hi,

I am currently evaluating Dgraph for a company-internal use case.
The data we want to store are graphs. However, an update to a node, edge, or value in the graph should not alter that element but create a new version of it. Logically, this means that every change to an element in the graph leads to a new version of the graph.
When we run queries on the graph, we want to run them on a specific, potentially historical state.

Do you think this is a good use case for Dgraph and, if so, what would be the best way to encode the versions in the schema and queries?

Thanks in advance,
bernd

Hi Bernd,

I believe you could use it like this:

That way you can use “Recurse Query”.

See this Discuss thread: “Flattening to get unique nodes within n steps”.

Cheers

Hi Michel,

Thank you for your proposal. Just to make sure I understand:
you propose that each node points to its new version (or, conversely, to its predecessor), right?

Take the following small example.
At T0 the model looks as follows:

A -someRole-> B

At T1 I modify B to B'.
If I now follow the someRole link from A, I would expect to end up at B', even though A itself was not modified.
If I run the same query with the historical version/timestamp/ID from T0, I would like to get the original B again.

Thanks
Bernd

Exactly. The “UPDATE” edge points to a new version of that node, and the chain keeps rolling.

It’s like a “Ring a Ring o’ Roses” that does not close. That way you maintain a historical relationship between these nodes, with the “Root” node being the original. A better analogy would be a chain, as in a blockchain.

In this model you would always look up the “Root Node” and use a “Recurse Query” to expand the relationships of the chain step by step. You could relate the “Root Node” directly to a user, for example, and expand from there.

When writing (mutating), it would work the way Bitcoin does: take the previous “block” and link it to the block being generated.
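To make the chain idea a bit more concrete, here is a minimal in-memory sketch in plain Python. It is not Dgraph code and uses no Dgraph API: the `VersionNode` class, the `update` field (standing in for the “UPDATE” edge), and the helpers `append_version` and `history` are all hypothetical names chosen for illustration. In Dgraph, walking the chain would be done by a recurse query starting at the root node rather than by a Python loop.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VersionNode:
    """One version of an entity (hypothetical model, not a Dgraph type)."""
    value: str
    # Stands in for the "UPDATE" edge: points to the next, newer version.
    update: Optional["VersionNode"] = None


def append_version(root: VersionNode, value: str) -> VersionNode:
    """Find the current tail of the chain and link a new version to it."""
    tail = root
    while tail.update is not None:
        tail = tail.update
    tail.update = VersionNode(value)
    return tail.update


def history(root: VersionNode) -> List[str]:
    """Walk the chain from the root node, oldest version first."""
    values = []
    node: Optional[VersionNode] = root
    while node is not None:
        values.append(node.value)
        node = node.update
    return values


# The root node is the original; every change appends to the chain.
root = VersionNode("B")
append_version(root, "B'")
append_version(root, "B''")
print(history(root))
```

The last element of `history(root)` is the current version, and any earlier element is a historical state of the same entity. Note that this sketch walks the whole chain on every write, which is only reasonable for short chains.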

Thx! Will give it a try!

As you said you’re new to Dgraph, try https://tour.dgraph.io/ before anything else.

Cheers

@bernd What you could do is store a timestamp-based version as a facet on the edge. Every time you update, add a new edge without deleting the older one; the new edge carries the latest timestamp as a facet. Whenever you want the latest state, you can order by this version facet (descending) and take the first result. If you’d like to operate on a particular historical version of the graph, you can filter on the facet to get only the edges belonging to that version.
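As a rough illustration of the facet idea, the following plain-Python sketch models edges that carry a version timestamp and resolves the A/B example from earlier in the thread. It is an in-memory model, not Dgraph query syntax: the `Edge` class and the `latest` and `as_of` helpers are hypothetical. One assumption goes beyond what is described above: `as_of` picks the newest edge whose version is less than or equal to the requested one, rather than matching a version exactly, so that elements unchanged at a given timestamp still resolve.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Edge:
    """A directed edge with a version timestamp (the facet in this sketch)."""
    source: str
    predicate: str
    target: str
    version: int


def as_of(edges: Iterable[Edge], source: str, predicate: str,
          version: Optional[int]) -> Optional[Edge]:
    """Return the newest matching edge with edge.version <= version.

    Passing version=None means "no upper bound", i.e. the latest edge.
    This "less than or equal" rule is an assumption of the sketch.
    """
    candidates = [
        e for e in edges
        if e.source == source
        and e.predicate == predicate
        and (version is None or e.version <= version)
    ]
    return max(candidates, key=lambda e: e.version, default=None)


def latest(edges: Iterable[Edge], source: str,
           predicate: str) -> Optional[Edge]:
    """Order by version descending and take the first, as described above."""
    return as_of(edges, source, predicate, version=None)


# T0: A -someRole-> B, T1: A -someRole-> B' (the old edge is kept).
edges = [
    Edge("A", "someRole", "B", version=0),
    Edge("A", "someRole", "B'", version=1),
]
print(latest(edges, "A", "someRole").target)
print(as_of(edges, "A", "someRole", version=0).target)
```

With this data, the latest lookup follows someRole to B', while the lookup at version 0 still reaches the original B.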

Yes, this can be done, but I think it would degrade over time. Every time he queries schema {}, he will see thousands of new edges/predicates, disrupting the normal use of Dgraph.

I would recommend creating new nodes instead, which keeps the search structure cohesive. They can either be connected via an edge, as in my example, or be standalone nodes attached directly to the owner.

Facets are meant to “tell a story” about an edge’s relationship, so I would not recommend them for this case. But you can do it that way.

If you have your own unique IDs, store the ID as a predicate along with the update timestamp, so each object is inserted as a new node containing its insertion date and its unique ID (created by you).

So, when fetching data, you can query by ID, sort by insertion date descending, and limit to 1. That way, you get the latest version.
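The lookup described here (filter by ID, sort by insertion date descending, limit 1) can be sketched in plain Python as follows. This is only an in-memory model of the idea, not a Dgraph query: the `Record` class, its field names, and the `latest_version` helper are hypothetical, and the integer timestamps are made-up sample values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Record:
    """One inserted node: your own ID plus its insertion timestamp."""
    external_id: str   # the unique ID you manage yourself
    inserted_at: int   # insertion timestamp (sample integer values here)
    payload: str


def latest_version(records: Iterable[Record],
                   external_id: str) -> Optional[Record]:
    """Filter by ID, sort by insertion date descending, limit to 1."""
    matches = sorted(
        (r for r in records if r.external_id == external_id),
        key=lambda r: r.inserted_at,
        reverse=True,
    )
    return matches[0] if matches else None


# Every update is inserted as a new record; nothing is overwritten.
records = [
    Record("node-B", inserted_at=100, payload="B"),
    Record("node-A", inserted_at=100, payload="A"),
    Record("node-B", inserted_at=200, payload="B'"),
]
print(latest_version(records, "node-B").payload)
```

Older versions stay available by filtering on `inserted_at` instead of taking the first match.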

I have my own ID. However, if I use your suggestion, wouldn’t all references still point to the old versions?

They would indeed, so you would need to run a transaction in which you update these as well.

There is no easy solution, since versioned databases are designed that way from the ground up. You’ll have to fiddle one way or another, hehe. Let us know how you decide to implement it!
