Storing and querying historical data

bernd · August 9, 2018, 11:37am

Hi,

I am currently evaluating Dgraph for an internal use case at my company.
The data we want to store are graphs. However, an update to a node/edge/value in the graph should not alter the node/edge/value itself but create a new version of it. Logically, this means that every change to an element in the graph leads to a new version of the graph.
When we run queries on the graph, we want to run them on a specific, potentially historical state.

Do you think this is a good use case for Dgraph? If so, what would be the best way to encode the versions in the schema and queries?

Thanks in advance,
bernd

MichelDiz · August 9, 2018, 1:57pm

Hi Bernd,

I believe you could use it like this:

That way you can use “Recurse Query”.

See this discussion: “Flattening to get unique nodes within n steps”.

Cheers

bernd · August 9, 2018, 5:36pm

Hi Michel,

Thank you for your proposal, but I am not sure I understand it.
Just to check: you propose that each node points to its new version (or, respectively, to its predecessor), right?

Let’s take the following small example.
At T0 the model looks as follows:

A -someRole->B

At T1 I modify B to B'.
If I now follow the link someRole from A, I would expect to end up at B', even though A itself was not modified.
If I run the same query with the historical version/timestamp/ID from T0, I would like to get to the original B again.

Thanks
Bernd

MichelDiz · August 9, 2018, 6:48pm

Exactly. The “UPDATE” edge points to a new version of that node, and the same happens again for every later version.

It’s like a “Ring a Ring o’ Roses” that does not close. That way you maintain a historical relationship between these nodes, with the “Root” node being the original. A better analogy would be a chain (as in a blockchain).

With this approach you always look up the “Root Node” and use a “Recurse Query” to expand the relationships along the chain, one link after another. You could, for example, relate the “Root Node” directly to a user and expand from there.

Writing (mutating) works the way Bitcoin does it: take the previous “block” and link it to the block being created.
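
For illustration only, here is a minimal plain-Python sketch of the chain idea. It is an in-memory model, not Dgraph client code; the names `Version`, `append_version`, `version_at` and the integer `created_at` timestamps are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """One version of a node; `update` mimics the "UPDATE" edge."""
    value: str
    created_at: int                     # hypothetical logical timestamp
    update: Optional["Version"] = None  # link to the next version, if any


def append_version(root: Version, value: str, created_at: int) -> Version:
    """Walk to the end of the chain and link a new version there."""
    tail = root
    while tail.update is not None:
        tail = tail.update
    tail.update = Version(value, created_at)
    return tail.update


def version_at(root: Version, timestamp: int) -> Optional[Version]:
    """Return the newest version created at or before `timestamp`."""
    found = None
    node: Optional[Version] = root
    while node is not None and node.created_at <= timestamp:
        found = node
        node = node.update
    return found


# A -someRole-> B: A keeps pointing at the root of B's chain,
# so A itself never has to change when B gets a new version.
b_root = Version("B", created_at=0)
append_version(b_root, "B'", created_at=1)

print(version_at(b_root, 0).value)  # prints: B
print(version_at(b_root, 1).value)  # prints: B'
```

In this sketch, reading a historical state means walking the chain from the root until the requested timestamp is passed, which is roughly what expanding the chain recursively would give you.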

bernd · August 9, 2018, 7:18pm

Thx! Will give it a try!

MichelDiz · August 9, 2018, 7:46pm

As you said you’re new to Dgraph, try https://tour.dgraph.io/ before anything else.

Cheers

sriharshaboda · August 13, 2018, 10:26am

@bernd What you could do is have a timestamp-based version as a facet on the edge. Every time you update, add a new edge without deleting the older edge; the new edge carries the latest timestamp as a facet. Whenever you want to fetch the latest state, you can order by this version facet (descending) and take the first result. Or, if you’d like to operate on a particular historical version of the graph, you can filter on the facet with this version to get only the edges belonging to that version.
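
As a rough sketch of this idea in plain Python (an in-memory model, not Dgraph code): each edge carries a `version` value that plays the role of the timestamp facet. The names `Edge` and `follow` are made up for this example, and treating a historical read as "newest edge at or before the requested version" is an assumption about how you would want to filter.

```python
from typing import List, NamedTuple, Optional


class Edge(NamedTuple):
    source: str
    predicate: str
    target: str
    version: int  # plays the role of the timestamp facet


# Edges are only ever appended, never deleted or overwritten.
edges: List[Edge] = [
    Edge("A", "someRole", "B", version=0),
    Edge("A", "someRole", "B'", version=1),
]


def follow(source: str, predicate: str, as_of: Optional[int] = None) -> Optional[str]:
    """Follow `predicate` from `source`, optionally as of a past version."""
    candidates = [
        e for e in edges
        if e.source == source
        and e.predicate == predicate
        and (as_of is None or e.version <= as_of)
    ]
    if not candidates:
        return None
    # Equivalent to "order by version descending, take the first".
    return max(candidates, key=lambda e: e.version).target


print(follow("A", "someRole"))           # prints: B'
print(follow("A", "someRole", as_of=0))  # prints: B
```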

MichelDiz · August 13, 2018, 2:34pm

Yes, this can be done. But I think it would become a problem over time. Every time he runs the schema {} query, he will get thousands of new edges/predicates, disrupting the normal use of Dgraph.

I would recommend creating new nodes instead (thus maintaining a cohesive search structure), whether they are connected via an edge as in my example or are simply standalone nodes attached directly to the owner.

Facets are meant to “tell a story” about an edge’s relationship, so I would not recommend them for this case. But you can use them.

lazharichir · August 14, 2018, 12:31pm

If you have your own unique IDs, add the ID as a predicate along with the update timestamp, so each object is inserted as a new node containing its insertion date and its unique ID (created by you).

Then, when reading data, you can query by ID, sort by insertion date descending, and limit the result to 1. That way, you get the latest version.
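
A minimal plain-Python sketch of that lookup, again as an in-memory model rather than Dgraph code. The field names `id`, `inserted_at`, `value` and the helper `latest` are assumptions for this example; the optional `as_of` cutoff is an addition to show how a historical read could work with the same data.

```python
from typing import Dict, List, Optional

# Every write inserts a new record; nothing is updated in place.
records: List[Dict] = [
    {"id": "b", "inserted_at": 0, "value": "B"},
    {"id": "a", "inserted_at": 0, "value": "A"},
    {"id": "b", "inserted_at": 1, "value": "B'"},
]


def latest(object_id: str, as_of: Optional[int] = None) -> Optional[Dict]:
    """Query by ID, sort by insertion date descending, limit 1."""
    matches = sorted(
        (
            r for r in records
            if r["id"] == object_id
            and (as_of is None or r["inserted_at"] <= as_of)
        ),
        key=lambda r: r["inserted_at"],
        reverse=True,
    )
    return matches[0] if matches else None


print(latest("b")["value"])           # prints: B'
print(latest("b", as_of=0)["value"])  # prints: B
```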

bernd · August 15, 2018, 6:27am

I have my own ID. However, if I use your suggestion, wouldn’t all references still point to the old versions?

lazharichir · August 15, 2018, 6:42am

They would indeed, so you would need to run a transaction in which you update these as well.

There is no easy solution, since versioned databases are designed for this from the ground up. You’ll have to fiddle one way or another, hehe. Let us know how you decide to implement it!

system · September 14, 2018, 6:42am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
