Performant node versioning

vinaypillai · December 21, 2020, 11:58pm

Hi! I was interested in implementing some form of versioning for node details, and was wondering if it would be more performant to store deltas or snapshots of each node upon a change. In the context of user profiles, for instance, let’s say we have an initial profile:

{
    "profileID":"1000",
    "details":[
        {
            "name":"Jake",
            "favorite-color":"blue",
            "DOB":"11/1/2000",
            "timestamp":"100000"
        }
    ]
}

If the user changes their favorite color from blue to red, we could store the changes with a new snapshot of the profile:

{
    "profileID":"1000",
    "details":[
        {
            "name":"Jake",
            "favorite-color":"blue",
            "DOB":"11/1/2000",
            "timestamp":"100000"
        },
        {
            "name":"Jake",
            "favorite-color":"red",
            "DOB":"11/1/2000",
            "timestamp":"100023"
        }
    ]
}

However, this has a high storage cost as the database grows, since every snapshot repeats the fields that did not change. It would be preferable to store only the changes that were made:

{
    "profileID":"1000",
    "details":[
        {
            "name":"Jake",
            "favorite-color":"blue",
            "DOB":"11/1/2000",
            "timestamp":"100000"
        },
        {
            "favorite-color":"red",
            "timestamp":"100023"
        }
    ]
}

Is there a performant way to query the state of a user’s profile at a given point of time that would coalesce these deltas into a single node? Alternatively, is there a preferred way of tracking a node’s historical data?
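
To make the question concrete, here is a minimal application-level sketch of the coalescing step, written in Python rather than as a database query. It assumes the `details` list has already been fetched and that timestamps are numeric strings as in the examples above; the function name `state_at` is hypothetical.

```python
from typing import Any


def state_at(details: list[dict[str, Any]], as_of: int) -> dict[str, Any]:
    """Merge the initial record and every delta stamped at or before `as_of`."""
    state: dict[str, Any] = {}
    # Sort by numeric timestamp so that deltas are applied oldest first.
    for entry in sorted(details, key=lambda e: int(e["timestamp"])):
        if int(entry["timestamp"]) > as_of:
            break
        state.update(entry)  # later values overwrite earlier ones
    return state


profile = {
    "profileID": "1000",
    "details": [
        {"name": "Jake", "favorite-color": "blue", "DOB": "11/1/2000", "timestamp": "100000"},
        {"favorite-color": "red", "timestamp": "100023"},
    ],
}

print(state_at(profile["details"], 100010)["favorite-color"])  # blue
print(state_at(profile["details"], 100023)["favorite-color"])  # red
```

The cost of this approach is linear in the number of deltas per profile, which is the overhead I would like to avoid paying in application code for a large number of profiles.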

anand · December 22, 2020, 9:36am

Hi @vinaypillai,
IMO, the problem of managing changes to master data like profiles has a strong process orientation. In this demo, I provide a GraphQL-based approach to tracking the lineage of master data. By handling both process and data aspects in one solution, Dgraph can certainly give you an economy of scale.

Please review.

vinaypillai · December 22, 2020, 4:11pm

Hi @anand,
I took a look at the demo you linked and it seems like you have a good method for storing the deltas to customer states. For my particular use case, it is necessary to be able to query the state of a large number of profiles at a given point in time. As a result, I was wondering if you would happen to have a good way to query for prior states without having to manually revert deltas at the application level.

anand · December 22, 2020, 4:16pm

Hi @vinaypillai,

I did not get this. What does “manually revert to deltas” mean?

I suggest you express all the data in some kind of graph structure and let DQL traverse through it. DQL is quite versatile.

vinaypillai · December 22, 2020, 4:27pm

Hi @anand,
I guess I’m a bit uncertain as to what the DQL query for a customer’s details will return. I was assuming that the query would return the current customer state, along with the history list of things that were changed, so if one were to try to reach a prior state, they would have to traverse the history list to the target state and undo all those changes. Is there a more performant way to do this with DQL?

Thanks!
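
For illustration, this is roughly what I mean by undoing changes at the application level. It is only a sketch under my own assumptions: the history entries are assumed to record the previous value of each changed field under a hypothetical `previous` key, with `None` meaning the field did not exist before the change.

```python
from typing import Any


def revert_to(current: dict[str, Any], history: list[dict[str, Any]], as_of: int) -> dict[str, Any]:
    """Undo every change newer than `as_of`, newest first."""
    state = dict(current)  # work on a copy so the current state is untouched
    for change in sorted(history, key=lambda c: c["timestamp"], reverse=True):
        if change["timestamp"] <= as_of:
            break  # everything from here on was already in effect at `as_of`
        for field, old_value in change["previous"].items():
            if old_value is None:
                state.pop(field, None)  # the field was introduced by this change
            else:
                state[field] = old_value
    return state


current = {"name": "Jake", "favorite-color": "green", "DOB": "11/1/2000"}
history = [
    {"timestamp": 100023, "previous": {"favorite-color": "blue"}},
    {"timestamp": 100050, "previous": {"favorite-color": "red"}},
]

print(revert_to(current, history, 100030)["favorite-color"])  # red
print(revert_to(current, history, 100000)["favorite-color"])  # blue
```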

anand · December 22, 2020, 4:41pm

It sounds like a log replay mechanism à la Kafka replay. IMO, capturing the whole state might be expensive storage-wise, but it is a computationally cheaper and more maintainable solution in the long run. I'm not sure DQL alone can be of much help in this scenario.
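
As a rough sketch of why full snapshots are cheaper to read: if the snapshots for a profile are kept sorted by timestamp, the state at a given time is a single binary search with no merging. The Python below is illustrative only; `snapshot_at` is a hypothetical helper, and the snapshot shape is taken from the example earlier in the thread.

```python
from bisect import bisect_right
from typing import Any, Optional


def snapshot_at(snapshots: list[dict[str, Any]], as_of: int) -> Optional[dict[str, Any]]:
    """Return the latest snapshot stamped at or before `as_of`, or None.

    Assumes `snapshots` is sorted by ascending timestamp.
    """
    # The timestamp list is rebuilt on every call for brevity; a real
    # implementation would keep it alongside the snapshots.
    timestamps = [int(s["timestamp"]) for s in snapshots]
    index = bisect_right(timestamps, as_of)
    return snapshots[index - 1] if index else None


snapshots = [
    {"name": "Jake", "favorite-color": "blue", "DOB": "11/1/2000", "timestamp": "100000"},
    {"name": "Jake", "favorite-color": "red", "DOB": "11/1/2000", "timestamp": "100023"},
]

print(snapshot_at(snapshots, 100010)["favorite-color"])  # blue
print(snapshot_at(snapshots, 100023)["favorite-color"])  # red
```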

vinaypillai · December 22, 2020, 4:51pm

Thanks! Just wanted to make sure there wasn’t a better option before I started heading down that route.

amaster507 · December 23, 2020, 3:50am

@anand, when we were building our own GraphQL API pre-Dgraph, we were working on implementing versions with arrays, where the last element in the array was the most recent. Unfortunately, I can’t do this with Dgraph because its arrays behave as sets rather than ordered lists. Having orderable arrays that allow duplicate entries would provide a better flow for handling versioning use cases.
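
One possible workaround, sketched here in Python and not something Dgraph provides out of the box, is to give each version an explicit sequence number so that order and duplicates survive set semantics. The `Version` type and its `seq` field are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Version:
    seq: int    # hypothetical explicit ordering field
    color: str


# A set has no order, and two identical entries would collapse into one.
# The distinct `seq` values keep the repeated "blue" entries apart.
stored = {Version(1, "blue"), Version(2, "red"), Version(3, "blue")}


def history(versions: Iterable[Version]) -> list[Version]:
    """Recover the version order from the explicit sequence number."""
    return sorted(versions, key=lambda v: v.seq)


def latest(versions: Iterable[Version]) -> Version:
    """Return the most recent version, i.e. the one with the highest `seq`."""
    return max(versions, key=lambda v: v.seq)


print([v.color for v in history(stored)])  # ['blue', 'red', 'blue']
print(latest(stored).color)                # blue
```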
