Performant node versioning

Hi! I was interested in implementing some form of versioning for node details, and was wondering if it would be more performant to store deltas or snapshots of each node upon a change. In the context of user profiles, for instance, let’s say we have an initial profile:

{
    "profileID":"1000",
    "details":[
        {
            "name":"Jake",
            "favorite-color":"blue",
            "DOB":"11/1/2000",
            "timestamp":"100000"
        }
    ]
}

If the user changes their favorite color from blue to red, we could store the changes with a new snapshot of the profile:

{
    "profileID":"1000",
    "details":[
        {
            "name":"Jake",
            "favorite-color":"blue",
            "DOB":"11/1/2000",
            "timestamp":"100000"
        },
        {
            "name":"Jake",
            "favorite-color":"red",
            "DOB":"11/1/2000",
            "timestamp":"100023"
        }
    ]
}

However, that results in a high storage cost as the database expands. It would be preferable to just store the changes that were made:

{
    "profileID":"1000",
    "details":[
        {
            "name":"Jake",
            "favorite-color":"blue",
            "DOB":"11/1/2000",
            "timestamp":"100000"
        },
        {
            "favorite-color":"red",
            "timestamp":"100023"
        }
    ]
}

Is there a performant way to query the state of a user’s profile at a given point in time that would coalesce these deltas into a single node? Alternatively, is there a preferred way of tracking a node’s historical data?
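To make the coalescing step concrete, here is a minimal Python sketch of what it would look like if done at the application level. It is not something Dgraph provides: `profile_at` is a hypothetical helper name, and it assumes every entry in `details` carries a `timestamp` string that parses as an integer, as in the examples above.

```python
from typing import Any


def profile_at(details: list[dict[str, Any]], as_of: int) -> dict[str, Any]:
    """Fold every delta stamped at or before `as_of` into a single state."""
    state: dict[str, Any] = {}
    # Apply deltas oldest-first so later changes overwrite earlier ones.
    for delta in sorted(details, key=lambda d: int(d["timestamp"])):
        if int(delta["timestamp"]) > as_of:
            break  # everything after this point is in the "future"
        state.update(delta)
    return state


# The delta-style history from the example above.
details = [
    {"name": "Jake", "favorite-color": "blue", "DOB": "11/1/2000", "timestamp": "100000"},
    {"favorite-color": "red", "timestamp": "100023"},
]

before_change = profile_at(details, 100010)
after_change = profile_at(details, 100023)
```

The cost of each lookup grows with the number of deltas recorded for the profile, which is the part I would like to avoid when querying many profiles at once.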

Hi @vinaypillai,
IMO the problem of managing changes to master data like profiles has a strong process orientation. In this demo, I provide a GraphQL-based approach to tracking lineage of master data. By handling both process and data aspects in one solution, Dgraph can certainly give you an economy of scale.

Please review.

Hi @anand,
I took a look at the demo you linked and it seems like you have a good method for storing the deltas to customer states. For my particular use case, it is necessary to be able to query the state of a large number of profiles at a given point in time. As a result, I was wondering if you would happen to have a good way to query for prior states without having to manually revert deltas at the application level.

Hi @vinaypillai,

I did not get this. What does “manually revert deltas” mean?

I suggest you express all the data in some kind of graph structure and let DQL traverse through it. DQL is quite versatile.

Hi @anand,
I guess I’m a bit uncertain as to what the DQL query for a customer’s details will return. I was assuming that the query would return the current customer state, along with the history list of things that were changed, so if one were to try to reach a prior state, they would have to traverse the history list to the target state and undo all those changes. Is there a more performant way to do this with DQL?

Thanks!

It sounds like a log replay mechanism à la Kafka replay. IMO, capturing the whole state might be expensive storage-wise, but it is a computationally cheaper and more maintainable solution in the long run. I am not sure DQL alone can be of much help in this scenario.
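As a rough illustration of why full snapshots are cheaper to read, here is a small Python sketch under stated assumptions: `snapshot_at` is a hypothetical helper, the snapshots are assumed to be already sorted by an integer-like `timestamp`, and nothing here is a Dgraph or DQL API. Finding a prior state becomes a single search for the latest snapshot at or before the target time, with no replay.

```python
from bisect import bisect_right
from typing import Any, Optional


def snapshot_at(snapshots: list[dict[str, Any]], as_of: int) -> Optional[dict[str, Any]]:
    """Return the latest full snapshot stamped at or before `as_of`, if any."""
    # Building this list is linear in this sketch; a store with an index on
    # the timestamp would make the lookup itself logarithmic.
    stamps = [int(s["timestamp"]) for s in snapshots]
    i = bisect_right(stamps, as_of)
    return snapshots[i - 1] if i else None


# The snapshot-style history from the original question, oldest first.
snapshots = [
    {"name": "Jake", "favorite-color": "blue", "DOB": "11/1/2000", "timestamp": "100000"},
    {"name": "Jake", "favorite-color": "red", "DOB": "11/1/2000", "timestamp": "100023"},
]

earlier = snapshot_at(snapshots, 100010)
later = snapshot_at(snapshots, 200000)
```

The trade-off is the one already noted: every change stores the whole profile again, but a read never has to merge anything.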

Thanks! Just wanted to make sure there wasn’t a better option before I started heading down that route.

@anand, when we were building our own GraphQL API pre-Dgraph, we were working on implementing versions with arrays, where the last entry in the array was the most recent. Unfortunately I can’t do this with Dgraph because its arrays behave as sets rather than ordered arrays. Having orderable, duplicate entries in arrays would provide a better flow for handling versioning use cases.
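A possible workaround, sketched in Python purely as an assumption on my side and not as a Dgraph feature: give each version an explicit sequence number so that order and duplicates survive even when the container itself is unordered. The `seq` field and the helper names `latest` and `history` are hypothetical.

```python
from typing import Any


def latest(versions: list[dict[str, Any]]) -> dict[str, Any]:
    """Return the entry with the highest explicit sequence number."""
    return max(versions, key=lambda v: v["seq"])


def history(versions: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Rebuild the ordered history from entries that arrive in any order."""
    return sorted(versions, key=lambda v: v["seq"])


# Entries deliberately out of order, as an unordered set might return them.
versions = [
    {"seq": 2, "favorite-color": "red"},
    {"seq": 1, "favorite-color": "blue"},
    {"seq": 3, "favorite-color": "red"},  # same value as seq 2, still a distinct entry
]

newest = latest(versions)
ordered = history(versions)
```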