Subscribe only to updates

Is it possible to subscribe only to new data?

E.g. when subscribing via:

subscription {
   queryFoos {
      id
   }
}

I always get the full list of all existing Foos. Because this list can grow very large and I’m only interested in new Foos, I want to receive only those, not the whole list.

Is something like this already planned?

Something like this would probably do:

subscription {
   queryFoos @subscribeChangesOnly {
      id
   }
}

Is there a current workaround?

Hi @maaft, you can do a couple of things.
1. Query using first
A negative value -N for first returns the last N items in the result. So the subscription will deliver only the latest N values.

subscription {
   queryFoos (first:-N) {
      id
   }
}

2. Using auth
You can apply auth rules in the schema so that you receive only the changes made by one user.
For example, when querying Todo with the schema below, you will only receive changes made by the owner that you pass in the JWT.

type Todo @withSubscription @auth(
    query: { rule: """
        query ($USER: String!) {
            queryTodo(filter: { owner: { eq: $USER } }) {
                __typename
            }
        }"""
    }
) {
    id: ID!
    text: String! @search(by: [term])
    owner: String! @search(by: [hash])
}
# Dgraph.Authorization {"VerificationKey":"secret","Header":"Authorization","Namespace":"https://dgraph.io","Algo":"HS256"}

Is the data always sorted by time? Is Dgraph internally using createdAt and updatedAt timestamps for this? What about deleted entries? Otherwise this would not work.

I already have all my queries guarded with @auth. The huge number of Foos already occurs on a per-user basis.

Maybe I can include an optional parameter updatedAfter and filter on my own updatedAt timestamps.

Data is sorted by UID. Objects that were inserted first have a lower UID than those inserted later, and first simply returns objects in UID order.

So if you update an object, its UID does not change and it will not show up among the most recent results.
We don’t use any createdAt or updatedAt timestamps. Yes, the alternative is to have your own updatedAfter parameter and updatedAt timestamps, which you have to set whenever you update the object.

Once you have it you can order your query using updatedAt field and use first.
https://dgraph.io/docs/master/graphql/queries/order-page/
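To illustrate why UID order and updatedAt order give different "latest" results, here is a minimal Python sketch with made-up in-memory data. The dict layout, the integer timestamps and both helper names are assumptions for illustration only, not Dgraph's API.

```python
# Hypothetical data: uid reflects insertion order, updatedAt is a
# client-maintained logical timestamp (plain integers for brevity).
foos = [
    {"uid": 1, "updatedAt": 5},  # inserted first, but updated most recently
    {"uid": 2, "updatedAt": 2},
    {"uid": 3, "updatedAt": 3},
]


def last_n_by_uid(items, n):
    """Mimics taking the last N items in UID (insertion) order."""
    return sorted(items, key=lambda f: f["uid"])[-n:]


def latest_n_by_updated_at(items, n):
    """Mimics ordering by updatedAt descending and taking the first N."""
    return sorted(items, key=lambda f: f["updatedAt"], reverse=True)[:n]
```

With this data, the UID-based selection misses the object with uid 1 even though it was updated last, while the updatedAt-based selection includes it.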

@JatinDevDG Thank you. With custom updatedAt timestamps it’s working. Unfortunately, this is very error-prone because every client always needs to make sure to update that timestamp. If I forget to update the field somewhere in client code, I won’t receive the data I need.

Do you happen to know when server-side managed createdAt and updatedAt timestamps will be available or what the progress is on this? It would really help in a lot of cases and feels like a very basic feature to have. See also: Query sever timestamps in GraphQL?

Edit: I removed the “solution” tag as this workaround is not really what I’m thinking of.

When implementing this with updatedAt timestamps (set from the client or server-side, it doesn’t matter), you have to restart the subscription every time you get data and change the startTime variable accordingly (see the query below). Otherwise you’ll run into the same issue: over-fetching large amounts of data that you don’t need.

subscription FooUpdates($startTime: DateTime!) {
   queryFoo(filter: {updatedAt: {ge: $startTime} }) {
    ...
   }
}

I suspect that restarting the websocket-connection (potentially multiple times per second) is not very nice to the server.
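For reference, the client-side bookkeeping for this restart approach could look roughly like the Python sketch below. It only covers the pure logic of advancing startTime between restarts; the function name, the payload shape (dicts with id and updatedAt) and the deduplication step are my assumptions. Because the filter uses ge, items whose updatedAt equals the new startTime are delivered again after the restart, so the sketch remembers their ids.

```python
from datetime import datetime, timedelta, timezone


def advance_start_time(start_time, seen_ids, batch):
    """Process one subscription payload.

    Returns (fresh_items, next_start_time, next_seen_ids), where
    next_seen_ids holds the ids already delivered at next_start_time.
    """
    fresh = [
        item for item in batch
        if item["updatedAt"] > start_time
        or (item["updatedAt"] == start_time and item["id"] not in seen_ids)
    ]
    next_start = max([start_time] + [item["updatedAt"] for item in batch])
    boundary_ids = {item["id"] for item in batch if item["updatedAt"] == next_start}
    if next_start == start_time:
        # Start time did not move, so keep the ids we had already seen.
        boundary_ids |= seen_ids
    return fresh, next_start, boundary_ids


t0 = datetime(2021, 1, 1, tzinfo=timezone.utc)
t1, t2 = t0 + timedelta(seconds=1), t0 + timedelta(seconds=2)

# First payload: two updated items.
fresh1, start1, seen1 = advance_start_time(
    t0, set(), [{"id": "a", "updatedAt": t1}, {"id": "b", "updatedAt": t2}]
)
# After restarting with startTime = t2, "b" is delivered again next to "c".
fresh2, start2, seen2 = advance_start_time(
    start1, seen1, [{"id": "b", "updatedAt": t2}, {"id": "c", "updatedAt": t2}]
)
```

The subscription would then be restarted with next_start_time as $startTime after each payload, which is exactly the reconnect churn I’m worried about.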

So instead I propose the following directive:

type Foo {
  value: Int!
  bars: [Bar!]!
}

type Bar {
   name: String!
}

subscription FooUpdates {
   queryFoo @subscribeChangesOnly {
     value
     bars {
        name
     }
   }
}

When using this directive, only changes (starting from subscription start) are delivered to the client. Also, changes in sub-graphs will only return the changed part of the graph.

Examples

  • 100 Foos in the database, one of them gets its value changed: → The client receives only that item.
  • 100 Foos in the database, one Bar gets its name changed: → The client receives only the Foo that the changed Bar belongs to. Also, only the changed Bar object is returned, not all Bar children of that Foo.
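The two examples can be modelled as a diff between two snapshots. This is only a Python sketch of the intended semantics, not of how Dgraph would implement it: the snapshot layout (dicts keyed by id), the function name and the choice to always include the Foo’s own value next to the changed bars are assumptions, and deletions are not handled.

```python
import copy


def changes_only(before, after):
    """Return only the Foos that changed, each with only its changed bars.

    Snapshots map foo_id -> {"value": int, "bars": {bar_id: {"name": str}}}.
    """
    changed = {}
    for foo_id, foo in after.items():
        old = before.get(foo_id, {"value": None, "bars": {}})
        changed_bars = {
            bar_id: bar
            for bar_id, bar in foo["bars"].items()
            if old["bars"].get(bar_id) != bar
        }
        if foo["value"] != old["value"] or changed_bars:
            changed[foo_id] = {"value": foo["value"], "bars": changed_bars}
    return changed


# 100 Foos with 3 Bars each.
before = {
    i: {"value": 0, "bars": {j: {"name": f"bar-{i}-{j}"} for j in range(3)}}
    for i in range(100)
}

# Example 1: one Foo gets its value changed.
after_value = copy.deepcopy(before)
after_value[7]["value"] = 1

# Example 2: one Bar gets its name changed.
after_bar = copy.deepcopy(before)
after_bar[42]["bars"][1]["name"] = "renamed"
```

In the first case the result contains only Foo 7 with no bars; in the second it contains only Foo 42 with the single renamed Bar.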

@JatinDevDG Please reconsider adding this feature!

Hi @maaft, thanks for this detailed feature request.
I have a question here.

For the subscription below with the @subscribeChangesOnly directive, we would get the changes made after the subscription started. Then why are we passing ($startTime: DateTime!) here?
In any case, we would still need to restart the subscription to change that start time, right?

subscription FooUpdates($startTime: DateTime!) {
   queryFoo @subscribeChangesOnly {
     value
     bars {
        name
     }
   }
}

Anyway, we will discuss this and server-side timestamps internally and let you know.

Hi @JatinDevDG!

That was a copy-paste error on my part. startTime shouldn’t be required in that case. I’ll update my post.

And yes, we would need to restart the subscription to change the starting time. But that is not important here because we will only get updated data anyway, regardless of its internal timestamps.

Thank you very much for discussing this with the team!

So, I just want to confirm there is no difference between

subscription FooUpdates {
  queryFoo @subscribeChangesOnly {
     value
     bars {
        name
     }
   }
}

and

subscription FooUpdates($startTime: DateTime!) {
   queryFoo(filter: {updatedAt: {ge: $startTime} }) {
    ...
   }
}

This means that internally we could record a timestamp when the subscription starts and use it as startTime by default, so it does not need to be given explicitly.
Is there anything else that I may be missing here?

Yes, I think you are missing something. I’ll try to explain.

subscription FooUpdates($startTime: DateTime!) {
   queryFoo(filter: {updatedAt: {ge: $startTime} }) {
    value
     bars {
       name
     }
   }
}

This subscription would always return all child data, regardless of what has and has not changed. When a Foo has 100000 children and one of them changes, I’ll get all children. And that is only if the updatedAt timestamp of the Foo is updated when a Bar is changed. Otherwise I wouldn’t get any result at all, because the Foo’s timestamp didn’t change.

subscription FooUpdates {
  queryFoo @subscribeChangesOnly {
     value
     bars {
        name
     }
   }
}

This one would return only the changed subgraphs. That is, when a Foo has 100000 children and one of them changes, I’ll get only the child that changed (plus all requested fields on that child).

How can this be achieved by using timestamps?

Let’s assume for a moment that updatedAt timestamps are managed server-side. To implement the proposed @subscribeChangesOnly directive we would need another managed timestamp on every object: childUpdatedAt. This timestamp would be set whenever a child is updated. For efficiency, this could be allowed only on children that can reach their parent via @hasInverse. updatedAt would be set whenever a primitive field is changed on the parent object.

Then we could translate the @subscribeChangesOnly subscription to:

subscription FooUpdates {
  queryFoo(filter: {updatedAt: {ge: $startTime}, or: {childUpdatedAt: {ge: $startTime} } }) {
     value
     bars (filter: {updatedAt: {ge: $startTime}, or: {childUpdatedAt: {ge: $startTime} } }) {
        name
     }
   }
}

$startTime is the internally recorded timestamp of the subscription start.
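To make the idea concrete, here is a small in-memory Python sketch of the two timestamps and the translated filter. Everything in it is hypothetical: the dataclasses, the helper names and the integer logical clock stand in for what the server would manage, and since Bar has no children in this schema, the childUpdatedAt part of the bars filter is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Bar:
    name: str
    updated_at: int = 0


@dataclass
class Foo:
    value: int
    bars: list = field(default_factory=list)
    updated_at: int = 0
    child_updated_at: int = 0


def set_value(foo, value, now):
    """A primitive field on the parent changes: only updatedAt moves."""
    foo.value = value
    foo.updated_at = now


def rename_bar(foo, index, name, now):
    """A child changes: its updatedAt and the parent's childUpdatedAt move."""
    foo.bars[index].name = name
    foo.bars[index].updated_at = now
    foo.child_updated_at = now


def changed_since(foos, start_time):
    """In-memory equivalent of the translated subscription filter."""
    result = []
    for foo in foos:
        if foo.updated_at >= start_time or foo.child_updated_at >= start_time:
            bars = [b.name for b in foo.bars if b.updated_at >= start_time]
            result.append({"value": foo.value, "bars": bars})
    return result


# Three Foos with 1000 Bars each, all last touched at logical time 0.
foos = [
    Foo(value=i, bars=[Bar(name=f"bar-{i}-{j}") for j in range(1000)])
    for i in range(3)
]
start_time = 10  # recorded when the subscription starts
rename_bar(foos[1], 500, "renamed", now=11)
after_rename = changed_since(foos, start_time)
```

After the rename, only the one affected Foo is returned, and only with the one Bar that changed rather than all 1000.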

Hey @maaft

I see that you are trying to get a diff of the graph JSON response. It’s an interesting use case that we could look into, though maintaining a childUpdatedAt edge seems like something the client should do. We are also looking into supporting subscriptions on add/update/delete mutations soon, so whenever a node of a type is added or updated, the user would get info about that node. Is that something that would help you here?

Really? What about child-child-child-child-relationships? I think it would not work to let the client handle this. It would lead to very error-prone code and unnecessary traffic. You easily forget to update all needed timestamps and are going to break your app almost always at some point in time.

Timestamps should not be handled by the client. SQL clients don’t need to take care of this either. The database does it, or you can tell the database to do it by using database triggers. I guess what comes closest to these triggers are @lambda directives. The only problem here is that the Dgraph lambda server is implemented in JavaScript, and good luck trying to run that cross-platform. (see Why Javascript for dgraph-lambda?)

We are investigating a way for the GraphQL server to take care of default values such as createdAt and updatedAt. Though when it comes to things like updating a grandparent when a grandchild node is updated, that’s the kind of business logic the user is better placed to take care of.

Are your concerns around @lambda addressed now and would you like to give that a try?

If I want to use @lambda, I’ll have to write my own Go implementation, as I cannot ship JavaScript for Windows in an easy way (Docker is not an option, unfortunately). Another user has posted his Go prototype, so maybe that will help me get started.

This reply might come a bit late, but I think that this would definitely be a step in the right direction.
What would the response to e.g. an update-subscription look like?
Would it contain only the updated nodes?

@pawan the more I think about it, the less I can wait for this feature to come.
So, is this in development yet? Sounds really, really useful!

Hi @maaft, we haven’t started working on this feature as of now. I will discuss it with the team and let you know. Thanks.
