Using subscription for data processing

Hi guys, thanks for the great work. We’re trying to adopt Dgraph within my organisation. Many of our backend applications rely on (and are driven by) changes to the underlying data. For example, assuming our database contains Todo types and a Todo is added, we want to notify a Todo processing service. For this, we thought GraphQL subscriptions were the right approach to listen for mutations.

I now have a basic subscription working using the @withSubscription directive and this example query:

subscription queryTodoSubscription {
  queryTodo {
    id
    title
    completed
  }
}

However, due to the requirements in processing the event, I’ve got the following questions:

  1. For efficient processing, how can I filter only for the node that changed using the subscription query?
  2. How can I get the previous state before the mutation?
  3. Bonus: can I also get information about the mutation? I.e., added, updated, or deleted and perhaps with the relevant fields/predicates.
  4. Bonus: can I get notified when an edge node changes?

Your help is highly appreciated. Thanks.


Welcome to the Dgraph community @iyinoluwaayoola

Subscriptions in Dgraph return the full result of the query that you are subscribed to.

This could be done via a filter. Say you had created_at and updated_at fields on your Todo; you could then filter for the Todos created after a certain timestamp. This would make sure that you only get the nodes created in the last interval. For this to work, you would have to re-subscribe with an updated filter condition that uses the latest timestamp.
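
A minimal sketch of such a filter, assuming the Todo type declares updated_at as a DateTime field with the @search directive so that it can be filtered on. The operation name and the timestamp are placeholders; the client would re-subscribe with a newer timestamp after handling each batch of results:

```graphql
# Assumes the schema declares `updated_at: DateTime @search` on Todo.
# The timestamp is a placeholder that the client replaces on each re-subscribe.
subscription recentTodos {
  queryTodo(filter: { updated_at: { ge: "2020-09-01T00:00:00Z" } }) {
    id
    title
    completed
    updated_at
  }
}
```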

Not sure I get this, could you clarify how you plan to use this? A subscription query would return the results which are valid at the point a query is done. If you wanted the results before a mutation, then the query has to be done before.

This would be possible if your subscription query filtered on created_at, updated_at, or a similar field.

It’s not possible currently to subscribe directly to events of nodes being added/updated/deleted. We’ll look into adding that in the future.

Could you share more details of what you are looking for here, with an example? Do you mean, for example, when the title of any Todo changes?


Thanks for your quick reply.

OK, I think I get your drift here but isn’t this approach prone to errors? If the updated_at field is set externally, it can be difficult to keep the subscriptions in sync. I was hoping there could be a way to query using contextual variables. For example, if the subscription operation receives information that includes the touched node IDs, then I can choose to construct a query using those variables. E.g.,

subscription queryTodoSubscription($ids: [ID!]) {
  queryTodo (filter: {id: $ids}) {
    id
    title
    completed
  }
}

I’d like to know what changed in order to respond appropriately. One way of doing that is to know the data before and after the mutation. For example, suppose the processor should send an automated message to a Reporter after a Todo is completed. This shouldn’t happen a second time just because the title was updated after completion. But there is no way to know whether the completed field was updated or not. So my question is more about the possibility of knowing the state before the mutation, perhaps embedded in the result extensions.

Following the illustration of the Todo type, assume Todo also has an edge Doer pointing to a User node. Can I subscribe to the Doer associated with the original subscription query within the same subscription? To answer my own question here, it’s probably more explicit to simply subscribe to User and link back to the Todo using a reverse edge.


Hi @pawan, I just came across the GitHub issue “Support logical replication / change data capture (CDC) with Kafka” (Issue #2122, dgraph-io/dgraph), which is directly related to my first question. I need a way to capture changes to my data. Besides using GraphQL subscriptions, it would be sufficient, even preferable, for Dgraph to pipe my subscribed queries to an intermediary like Kafka. Is this currently possible?

Unfortunately, CDC isn’t currently supported. We will be supporting it at some point, but I can’t give an ETA for it.

Yeah, so if your subscription query subscribes to a Todo and the associated Doer then you get back a new update if anything in the response changes.
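
As a sketch, assuming Todo has a doer field pointing to a User type with a name field (both field names and the operation name are assumptions for illustration), such a subscription could look like this:

```graphql
# Hypothetical fields: `doer` on Todo and `name` on User.
subscription todosWithDoer {
  queryTodo {
    id
    title
    completed
    doer {
      id
      name
    }
  }
}
```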

Since CDC is now available, when can we expect delta updates for GraphQL subscriptions?

This is crucial for a whole class of applications.
