Slash breaking GraphQL constraints & Cascade delete

Hi all,

Broken GraphQL constraint - Example

Let’s have a schema (unimportant fields omitted for brevity):


type Client {
    id: String! @id
    projects: [Project!]! @hasInverse(field: client)
    time_blocks: [TimeBlock!]! @hasInverse(field: client)
}

type Project {
    id: String! @id
    time_entries: [TimeEntry!]! @hasInverse(field: project)
    client: Client!
}

type TimeEntry {
    id: String! @id
    project: Project!
}

type TimeBlock {
    id: String! @id
    invoice: Invoice @hasInverse(field: time_block)
    client: Client!
}

type Invoice {
  id: String! @id
  time_block: TimeBlock!
}

Imagine we have a Project with some Time Entries.
When you delete the Project by deleteProject mutation and then try to query Time Entries, you’ll get a response like:

{
  "errors": [
    {
      "message": "Non-nullable field 'project' (type Project!) was not present in result from Dgraph.  GraphQL error propagation triggered.",
      "locations": [
        {
          "line": 4,
          "column": 5
        }
      ],
      "path": [
        "queryTimeEntry",
        2,
        "project"
      ]
...

Problem

  • We were able to easily break the constraint project: Project! by a pregenerated Slash’s GraphQL mutation in the example above.

  • As the result, we have “orphans” - Time Entries without an associated Project - that are basically a memory leak and can surprisingly affect some queries.

Expectations

  • I want to use Slash as a GraphQL service. It means that the error 'project' (type Project!) was not present in result from Dgraph. feels like an implementation leak/bug because I don’t care about Dgraph at this level of abstraction and I haven’t run any custom Dgraph commands to break it manually.

  • I think GraphQL constraints should have the highest priority. Then I would expect to either “fail early” by returning an error on the delete request or resolve the problem by cascade delete.

Possible Solution

  • One solution probably could be to introduce the argument input: DeleteEntityInput to deleteEntity mutation with the field EntityFilter and the optional boolean field cascade. It would be a breaking change. (Note: Entity is a placeholder for Client, Project, etc.).

  • Or allow users to set the behavior globally - either show an error (that early fail) or cascade delete.

Workaround

Users have to write multiple delete mutations to cover all “lower levels” - e.g. to delete Client you have to delete explicitly also all related Projects, TimeBlocks, TimeEntries and Invoices.
However this workaround is very error-prone and introduces a lof of boilerplate to the app logic.

I’ve read some similar issues in docs and on the forum but maybe there is a proper solution that I’ve overlooked.

Thank you!
Martin

1 Like

This is right, if you delete a project then the associated TimeEntry's would not point to a project anymore and you would get the above error. Not that GraphQL Spec itself doesn’t say that this is not possible, it just says that when a required field is not present in the result, then return an error.

It is entirely possible to break it without using a custom Dgraph command. One of the ways of doing this is what you mentioned here. I think we should update the error message to just say

"Non-nullable field 'project' (type Project!) was not present in the result.  GraphQL error propagation triggered."

Cascading deletes is something that has been asked for before and we can look into it. One problem is that even if we support it, it can be a very slow operation that might not perform very well as you might end up traversing and deleting nodes in a large part of the Graph.

Let’s continue this discussion here:

@pawan Hi. Just pitching in here. I feel that this is one of the most important features missing in Dgraph GraphQL right now and partial deletions are causing a lot of null references making a lot of GraphQL queries fail cause of few bad records.

I feel that it is better to have it as a slow operation to start with rather than not having it at all. Maybe, it can be optimized for performance at a later point in time since deletion operations are typically not frequent anyways.Just looking for ways to prevent bad records in the DB.

Or is there any other way you would avoid such situations? I use hasInverse to link all the schemas. Not sure how to avoid such issues which happen on situations like partial deletions, bad logic and so on affecting all records. Thanks.

2 Likes

There are likely good reasons for not having this feature, but I don’t think this is one of them. If you need to delete a node and doing so would break the schema, you have two options: (1) have the database handle cascade delete, or (2) have the application figure out all nodes that need to be deleted and delete them all. Not performing the delete isn’t an option and neither is leaving unneeded nodes with broken relationships untouched.

So if the deletes must take place and the operation is slow, I guess the only remaining question is whether the application can perform the operation faster than Dgraph. I don’t think this is the case since the app is likely doing all deletes in the same transaction, not spreading them our or otherwise performing them asynchronously.

In my specific application, I maintain my own version of the schema that I can query to discover relationships and where the deletes should cascade. Then I construct the related queries and run them whenever a node needs to be delete. It would be nice to not have to maintain all of this extra code just to keep my graph consistent.

1 Like