Possible bug in GraphQL layer in Dgraph when altering schema via /alter

Came across the below example via a bug report filed by @akashjain971.

Essentially when you alter GraphQL schema via /alter endpoint, the corresponding types and fields are not kept in-sync.

Example:
Schema:

type Test{
    id: ID!
    name: String!
    marks: Int!
}
type Person{
    name: String!
    age: Int!
    class: String!
}

query – schema { __typename }
Result:

"data": {
        "schema": [
            {
                "predicate": "Person.age"
            },
            {
                "predicate": "Person.class"
            },
            {
                "predicate": "Person.name"
            },
            {
                "predicate": "Test.marks"
            },
            {
                "predicate": "Test.name"
            },
        ],
        "types": [
            {
                "fields": [
                    {
                        "name": "Person.name"
                    },
                    {
                        "name": "Person.age"
                    },
                    {
                        "name": "Person.class"
                    }
                ],
                "name": "Person"
            },
            {
                "fields": [
                    {
                        "name": "Test.name"
                    },
                    {
                        "name": "Test.marks"
                    }
                ],
                "name": "Test"
            },
        ]
    },

After dropping via {"drop_attr": "Test.marks"} on /alter, schema becomes:

"data": {
        "schema": [
            {
                "predicate": "Person.age"
            },
            {
                "predicate": "Person.class"
            },
            {
                "predicate": "Person.name"
            },
            {
                "predicate": "Test.name"
            },
        ],
        "types": [
            {
                "fields": [
                    {
                        "name": "Person.name"
                    },
                    {
                        "name": "Person.age"
                    },
                    {
                        "name": "Person.class"
                    }
                ],
                "name": "Person"
            },
            {
                "fields": [
                    {
                        "name": "Test.name"
                    },
                    {
                        "name": "Test.marks"
                    }
                ],
                "name": "Test"
            },
        ]
    },

After dropping a type in the original schema via {"drop_op": "TYPE", "drop_value": "Person"} schema becomes:

"data": {
        "schema": [
            {
                "predicate": "Person.age"
            },
            {
                "predicate": "Person.class"
            },
            {
                "predicate": "Person.name"
            },
            {
                "predicate": "Test.marks"
            },
            {
                "predicate": "Test.name"
            },
        ],
        "types": [
            {
                "fields": [
                    {
                        "name": "Test.name"
                    },
                    {
                        "name": "Test.marks"
                    }
                ],
                "name": "Test"
            },
        ]
    },

In the first deletion: Test.marks is deleted from the predicates but not from the Type

In the second deletion: Person is deleted from the types but its predicates remain in the predicate list.

This is a consequence of how drop works in edgraph/server.go. I would like your thoughts whether this should be separately handled?

CC: @martinmr @pawan @abhimanyusinghgaur

Not a bug since types and schema were not designed to be in sync. Not sure if we want to add this as a feature. Dropping fields when they are shared across types might have unintended consequences.

I agree that deleting fields when they are shared across types will have unintended consequences. But in GraphQL world all the predicates are appended with their types (eg. Test.marks, Test.name); deleting Test should delete Test.marks and Test.name from the list of predicates.

Similarly deleting Test.marks should update type Test.

Thoughts? @pawan @abhimanyusinghgaur @martinmr

Also, I think this is not straightforward to do, I am wondering if some actual query/application was failing because of this @akashjain971?

One should not try to modify schema through /alter if they are specifying schema through GraphQL. If one is using only DQL, then only they should use /alter. If they are using GraphQL, then they should use only the /admin/schema endpoint to modify the schema. Both of these endpoints together are not supposed to be used because they serve different purposes.

So, when someone initially provides GraphQL schema through /admin/schema, and then they are using /alter to modify the generated DQL schema, then it is up to them to ensure the correctness of the schema. I would call them an advanced user. GraphQL works on top of DQL, so if underlying DQL schema is changed, then GraphQL may not work.

Now, the behavior we are observing is for DQL schema, and to me, it seems correct for the case when you delete a DQL type, and the predicates inside that type are not deleted. Because, as @martinmr pointed, a predicate can be shared across multiple types from DQL perspective. And, DQL doesn’t even mandate one to have types, one can just have predicates, and use them to store their data, without having any types. So, predicates are independent of the types they are in.

Now, this isn’t fully correct:

Even in GraphQL, we have a @dgraph(pred: "predName") directive, which one can use to specify the name of the underlying DQL predicate for a field in GraphQL schema. So, even in GraphQL, one has the capability to share a single predicate across multiple types.

Now, for the 2nd case, when one deletes a predicate in DQL schema, then it is still remaining in the DQL type that was using the predicate. This seems somewhat unexpected behavior because a type can’t be independent of the predicates it contains. A type is composed of predicates, and this is what should be corrected if needed. But, I am not entirely sure if there is a reason behind this behavior in the first place.

2 Likes

Does GraphQL provide support for deleting the unused predicates and types? Slash has this usecase because they modify the schema via /admin but that doesn’t remove unused types and fields, for which they use /alter.

See this here: dgraph/edgraph/server.go at master · dgraph-io/dgraph · GitHub

Understood that the first case need not be handled and it is as expected. Regarding the second case, wondering what @martinmr thinks? Should we delete predicates from each type if they have been deleted from the schema?

Not sure, it’s something we could do but nobody had asked about it until now. Not sure if this is affecting slash or another customer. Having extra predicates in a type is not harmful as types are used during expand queries so having an extra predicate will just be a no-op. When the schema is updated in GraphQL, we’d overwrite the existing type definitions, right? are there situations when predicates are deleted independently of their type?

Maybe someone in the GraphQL can answer the questions above?

Agree.

@abhimanyusinghgaur/@pawan could you please answer on the above two questions?

Separately @akashjain971 could you please suggest if this is impacting a use-case in Slash? It requires significant code change deep in the code-base to support this. We would rather not work on this right now unless it is super critical.

CC: @LGalatin @vvbalaji

1 Like

Yes

Not in GraphQL. But, I think this is what happens when someone does a {"drop_attr": "pred_name"} on /alter endpoint. I guess, Slash has a use-case for that scenario. @akashjain971 can provide better details.

Slash offers users an option to drop unused types and fields from a GraphQL perspective. So if the user had a type Temp with fields aaa and bbb in the initial schema and then removes it in the later versions, we need to list Temp as an unused type and Temp.aaa and Temp.bbb as unused fields so that the user can get rid unused data.

Dropping type Temp should ideally also remove Temp.aaa and Temp.bbb which is not happening right now. So even if the user deletes type Temp, we continue to show them Temp.aaa and Temp.bbb as unused fields. This is confusing and misleading. We can’t expect Slash users to understand DQL schema. So we can either make the behavior in sync or have GraphQL provide support for dropping unused types/fields. Either would work for us.

cc: @gja

How do you find out these unused fields?

Unfortunately this is not always the case, think about this schema as Abhimanyu pointed out.

type Temp {
  name
}

type Temp2 {
  name @dgraph(pred: "Temp.name")
}

This can’t be the default behaviour inside Dgraph because as pointed out earlier in a DQL schema a predicate can be referenced by multiple types. We can build this as a special case in Dgraph or in Slash that if the predicate within the type is only referenced by that type, then you also issue a command to drop them. Note that dropping a type is cheap but dropping predicates requires marking all data for them for deletion so would take more time.

By taking diff of the query schema { __typename } before and after the schema update. See here.

IMO if anyone wants to update the schema, best way is to DROP_ALL and re-upload. That way we don’t have to track unused types etc.

One solution is that if user drops a type Temp then Slash can send request for dropping the type Temp as well as all the predicates in the type. That way we can leverage the code which we already have in dgraph without breaking anything.

As far as dropping individual predicate is concerned that is already handled. To avoid confusion display only the predicates from the types which are in use.

Danger ahead, Proceed with caution :sweat_smile: : DROP_ALL will remove all data, which is not what someone would expect during schema update. Data should be preserved, only the schema should be updated, is what would be a general expectation.

2 Likes