Migrating Data with DQL after changing GraphQL Schema

Let's say we made these changes to our schema (v21.12+):

-type UserEmails {
+type UserEmail {
+  id: ID!
   email: String @search
   verified: Boolean @search
+  owner: User @hasInverse(field: "linkedEmails")
 }

 type User {
-  updatedAt: DateTime
+  updatedAt: DateTime @default(update:{value:"$now"}) @search
-  linkedEmails: [UserEmails]
+  linkedEmails: [UserEmail]
   comments: [Comment]
 }

So what changes were made?

  1. Type UserEmails renamed to UserEmail
  2. Added ID scalar mapped to UserEmail.id
  3. Added inverse edge at UserEmail.owner, inversely mapped to User.linkedEmails
  4. Added @default(...) and @search directives to User.updatedAt field
  5. Replaced edge with renamed type on User.linkedEmails

NOTE: When you modify the GraphQL schema, the underlying data does not migrate with your changes. You must migrate your data manually if needed.

To make our life easier, let’s narrow the task list above down to what we actually need to do. #2, #4, and #5 are moot points: these changes do not affect the underlying data. That leaves just two tasks from the list (#1 and #3), plus a third that I will explain next.

  [ ] Retype nodes from UserEmails to UserEmail
  [ ] Add inverse relationships mapping from User.linkedEmails to UserEmail.owner
  [ ] Rename predicates of the renamed type UserEmail

The third task above requires an understanding of how the GraphQL schema gets mapped to the underlying DQL schema. This mapping happens automatically, unless you control it with the @dgraph directive. So the current DQL schema would be:

# DQL Schema (truncated)
type UserEmails {
  UserEmails.email
  UserEmails.verified
}
type UserEmail {
  UserEmail.email
  UserEmail.verified
  UserEmail.owner
}
type User {
  User.updatedAt
  User.linkedEmails
}
UserEmails.email: string @index(term) .
UserEmails.verified: bool @index(bool) .
UserEmail.email: string @index(term) .
UserEmail.verified: bool @index(bool) .
UserEmail.owner: [uid] .
User.updatedAt: dateTime @index(year) .
User.linkedEmails: [uid] .

There are a few things you should notice here:

  • The old type still exists in the DQL schema
  • Fields of types are mapped to a <type>.<field> predicate.
  • Edges (e.g., User.linkedEmails) link to a uid list (e.g., [uid]), so they can point to nodes of any type, or of no type at all, given DQL's loosely typed data modeling.
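
To make the second point concrete, here is a minimal Python sketch of the naming rule for the plain types in this post. The helper name `default_predicate` and its `dgraph_pred` parameter are my own and not part of any Dgraph client; the `id: ID!` field is left out because it is backed by the node's uid rather than a predicate.

```python
from typing import Optional


def default_predicate(type_name: str, field_name: str,
                      dgraph_pred: Optional[str] = None) -> str:
    """Return the DQL predicate that backs a GraphQL field.

    Hypothetical helper: without an @dgraph(pred: ...) override the
    predicate is "<type>.<field>"; with one, the override wins.
    """
    if dgraph_pred is not None:
        return dgraph_pred
    return f"{type_name}.{field_name}"


# Data written under the old type name lives at these predicates ...
old = [default_predicate("UserEmails", f) for f in ("email", "verified")]
# ... while the renamed GraphQL type reads from these.
new = [default_predicate("UserEmail", f) for f in ("email", "verified")]

print(old)
print(new)
```

The two lists never overlap, which is why the existing data becomes invisible to the renamed type.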

This helps us see that the data in the database is still in its original place and explains why it would become mostly inaccessible now with the changed GraphQL schema. Dgraph expects data to be at a specific place and that data is no longer there.

There are three ways to perform this data migration. We could 1) remap the GraphQL schema using the @dgraph directive (see the note below); 2) put the old GraphQL type and fields back in the GraphQL schema alongside the new ones, then use a script to query the old type and mutate the data back in through the GraphQL endpoint, which would be slow and tedious; or 3) use DQL upserts to migrate the data. We are going to migrate this data with DQL upserts.

NOTE: You could just remap the type and predicates using the @dgraph directive instead of migrating the data.

This would make the new schema be:

type UserEmail @dgraph(type: "UserEmails") {
  id: ID!
  email: String @search @dgraph(pred:"UserEmails.email")
  verified: Boolean @search @dgraph(pred:"UserEmails.verified")
  owner: User @hasInverse(field:"linkedEmails") @dgraph(pred:"UserEmails.owner")
}
type User {
  updatedAt: DateTime @default(update:{value:"$now"}) @search
  linkedEmails: [UserEmail]
  comments: [Comment]
}

But you would still need to create the missing inverse edges, which could be done with a script like the one in: Fixing Inverse Relationships

Define upsert: The upsert block allows performing queries and mutations in a single request. The upsert block contains one query block and one or more mutation blocks. Variables defined in the query block can be used in the mutation blocks using the uid and val functions.

If you haven’t used the DQL uid() or val() functions before, I encourage you to read the documentation on them before continuing.

To rename nodes from one type to another, we need to query the nodes of that type, create a var to use in the mutation blocks and then use that var to set the new type and delete the old type. That looks like the following in DQL:

upsert {
  query {
    x as var(func: type(UserEmails))
  }
  set {
    uid(x) <dgraph.type> "UserEmail" .
  }
  delete {
    uid(x) <dgraph.type> "UserEmails" .
  }
}
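
If you want to run an upsert like this from a script instead of a UI, here is a rough Python sketch using only the standard library. It assumes a Dgraph Alpha reachable at http://localhost:8080 with no auth (a hosted backend would need extra headers), and the helper names `build_upsert_request` and `run_upsert` are mine. The request is built but not sent here, so you can inspect it first.

```python
import urllib.request

# The retype upsert from above, kept as a plain string.
RETYPE_UPSERT = """
upsert {
  query {
    x as var(func: type(UserEmails))
  }
  set {
    uid(x) <dgraph.type> "UserEmail" .
  }
  delete {
    uid(x) <dgraph.type> "UserEmails" .
  }
}
"""


def build_upsert_request(upsert: str,
                         alpha_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build (but do not send) a POST to the Alpha's /mutate endpoint.

    alpha_url is an assumption; point it at your own instance.
    """
    return urllib.request.Request(
        url=f"{alpha_url}/mutate?commitNow=true",
        data=upsert.encode("utf-8"),
        headers={"Content-Type": "application/rdf"},
        method="POST",
    )


def run_upsert(upsert: str, alpha_url: str = "http://localhost:8080") -> bytes:
    """Send the upsert and return the raw response body."""
    with urllib.request.urlopen(build_upsert_request(upsert, alpha_url)) as resp:
        return resp.read()


request = build_upsert_request(RETYPE_UPSERT)
print(request.get_method(), request.full_url)
```

Calling `run_upsert(RETYPE_UPSERT)` would commit the change immediately, so try it against a copy of your data first.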

To move a predicate from one name to another, you need to query the nodes that have the predicate being migrated and store them in a variable, then store the old predicate's value in a second variable. Then delete the old predicate and set the new predicate with the val() function.

NOTE: The val() function goes beyond just getting the value of a variable: it gets the value of the variable mapped to the correlating uid of the predicate. Refer to the docs.

This upsert for migrating a predicate looks like this in DQL:

upsert {
  query {
    x as var(func: has(UserEmails.email)) {
      y as UserEmails.email
    }
  }
  delete {
    uid(x) <UserEmails.email> * .
  }
  set {
    uid(x) <UserEmail.email> val(y) .
  }
}
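
Since every renamed predicate needs the same upsert with different names, you could generate it. This is only a sketch: `rename_predicate_upsert` is a name I made up, and it covers scalar predicates only, because val() carries values and not uid edges.

```python
def rename_predicate_upsert(old_pred: str, new_pred: str) -> str:
    """Return a DQL upsert that moves scalar values from old_pred to new_pred.

    Hypothetical helper; it only builds the text and sends nothing.
    """
    return (
        "upsert {\n"
        "  query {\n"
        # x collects the nodes, y maps each node's uid to its old value.
        f"    x as var(func: has({old_pred})) {{\n"
        f"      y as {old_pred}\n"
        "    }\n"
        "  }\n"
        "  delete {\n"
        f"    uid(x) <{old_pred}> * .\n"
        "  }\n"
        "  set {\n"
        # val(y) resolves per uid, so each node keeps its own value.
        f"    uid(x) <{new_pred}> val(y) .\n"
        "  }\n"
        "}\n"
    )


print(rename_predicate_upsert("UserEmails.verified", "UserEmail.verified"))
```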

In GraphQL, inverse edges do not use the DQL @reverse directive with its underlying ~ syntax. Instead, the GraphQL API keeps two separate edges in balance. This works well when mutating data through the GraphQL API, but if an inverse edge is added to the schema later, or data is modified outside of the GraphQL API, the two edges can fall out of “sync”. To “resync” these edges, you will need something like the script in: Fixing Inverse Relationships

THIS DOES NOT WORK IN UPSERT ONLY FORM!

# THIS DOES NOT WORK IN UPSERT ONLY FORM!
upsert {
  query {
    x as var(func: has(User.linkedEmails)) {
      y as User.linkedEmails
    }
  }
  set {
    # WARNING: USING THIS WILL PRODUCE UNEXPECTED RESULTS
    uid(y) <UserEmail.owner> uid(x) .
  }
}

I think I have enough warnings there. Now, why? If you run this upsert, no val() function is used to correlate the mapped values, so literally EVERY y would be mapped to every x, which is not what you want. What you want is to map only the correlating y to the correlating x. I have spent a lot of time trying to work my way around this, but without a for-loop function in DQL, all we can do is handle this migration with an external script. Again linking: Fixing Inverse Relationships


Let’s put Part 1 of this migration into a single upsert script for our example GraphQL schema update:

upsert {
  query {
    m as var(func: type(UserEmails))
    n as var(func: has(UserEmails.email)) {
      o as UserEmails.email
    }
    p as var(func: has(UserEmails.verified)) {
      q as UserEmails.verified
    }
  }
  set {
    uid(m) <dgraph.type> "UserEmail" .
    uid(n) <UserEmail.email> val(o) .
    uid(p) <UserEmail.verified> val(q) .
  }
  delete {
    uid(m) <dgraph.type> "UserEmails" .
    uid(n) <UserEmails.email> * .
    uid(p) <UserEmails.verified> * .
  }
}
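
After running Part 1, it may be worth checking that nothing was left behind under the old names. The sketch below builds a DQL query that counts leftover nodes and reads the counts back from the JSON response. The function names and the `leftover_*` block names are my own, and the sample response is made up purely to show the shape.

```python
import json
from typing import Dict, List


def leftover_query(old_type: str, old_preds: List[str]) -> str:
    """Build a DQL query counting nodes that still use the old type or predicates."""
    lines = ["{", f"  leftover_type(func: type({old_type})) {{ count(uid) }}"]
    for i, pred in enumerate(old_preds):
        lines.append(f"  leftover_pred_{i}(func: has({pred})) {{ count(uid) }}")
    lines.append("}")
    return "\n".join(lines)


def leftover_counts(response_json: str) -> Dict[str, int]:
    """Sum the count rows of each query block; an empty block counts as 0."""
    data = json.loads(response_json)["data"]
    return {name: sum(row.get("count", 0) for row in rows)
            for name, rows in data.items()}


print(leftover_query("UserEmails", ["UserEmails.email", "UserEmails.verified"]))

# Made-up response, only to illustrate the expected JSON shape.
sample = ('{"data": {"leftover_type": [{"count": 0}], '
          '"leftover_pred_0": [{"count": 0}], '
          '"leftover_pred_1": [{"count": 2}]}}')
print(leftover_counts(sample))
```

Any non-zero count means some nodes were missed and the matching part of the upsert should be looked at again.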

This would bring our task list to:

  [x] Retype nodes from UserEmails to UserEmail
  [ ] Add inverse relationships mapping from User.linkedEmails to UserEmail.owner
  [x] Rename predicates of the renamed type UserEmail

Part 2 would be using the script linked above to fix the inverse relationships. For the sake of time, I will not be writing a Part 2, but will just direct you once again to: Fixing Inverse Relationships
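
To give a feel for what such an external script has to do, here is a minimal Python sketch. It is not the linked script, just my own illustration. It assumes you have already run the query in `OWNER_QUERY` and have its JSON response as a string; the function name and the sample uids are made up. The point is the pairing: each email is tied only to the user it was found under, instead of every y to every x.

```python
import json
from typing import List

# DQL query whose JSON response feeds owner_nquads() below.
OWNER_QUERY = """
{
  users(func: has(User.linkedEmails)) {
    uid
    User.linkedEmails { uid }
  }
}
"""


def owner_nquads(response_json: str) -> List[str]:
    """Turn the query response into one UserEmail.owner N-Quad per real link."""
    users = json.loads(response_json)["data"]["users"]
    nquads = []
    for user in users:
        # Only emails nested under this user get this user as owner.
        for email in user.get("User.linkedEmails", []):
            nquads.append(f"<{email['uid']}> <UserEmail.owner> <{user['uid']}> .")
    return nquads


# Made-up response: user 0x1 has two emails, user 0x2 has one.
sample = json.dumps({"data": {"users": [
    {"uid": "0x1", "User.linkedEmails": [{"uid": "0xa"}, {"uid": "0xb"}]},
    {"uid": "0x2", "User.linkedEmails": [{"uid": "0xc"}]},
]}})

for line in owner_nquads(sample):
    print(line)
```

The resulting lines can be placed inside a `{ set { ... } }` mutation. For this sample that is three edges, where the upsert-only form would have written six.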


This is great! I was looking for this a while back.

Next step, build a UI that does all this automatically!

J
