Given a schema like this:
type Post {
  id: ID!
  raw: String!
}
Here’s a way to break a mutation:
mutation MyMutation {
  addPost(input: {raw: "\xaa Raw content"}) {
    numUids
  }
}
giving rise to the following error:
{
  "errors": [
    {
      "message": "Unexpected <Invalid>",
      "locations": [
        {
          "line": 2,
          "column": 26
        }
      ]
    }
  ]
}
This works:
mutation MyMutation {
  addPost(input: {raw: "\\xaa Raw content"}) {
    numUids
  }
}
Inconsistent Semantics
The GraphQL spec defines a String as such: “String: A UTF‐8 character sequence.” So far so good. Both sequences "\xaa" and "\\xaa" are valid UTF-8 character sequences. They have different meanings, however. The first one is basically another way of writing "ª", while the second would be represented as "\xaa".
Perhaps then there was some unstated requirement that escape sequences (\n, \t, \a, \xhh, etc.) should themselves be escaped. But that’s untrue. Our implementation is inconsistent.
To wit, this works too - the sequence is correctly stored.
mutation MyMutation {
  addPost(input: {raw: "\n Raw content"}) {
    numUids
  }
}
It would seem that some escape sequences are more privileged than others. This should not be the case. So which ones are they? Perhaps control characters do not require escapes, while octal/hex literals do. The following table enumerates what is OK and what is not:
Character | Type | OK? |
---|---|---|
\0 | Control | No |
\a | Control | No |
\b | Control | Yes |
\t | Control | Yes |
\n | Control | Yes |
\v | Control | No |
\f | Control | Yes |
\r | Control | Yes |
\x1a | Control | No |
\e or \x1b | Control | No |
\xHH | Hex literal | No |
\OO | Octal literal | No |
\uHHHH | Unicode literal | No |
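One possible reading of the table: the escapes that work (\b, \t, \n, \f, \r) are exactly the single-character escapes listed in the GraphQL spec's string grammar, which also allows \", \\, \/ and \u followed by four hex digits. The Python sketch below checks a string body against that set. It assumes the server's lexer follows the spec grammar, which is not confirmed here, and `invalid_escapes` is a hypothetical helper, not part of any library. It also ignores the braced \u{...} form added in newer spec drafts.

```python
import re

# Single-character escapes allowed by the GraphQL spec's string grammar.
SIMPLE_ESCAPES = set('"\\/bfnrt')
# Fixed-width unicode escape: "u" followed by exactly four hex digits.
UNICODE_ESCAPE = re.compile(r"u[0-9A-Fa-f]{4}")


def invalid_escapes(body: str) -> list[str]:
    """Return the backslash sequences in `body` that the spec grammar rejects.

    `body` is the text between the quotes of a string literal,
    exactly as it would appear in the query document.
    """
    bad = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            i += 1
            continue
        nxt = body[i + 1 : i + 2]
        if nxt in SIMPLE_ESCAPES:
            i += 2  # e.g. \n, \t, \\
        elif UNICODE_ESCAPE.match(body, i + 1):
            i += 6  # backslash + "u" + four hex digits
        else:
            bad.append(body[i : i + 2])  # e.g. \x, \a, \0, or a bare \u
            i += 2
    return bad
```

Under that assumption, `invalid_escapes(r"\xaa Raw content")` flags `\x`, while the `\n` and `\\xaa` variants pass, which matches the three mutations shown earlier.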
It’s rather maddening, especially when dealing with large text corpora. For example, people on forums type/mistype \u all the time (e.g. a dyslexic user may type "me\u" instead of "me/u"). Ironically, this very post can’t go into Slash GraphQL without further processing.
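As a rough idea of what that "further processing" could look like on the client side, here is a minimal Python sketch. It relies on the fact that JSON string escapes (\", \\, \n, \uXXXX and so on) are also valid in a GraphQL string literal, so `json.dumps` can serve as the escaper. The helper names are hypothetical, and the mutation shape is simply copied from the examples above.

```python
import json


def to_graphql_string_literal(raw: str) -> str:
    """Quote `raw` so it can be pasted into a query as a string literal.

    json.dumps doubles backslashes, escapes double quotes and control
    characters, and (with ensure_ascii=False) leaves other text as-is.
    """
    return json.dumps(raw, ensure_ascii=False)


def build_add_post_mutation(raw: str) -> str:
    """Build the addPost mutation from the examples, with `raw` escaped."""
    literal = to_graphql_string_literal(raw)
    return (
        "mutation MyMutation {\n"
        "  addPost(input: {raw: %s}) {\n"
        "    numUids\n"
        "  }\n"
        "}" % literal
    )


if __name__ == "__main__":
    # A forum-style typo containing a bare backslash-u.
    print(build_add_post_mutation("me\\u and \\xaa Raw content"))
```

This only works around the problem: every client still has to remember to escape user text before sending it.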
What is the ideal way?
Jokes aside, strings are hard, man. I think we should follow the GraphQL spec and just accept a sequence of UTF-8 characters. That means not expecting the inputs to be escaped. We should let people type whatever they want in strings.