JSON (blob) as a scalar

It would be nice to have a new built-in scalar called JSON. While we could currently store the string representation in a String, a JSON-typed scalar would be more “type safe”*, even though the underlying representation is exactly the same (i.e. still a string).

The difference shows up when deserializing the response: the quoted JSON string is simply replaced with its unquoted representation.
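A minimal sketch of that step in TypeScript (the function name is hypothetical; this is not Dgraph’s actual code):

function resolveJsonScalar(stored: string): unknown {
  // stored form:   "{\"baz\": 1}"  (a quoted string)
  // returned form: { baz: 1 }      (a real JSON value)
  return JSON.parse(stored);
}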

How It Works (Example)

Given a schema as follows:

type Foo {
    name: String! @id
    bar: JSON
}

Assume we have two data points, each datum added as follows:

Datum 1:

mutation {
    addFoo(input: [{name: "fst", bar: "{\"baz\": 1}"}]) {
        numUids
    }
}

Datum 2:


mutation {
    addFoo(input: [{name: "snd", bar: "[{\"baz\": 1}, {\"quux\": 3.14}, 1, 2, 3, 4]"}]) {
        numUids
    }
}

Now when we query,

query GetFoo($Q: String!) {
    getFoo(name: $Q) {
        name
        bar
    }
}

would return the following for $Q = "fst":

{
  "data": {
    "getFoo": {
      "name": "fst",
      "bar": {"baz": 1}
    }
  ...
}

And the same query for the second datum ($Q = "snd") would return:


{
  "data": {
    "getFoo": {
      "name": "snd",
      "bar": [{"baz": 1}, {"quux": 3.14}, 1, 2, 3, 4]
    }
  ...
}

Observe that when adding to the database, the quoted JSON is used, but when returned, the unquoted JSON is used.

* - I put “type safe” in quotes because I cannot prove progress and preservation.


I have a use case for this and already use JSON in Dgraph within a string type.

If this passes, I propose JSON functions as well :wink:

References:

https://dev.mysql.com/doc/refman/8.0/en/json-modification-functions.html

With the likes of:

  • JSON_ARRAY_APPEND(json_doc, path, val[, path, val] ...)
  • JSON_ARRAY_INSERT(json_doc, path, val[, path, val] ...)
  • JSON_REMOVE(json_doc, path[, path] ...)
  • JSON_REPLACE(json_doc, path, val[, path, val] ...)

Dgraph could all of a sudden support both sets (as it does now with its [] lists) and arrays (as actual ordered collections that allow duplicates).
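To make those functions concrete, here is what JSON_ARRAY_APPEND and JSON_REMOVE do, sketched client side in TypeScript (today this manual read-modify-write is the only option):

const doc = JSON.parse('[{"baz": 1}, {"quux": 3.14}, 1, 2, 3, 4]');

// JSON_ARRAY_APPEND(doc, '$', 5): append to the top-level array;
// order is preserved and duplicates are allowed.
doc.push(5);

// JSON_REMOVE(doc, '$[0]'): remove the element at index 0.
doc.splice(0, 1);

// The whole document is then written back as one quoted string.
const updated = JSON.stringify(doc); // '[{"quux":3.14},1,2,3,4,5]'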

I’d be OK if it were Postgres-like, where the database doesn’t know the schema of the JSON. If we have to keep the schema of the JSON, then the order of difficulty goes from “we can do this” to “this may never terminate”.


If the database had to know the schema of the JSON, it would make no sense to have it in the first place. Just use deeper types.

Also, the mutation in the example should be:

addFoo(input: [{name: "fst", bar: "{\"baz\": 1}"}])

Escaping input like that in pure GraphQL is a pain. My practice is to build the JSON with template literals in my JS and send it as a variable to GraphQL, as sketched below. The point made, though, was that the input must be a String.
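For example, with the Foo type above (a sketch, assuming Dgraph’s default /graphql endpoint):

const bar = { baz: 1 }; // a plain object, no manual escaping needed

const mutation = `
  mutation AddFoo($name: String!, $bar: String!) {
    addFoo(input: [{ name: $name, bar: $bar }]) { numUids }
  }
`;

async function addFst(): Promise<void> {
  await fetch("http://localhost:8080/graphql", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query: mutation,
      // JSON.stringify produces the quoted form the String field expects;
      // the HTTP layer handles all of the escaping.
      variables: { name: "fst", bar: JSON.stringify(bar) },
    }),
  });
}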


Fixed, thanks.

For context to this conversation, I made myself rules for when I allow myself to use JSON in Dgraph.

  • When I will ALWAYS want the complete JSON at once. If there is any use case where I will only want part of the JSON doc, then JSON is not the correct type.
  • When I will ALWAYS update the complete JSON at once. (This is a limitation of how it is right now; with JSON functions, this point is void.) To update a JSON doc I must read the JSON, make the changes client side, and write back the whole JSON (see the sketch after this list). If the use case has lots of small updates that do not affect the JSON doc as a whole, then JSON is not the right type.
  • When I will NEVER need to search for information within the JSON doc. (I understand it is possible, but it could very easily lead to false positives. If a JSON doc holds an object with two properties and I want to search within only one of those properties, I could not.) If I ever need the data inside the JSON as searchable text, then JSON is not the right type.
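Here is the read-modify-write cycle from the second rule, sketched in TypeScript against the Foo example above (the gql helper and the generated filter shape are my assumptions; check the schema Dgraph actually generates):

// Minimal helper: POST a GraphQL document to Dgraph's /graphql endpoint.
async function gql(query: string, variables: object): Promise<any> {
  const res = await fetch("http://localhost:8080/graphql", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables }),
  });
  return res.json();
}

async function bumpBaz(name: string): Promise<void> {
  // 1. Read the WHOLE document, even though we need a single field.
  const res = await gql(
    `query ($name: String!) { getFoo(name: $name) { bar } }`,
    { name },
  );
  const doc = JSON.parse(res.data.getFoo.bar);

  // 2. Make the small change client side.
  doc.baz += 1;

  // 3. Write the WHOLE document back for this one-field change.
  await gql(
    `mutation ($name: String!, $bar: String!) {
       updateFoo(input: {
         filter: { name: { eq: $name } }
         set: { bar: $bar }
       }) { numUids }
     }`,
    { name, bar: JSON.stringify(doc) },
  );
}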

I wrote these rules for myself to prevent future problems that I had in the past with JSON in MySQL. It can be powerful, but it can also have severe limitations if used incorrectly.


Here is a repo for reference, graphql-scalars, which provides a JSON GraphQL scalar and quite a few other types that users could find helpful.

Also, if we had a JSON type, then GraphQL could later support features such as @recurse, which right now, even when mapped to a DQL function, requires a user to define the GraphQL document for the entire structure that is wanted. It would remove a bunch of error handling, since the subgraph types would not be checked any deeper than the JSON type, but even so, users may still find it a very valuable way to merge another feature from DQL into GraphQL.

I’d also like to express demand for a JSON scalar. Basic syntax checking would help a lot.

I disagree, as JSON documents can become very big and deeply nested. Modelling such large JSON with GQL types has disadvantages:

  • you have to specify everything in your query (you retrieve only what you specify), as sketched below
  • it surely will have performance impacts on Dgraph when the tree is large, no?
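To make the first bullet concrete, compare the two query shapes (type and field names are invented for illustration):

// With deeper types, every level of the nested document must be
// spelled out in the query:
const withDeepTypes = `query {
  getReport(id: "r1") {
    sections {
      title
      paragraphs {
        text
        footnotes { mark text }
      }
    }
  }
}`;

// With a JSON scalar, one field fetches the whole document, but you
// always get all of it, and the server cannot select inside it:
const withJsonScalar = `query {
  getReport(id: "r1") { body }  # body is the JSON scalar
}`;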

So I think that having some kind of built-in JSON schema validation could be very useful, as it prevents clients from pushing invalid JSON. Also, queries and mutations wouldn’t have to traverse the graph and would be just a single key-value lookup.

Of course you could just do the validation on your client whenever you query such fields. But this is another thing you need to remember, potentially across a large variety of clients.

Proposition:
Add a @json directive

directive @json(schema: String) on FIELD_DEFINITION

type Foo {
  complex: String! @json(schema: "file://schema.json") # loads schema from URI. Spec: https://json-schema.org/
}

type Bar {
  complex: String! @json # only checks JSON syntax
}

So the first example would fetch a JSON Schema (following the spec at https://json-schema.org/) from a URI (on the filesystem or remote).

When someone mutates or queries these types, the first example would check the content against the schema, while the second example would just check whether it’s valid JSON syntax.

It would still return as a String in any query (complying with the GQL spec), but clients can be sure that they sent/retrieved correct JSON.
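A sketch of the check such a directive could run on incoming values, using the ajv JSON Schema validator (https://ajv.js.org/); the inline schema here stands in for the document loaded from the schema URI:

import Ajv from "ajv";

const ajv = new Ajv();

// Stand-in for the schema fetched from "file://schema.json".
const validate = ajv.compile({
  type: "object",
  properties: { baz: { type: "number" } },
  required: ["baz"],
});

function checkJsonField(value: string): void {
  // @json with no schema argument: this parse alone must succeed.
  const doc = JSON.parse(value); // throws on invalid JSON syntax

  // @json(schema: ...): the parsed value must also match the schema.
  if (!validate(doc)) {
    throw new Error("invalid JSON: " + ajv.errorsText(validate.errors));
  }
}

checkJsonField('{"baz": 1}');        // passes
// checkJsonField('{"baz": "one"}'); // throws: baz must be a number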