JSON (blob) as a scalar

It would be nice to have a new built in scalar called JSON. Yes, while we could currently store the string representation in a String, having a JSON typed scalar, even though the underlying representation is exactly the same (i.e. still a string), it’d be more “type safe”*.

The difference is when deserializing the response: simply replace the quoted json with the representation.

How It Works (Example)

Given a schema as follows:

type Foo {
    name: String! @id
    bar: JSON
}

Assume we have 2 data with each datum as follows:

Datum 1:

mutation{
    addFoo(input: [{name: "fst", bar: "{\"baz\": 1}"}]) { 
        numUids
   }
}

Datum 2:


mutation{
    addFoo(input: [{name: "snd", bar: "[{\"baz\": 1}, {\"quux\":3.14}, 1, 2, 3, 4]"}]) { 
        numUids
   }
}

Now when we query,

getFoo(name:$Q) {
    name
    bar
}

would return the following:

{
  "data": {
    "getFoo": {
         "name": "fst"
         "bar":  {"baz": 1}
    }
...
}

And the query for the second datum would return:


{
  "data": {
    "getFoo": {
         "name": "snd"
         "bar":  [{"baz": 1}, {"quux":3.14}, 1, 2, 3, 4]
    }
...
}

Observe that when adding to the database, the quoted json is used. But when returned, the unquoted json is used.

* - I put “type safe” in quotes because I cannot prove progress and preservation.

4 Likes

I have a use case for this and already use JSON in Dgraph within a string type.

If this passes, I propose JSON functions as well :wink:

References:

https://dev.mysql.com/doc/refman/8.0/en/json-modification-functions.html

With the likes of :

  • JSON_ARRAY_APPEND(json_doc, path, val[, path, val] ...)
  • JSON_ARRAY_INSERT(json_doc, path, val[, path, val] ...)
  • JSON_REMOVE(json_doc, path[, path] ...)
  • JSON_REPLACE(json_doc, path, val[, path, val] ...)

Dgraph could all of a sudden support both Sets (as it does now with its []) and Arrays (as actual ordered duplicatable items).

I’d be OK if it were postgres-like where the database doesn’t know the schema of the JSON. If we have to keep the schema of the JSON then the order of difficulty goes from “we can do this” to “this may never terminate”

1 Like

If the database had to know the schema of the JSON it would make no sense to have it in the first place. Just use deeper types.

Also,

should be:

addFoo(input: [{name: "fst", bar: "{\"baz\": 1}"}])

escaping input like that in pure GraphQL is a pain. My practice is to use template literals in my JS and send it as a var to GraphQL. The point made though was that the input must be a String.

1 Like

fixed thanks

For context to this conversation, I made myself rules for when I allow myself to use JSON in Dgraph.

  • When I will ALWAYS want the complete JSON at once. If there are any use cases where I only will want a part of the JSON doc, then JSON is not the correct type.
  • When I will ALWAYS update the complete JSON at once. (This is a limitation of just how it is right now. With JSON functions, this point is void.) To update a JSON doc I must read the JSON make the changes client side and return the whole JSON. If the use case has lots of small updates to the JSON doc that does not effect the JSON doc as a whole, then JSON is not the right type.
  • When I will NEVER need to search for information within the JSON doc. (I understand it is possible, but it could very easily lead to false positive. If a JSON doc holds an object with two properties and I want to search within one of those properties only I could not.) If I ever need to data inside of the JSON as searchable text, then JSON is not the right type.

I wrote these rules for myself to prevent future problems that I had in the past with JSON in MySQL. It can be powerful, but it can also have severe limitations if used incorrectly.


Here is a repo for reference that supports a JSON GraphQL, graphql-scalars, and quite a few other types which users could find helpful.

Also if we had a JSON type then GraphQL could support later on features such as @recurse that right now even mapping to a DQL function requires a user to define the GraphQL document for the entire structure that is wanted. It would remove a bunch of error handling as then the sugraph types would not be checked any deeper than the JSON type, but even with this, users may still find it useful and a very valuable feature to merge another feature from DQL into GraphQL.

2 Likes

I’d also like to express demand for a JSON scalar. Basic syntax checking would help a lot.

I disagree, as JSON could become very big and could have high depths. Implementing such large JSONs in GQL has disadvantages:

  • you have to specify everything in your query (you retrieve only what you specify)
  • It surely will have performance impacts on dgraph when the tree is large, no?

So I think that having some kind of JSON schema validation built in could be very useful as it disallows clients to push invalid JSON. Also, queries and mutations don’t have to traverse the graph and would be just a single key-value lookup.

Of course you could just do the validation on your client whenever you query such fields. But this is another thing you need to remember, potentially over a large diversity of clients.

Proposition:
Add a @json directive

directive @json(schema: String) on FIELD_DEFINITION

type Foo {
  complex: String! @json(schema: "file://schema.json") # loads schema from URI. Spec: https://json-schema.org/
}

type Bar {
   comples: String! @json # only checks JSON syntax
}

So the first example would fetch a json-schema (following the specs from https://json-schema.org/) from a URI (from filesystem or remote).

When someone mutates or queries these types, the first example would check the content against the schema while the second example would just check if its valid json syntax.

It would still return as a String in any query (complying with GQL-spec). But clients can be sure that they used/retrieved the correct JSON.

In GraphQL, the return type for the value of bar must be whatever is specified in the schema. Now, returning an arbitrary map is not possible because GraphQL doesn’t have a map type. And we can’t infer the exact type to be returned without seeing some data for it first. Hence, all that we can return here is a string representation of the JSON.

This is the only advantage I see over using a String type until now. I am not sure if this is an important enough feature as this can be done on the client side right now.

2 Likes

The advantage would be that posting invalid json would be impossible. E.g. from 3rd party clients that have bugs?
This means that the database has always correct serialized objects, we don’t collect garbage data and don’t need to write code to handle this garbage data when being queried.

2 Likes

I now disagree with myself too, lol.

Working from the json directive idea,

directive @json on FIELD_DEFINITION

In implementation, any field definition that has this directive is stored as a string, and requires a string input. In a perfect world, Dgraph would parse the string and check that not only it is JSON but that it adheres to the type on which the directive is applied.

type Foo {
  id: ID
  bars: [String] @json # this is a JSON string in the database. The input is a string, but output returns an array of strings or null if empty string or nil
  xyzzy: Baz! @json # this is stored as a JSON string in the database. The input requires a string, but output returns the type Baz defined below as a remote type.
  thud: Waldo @json # this is stored as a JSON string in the database. The input is a strong, but the output returns the type Waldo defined below as a remote type or null on empty string, nil
}

type Baz @remote { # type used to parse JSON for input checking and output
  qux: String
  quux: [String!]!
  quuz: Int
  corge: Float
  grault: DateTime
  garply: [Waldo]
}

type Waldo { # dual purpose type
  # Any @json field can use any other existing defined type for JSON input checking and output parsing
  # Since there is no @remote directive here, this will also be used as a regular stored type as well 
  # generating queries, mutations, and inputs. Note that querying this type would only get the types 
  # that are added through mutating this type and not added in via the Foo.thud as that stores a string 
  # and does not create a related node.
  id: ID
  fred: Point
  plugh: Baz
}
query {
  addFoo(input: [{
    bars: "[\"bar1\",\"bar2\"]"
    xyzzy: "{\"qux\":\"flux\",\"quux\":[\"core\",\"capacitor\"]}"
  }]) {
    foo {
      id
      bars
      xyzzy {
        qux
        quux
      }
    }
  }
}

Also on the implementation, some users may want to digest the JSON string in specific pieces like normal traversing, but instead following edges, it is parsing a string. Other users may want to retrieve the JSON string at once, and that could be supported with two fields in the generated type, one being the string unparsed, and the other being the parsed JSON.

data: {
  addFoo: {
    foo: [
      {
        id: "0xa",
        _bars: "[\"bar1\",\"bar2\"]",
        bars: ["bar1", "bar2"],
        _xyzzy: "{\"qux\":\"flux\",\"quux\":[\"core\",\"capacitor\"]}",
        xyzzy: {
          qux: "flux",
          quux: ["core", "capacitor"]
        }
      }
    ]
  }
}

This would also provide an official Dgraph work around to the following:


Problems to think through, How or if search would be available on these @json fields. I would opt to go for no @json @search on the first iteration just to get this concept out there and usable and then figure out if and how to allow searching through the JSON strings later.

I currently store JSON in a string predicate for a class of data I will always need together in my application, it would be nice to have some sort of support for this, yes. (not looking for indexing or anything).

Do note that some use this database as a graph database and not a GraphQL©™ database, so implementing in the actual database scalar types would be nice.

2 Likes

For me the String type is doing enough for me JSON blob wise at the moment.

The rules above from @amaster507 are good and I just check JSON validity before putting it in the DB.

The only thing I use this for currently is ‘user preferences’ and what ‘tutorials/onboarding guides’ people have completed.

DQL backend golang server user (for more context)