[Discussion] Improving the GraphQL implementation to make it useful in real production apps

Hi DGraph Community!
I’ve spent the past couple of weeks experimenting with DGraph and its GraphQL implementation.
I absolutely love DGraph and DQL, but not so much its GraphQL endpoint.

Here are some considerations from someone who has built countless GraphQL servers.

The idea of having an automatic GraphQL endpoint is nice, but with the current implementation it is practically only useful for pet projects or prototyping, not for starting up a “real” project.

Here is why:

  1. Designing the Contract: When starting a new project or startup idea, you want to be able to iterate fast. However, you also want to have as much control as possible over the schema and database design, so that the project can evolve as the idea and business grow.

    Designing a good GraphQL schema at the start brings huge benefits, especially since the clients are built around it. You can start with DGraph directly serving the data behind the GraphQL server, but at some point you want to be able to transparently replace DGraph or position it behind a microservice, or at least be prepared to do so.

    Having the queries and mutations automatically generated constrains us to develop our clients and future microservices around a convention that we might not like and cannot change.

    The inputs are also super important. I want to be able to expose a minimal set of params in my GraphQL inputs and perform the filtering logic on the backend. At the moment, DGraph's GraphQL is generating very complex and powerful inputs that allow querying almost the entire database from the UI.

    I feel very uneasy deploying such powerful discoverability to production.
    Touching again on the design aspect, I'm very much constrained by the way DGraph designs my inputs, so if I ever want a GraphQL schema that is simpler by nature, I can't achieve that.

  2. GraphQL is the Database and the Database is GraphQL: I think this is the wrong approach.
    I believe that the API layer should be decoupled as much as possible from the database. There can be similarities between the two, but they shouldn't mutate one another.

    If I change my API contract, I shouldn't have to worry that my database will also change, and vice versa.
    What I do think is right is to have a mapping of some sort that tells the two contracts how they relate to each other. It could be by convention or coded in.

Example:

## GraphQL Schema

type User {
  id: ID!
  fullname: String
  age: Int
}

## DGraph Schema

User.name: string .
User.surname: string .
User.age: int .

type User {
  User.name
  User.surname
  User.age
}
    The two types are almost identical, except for the name & surname fields in the DGraph schema versus the fullname field in the GraphQL schema. We might be able to map 80% of the fields automatically; for the ones we can't, the user can write a little piece of code (in a resolver or elsewhere) that tells how the field is computed.

    The main advantage of this is that we can evolve the API and the database independently; we have full control over both.

    The disadvantage is that we now have two schemas to maintain (a trade-off that still makes a lot of sense to me).
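To make the computed-field idea concrete, here is a plain Go sketch. The `dgraphUser` and `gqlUser` types are hypothetical stand-ins for the two schemas above; the point is only how small the hand-written mapping code would be for the fields that cannot be mapped automatically:

```go
package main

import (
	"fmt"
	"strings"
)

// dgraphUser mirrors the DGraph side of the example: name and surname
// are stored as separate predicates.
type dgraphUser struct {
	ID      string
	Name    string
	Surname string
	Age     int
}

// gqlUser mirrors the GraphQL type, which exposes a single fullname field.
type gqlUser struct {
	ID       string
	Fullname string
	Age      int
}

// toGraphQL maps the stored record onto the API type. ID and Age map 1:1;
// Fullname is computed, i.e. the small piece of resolver-level code the
// user would write for the fields that differ between the two schemas.
func toGraphQL(u dgraphUser) gqlUser {
	return gqlUser{
		ID:       u.ID,
		Fullname: strings.TrimSpace(u.Name + " " + u.Surname),
		Age:      u.Age,
	}
}

func main() {
	out := toGraphQL(dgraphUser{ID: "0x1", Name: "Ada", Surname: "Lovelace", Age: 36})
	fmt.Println(out.Fullname) // prints "Ada Lovelace"
}
```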

  3. I want a powerful GraphQL server, but I don't have control over it: this is what is currently happening with DGraph's GraphQL implementation.

    There is currently too much declarative logic in the wrong place (the schema). For example, I can make HTTP requests directly from the schema (OMG), plus authentication, authorization, custom DQL queries, etc…

    I think in a medium-sized project that approach will just end up being too complex to even understand what's going on (hence very good for pet projects).

    What about:

  • Input validation
  • Input sanitisation
  • Input aggregation
  • Input transformation
  • etc… etc…

    I believe all of the above, including authentication and authorization, has to be done outside the schema definition, preferably at the application level.

    Ok, we have Lambda resolvers for this. But HEY! I want to be able to use Go, Rust, Java, PHP, Haskell, etc… Why do I need to deploy another system which I don't have control of? It's also JavaScript!! (BTW, I love JS too, but I'm just being THE GUY now.)

    My point is that since I would need Lambda resolvers anyway for all of my resolvers, as I'm doing the validations, sanitisation, etc. at the application level, I could at last have control of the runtime at that point (if I felt like it).
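To make "validation and sanitisation at the application level" concrete, here is a minimal Go sketch. The `addUserInput` type and its rules are hypothetical; the point is only that this logic lives in my own runtime rather than in the schema:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// addUserInput is a deliberately minimal mutation input: only the fields
// the API chooses to expose, not everything the database could filter on.
type addUserInput struct {
	Fullname string
	Age      int
}

// validate runs the kind of application-level checks this post argues
// should live in the developer's own runtime.
func validate(in addUserInput) error {
	if strings.TrimSpace(in.Fullname) == "" {
		return errors.New("fullname is required")
	}
	if in.Age < 0 || in.Age > 150 {
		return errors.New("age out of range")
	}
	return nil
}

// sanitize normalises the input before it is forwarded to the database,
// here by collapsing repeated whitespace in the name.
func sanitize(in addUserInput) addUserInput {
	in.Fullname = strings.Join(strings.Fields(in.Fullname), " ")
	return in
}

func main() {
	in := sanitize(addUserInput{Fullname: "  Ada   Lovelace ", Age: 36})
	fmt.Println(in.Fullname, validate(in) == nil) // prints "Ada Lovelace true"
}
```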

Summary

To summarise: as a developer who wants to build a future startup or personal project, I want to:

  • Use a technology that allows me to design my contracts from A to Z
  • Use a technology that doesn't get in my way once my idea grows
  • Be able to evolve the system, not rewrite it
  • Iterate fast

Alright, I think those are the main problems I'm seeing with the current GraphQL implementation, but please don't think that DGraph is the only one guilty of the points above.

Hasura, PostGraphile, and a million other automatic GraphQL server generators suffer from the same problem.

But since I really fell in love with DGraph, I'm just passing by and trying to see if you guys are happy to be a little different and try to do this right.

Alright fenos, how would you do it?

I don't have the right answer ready for you, but I have some ideas that may be right or wrong; at least they can start breaking the ice.

Let me start by saying that I understand the technical problem behind building an automatic GraphQL server or query planner: the less flexible the schema is, the easier it becomes.

However, if we stick with trying to accomplish our goals mentioned above, we can take two routes:

  • Create a library that parses GraphQL queries at runtime and generates DQL query plans to send directly to DGraph

  • Make DGraph's GraphQL implementation super flexible, which will allow me to develop my database and GraphQL API independently, as well as hook it up into my own server so that I can accomplish validation / sanitisation / authz etc… in my own runtime and just proxy the query / mutation operations to DGQL.

A library

Pros: Having a library that does the hard work gives us the most flexibility overall:

  • Can be used in our own runtime
  • Can be easily extended / configured
  • Less cluttered schema
  • Fully decoupled from the database

Cons:

  • A million languages to implement it in.
  • Query planner can get tricky

A smarter DGraph GraphQL

This could be the “kill two birds with one stone” approach.

What we need to change:

  1. I can define my own inputs and return types
  2. Decouple the GraphQL schema from the Database Schema

Point 2 is fairly simple: just store the two schemas independently.

Point 1, instead, is the tricky bit. What we can do here is:

  • Tell DGraph the known mappings between the schemas, so that it knows what to select.
  • Once we receive a query that looks like this:
query Authors {
  bestSellingAuthors(since: "2021-10-20") {
    id
    name
  }

  authors(filter: BEST_SELLING) {
    id
    name
  }
}

We will send this pseudo-mapping along with the original GraphQL query:

map[string]string{
  "bestSellingAuthors": "@filter(lt(since, $since))",
  "authors":            "if $filter == BEST_SELLING use $mapping.bestSellingAuthors($since)",
}

then it should be able to resolve the queries.
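In my own words, the resolution step could look something like this tiny Go sketch. Everything here is hypothetical pseudo-machinery (`queryMappings`, `resolveFilter`); it only makes the delegation lookup concrete:

```go
package main

import "fmt"

// queryMappings mirrors the pseudo-mapping above: per-field hints telling
// the engine how to resolve each GraphQL field. "authors" with the
// BEST_SELLING filter simply delegates to the bestSellingAuthors entry.
var queryMappings = map[string]string{
	"bestSellingAuthors": "@filter(lt(since, $since))",
	"authors":            "bestSellingAuthors",
}

// resolveFilter returns the concrete DQL fragment for a field, following
// one level of delegation between mappings.
func resolveFilter(field string) string {
	frag := queryMappings[field]
	if delegated, ok := queryMappings[frag]; ok {
		return delegated
	}
	return frag
}

func main() {
	// Both queries in the example end up using the same DQL filter.
	fmt.Println(resolveFilter("authors") == resolveFilter("bestSellingAuthors")) // prints "true"
}
```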

Here is how I'm imagining the whole implementation (using my own runtime), in pseudo-code:

app := dgraph.NewGraphQLServer()
app.GraphQLSchema("../*.graphql")
app.DGraphSchema("../*.dgraphql")

app.Resolver("bestSellingAuthors", dgraph.Resolver{
    // OnQuery customises the DQL that DGraph generates for this field.
    OnQuery: func(builder, args) {
        return builder.Filter("bestSellingAuthors").Eq(args["since"])
    },
    // Resolve wraps the call with application-level logic.
    Resolve: func(parent, args, ctx, info, next) {
        // custom logic: validate, authorize, authenticate, etc...
        resultFromDGraph := next(args, ctx, info)

        // return or transform
        return resultFromDGraph
    },
})

app.Run(8000)

Now we should be able to have everything a developer wants (if the above code worked, eheh).

  • I can evolve my own database / schema
  • I can customise my own queries
  • I can implement my business logic in my own runtime
  • I let DGraph GraphQL do the hard thinking of using my modifications (if any) to produce an optimised query
  • We could also optionally opt in to the conventional inputs so that DGraph GraphQL works as it does now.
  • If a resolver is not implemented (like above), it will try its best to return data (using the mapping between the two schemas, or the conventional mapping), ignoring the inputs if non-conventional ones are provided.
  • We can iterate super fast and be ready for the future.

I have a few other ideas, but I believe this is already a lot to digest.

Please, team DGraph and community, let me know what you think about my points. I'm very open to anything you want to add / improve / criticise.

Note: It will take me a few edits to get the formatting right eheh

Regards


Interesting. I'm invested and have a lot of feedback and input that I don't have time right now to delve into; I look forward to picking this back up. This goes along the lines of input sanitization @maaft @codinghusi, but I'm not sure this is the right way to go about handling it; just another idea to throw into the bucket, though.

@fenos, what you are describing is the approach that Neo4j has taken, which does have tradeoffs for sure. But Dgraph overall is a much better product than Neo4j, IMHO. If you wanted to have your own GraphQL layer on top of the DQL endpoints, there is nothing stopping you from doing so. You could even rewrite queries in your layer to DQL, or even GraphQL, and send them on their way to Dgraph's endpoints.

I agree that DGraph is a better product than Neo4j, no doubt.

However, saying that I could rewrite my queries to DQL in my GraphQL layer is true without a doubt.
But now I'm not iterating fast any more.

Also, I'll need to write something very clever that parses the GraphQL AST and generates the right query, and things can get very tricky.

If all of the above could be provided by DGraph out of the box, it would be a no-brainer for a developer / company that wants to build a product.

They could just start a server with a one-liner and extend it as needed.

I haven’t tried it, but what about using something like Schema Wrapping?

Schema wrapping (@graphql-tools/wrap) creates a modified version of a schema that proxies, or “wraps”, the original unmodified schema. This technique is particularly useful when the original schema cannot be changed, such as with remote schemas.

Schema wrapping works by creating a new “gateway” schema that simply delegates all operations to the original subschema. A series of transforms are applied that may modify the shape of the gateway schema and all proxied operations; these operational transforms may modify an operation prior to delegation, or modify the subschema result prior to its return.

Yeah, I thought about that as well!
However, it's a big pain to map everything in detail with the current @graphql-tools/wrap API.

Maybe a better API that makes it more transparent or more fluent could be the solution to get where I'm heading.

Surely something I'll explore a bit more.

While it’ll take me more time to digest the entirety of this post, my knee-jerk reaction is that I could not disagree with you more. :upside_down_face:

The whole point of using GraphQL is that its well-defined query structure maps precisely to an arbitrary graph space. In some sense you can think of a GraphQL query as merely the ‘simplest possible semantic representation’ of a traversal.

Why would you possibly demand an intermediate layer when the direct mapping can easily cover the data space? Most queries, even in complex apps, can be described by a single graph traversal. The queries that cannot are either outliers beyond my imagination, bad schema design that fails to appropriately model relationships, or filters that can be projected into a single query with slightly more ‘definition’ or ‘annotation’ on that semantic form.

Practically everything you suggest has outlier value and can already be engineered on top of Dgraph just as easily regardless of Dgraph hosting the minimal schema api. Of course you should be able to build an intermediate graph layer involving more business logic if you so choose…and there are totally valid use cases for things like validation, if you can’t rely on well designed data models or schemas. The whole point is that the Dgraph API ‘minimally covers’ the schema. But it’s added code, complexity, and ‘language,’ to traverse a system that is largely unnecessary for users with sound data structures (ie, 1:1 mapping between schema and data model).

In most cases, a modern client can simply be imagined as a portal for viewing and operating on a graph context. I don’t need any of the things you describe separated from Dgraph for my architecture that I’m confident can model practically anything I want to put on a graph and scale cross-platform to billions (Flutter + Bloc + Dgraph). Rather, you double my workload by forcing the separation, and drastically increase the business surface for errors.

Further, the things you advocate for separating have incremental value when built on top of GraphQL semantics. If you’re claiming that you can give me the same @auth flexibility without defining a duplicate semantic form that maps precisely to the schema…I just don’t believe you. Having the annotation in-schema gives me the confidence to deploy a single secure context that can be accessed by different apps with vastly different auth needs.

So I feel like this is separation-of-concerns gone wrong. I think a lot of the practical argument depends on imagining failure cases that Dgraph already solves for. And as a result, advocating for a migration of logic to the shoulders of the developer makes ‘engineering’ sense by following some arbitrary heuristics…but in this case I don’t think it leads to more elegant or more powerful systems. Dgraph is about user experience and maximized leverage, just as it should be.

If you can come up with some sound example schemas and views that you think require more sophisticated logical separation between the query and data layers, I’d love to give them some thought.

Hi @CosmicPangolin1 thanks a lot for putting your thoughts together.

However, I think there is a misunderstanding here: I'm not criticizing DGraph's approach of using the GraphQL language. What I'm criticising is the /graphql endpoint, especially the fact that the auto-generated GraphQL-compliant server doesn't provide enough flexibility to model my own inputs / outputs and custom operations against it.

The main point is that having a client directly consuming the GraphQL API exposed by DGraph is just not realistically doable (see the points I mention above), except for prototyping or a pet project.

Practically everything you suggest has outlier value and can already be engineered on top of Dgraph just as easily regardless of Dgraph hosting the minimal schema api. Of course you should be able to build an intermediate graph layer involving more business logic if you so choose

Yes, that's completely right! Everything can already be engineered on top of DGraph.
The essence of this post is trying to understand how to best achieve this intermediate graph layer involving more business logic, and maybe have this approach embedded into DGraph itself.

However, I can guarantee that you always want to have more business logic, especially for validation / sanitisation, storing derived fields, etc…

If I go plain GraphQL + DGraph, I would lose a lot of the automatic query planning I get with the current implementation.

I don't disagree with the fact that the GQL schema and the DGraph schema are almost certainly a 1:1 mapping of types. I'm ok with that. The main point is to provide a way of customising the CRUD operations and adding business logic around them.

On a side note, when the mappings differ, you are provided with a @dgraph() directive to change that mapping.

Further, the things you advocate for separating have incremental value when built on top of GraphQL semantics. If you’re claiming that you can give me the same @auth flexibility without defining a duplicate semantic form that maps precisely to the schema…I just don’t believe you. Having the annotation in-schema gives me the confidence to deploy a single secure context that can be accessed by different apps with vastly different auth needs.

Having an @auth directive in the schema isn't wrong per se; what is wrong is that it is too opinionated.
People could argue that a @validate directive is also doable. Some people prefer doing it in code; some prefer to make it more declarative.

Imagine I want to use Auth0's way of authenticating and authorizing, or my custom OAuth2 (OpenID Connect) server deployed in my cluster.

The flexibility comes when developers are able to choose what's best for their own circumstances. I should be able to implement my own @auth directive the way I want it to work, for instance.

With the current implementation, I'm forced to use the @auth directive if I want auth. It probably still boils down to the fact that we need a nice way to build a server around the DGraph GraphQL endpoint, or to make that endpoint smarter / more flexible.

So I feel like this is separation-of-concerns gone wrong. I think a lot of the practical argument depends on imagining failure cases that Dgraph already solves for. And as a result, advocating for a migration of logic to the shoulders of the developer makes ‘engineering’ sense by following some arbitrary heuristics…but in this case I don’t think it leads to more elegant or more powerful systems. Dgraph is about user experience and maximized leverage, just as it should be.

After a few thoughts, I agree with you here. The way I initially thought it worked is that if I were to change a field in my DGraph schema, my GraphQL schema would also change. But apparently I can map an existing DGraph schema to GraphQL, and only the GraphQL side could mutate my schema if I'm not careful enough. It also provides the @custom directive for when I don't want the schema to be mutated.

So the separation of concerns in this specific case isn't strictly necessary. It just adds complexity for little gain, as you mentioned.

I also disagree about @auth being opinionated - I think it is more of a minimal form for defining authorization with nested graph granularity. You can pretty easily use a lambda to enforce higher level auth logic, but you absolutely require something that has @auth’s graph semantics for maximum specificity. It must exist to build secure graph systems.

I think for the rest, the trouble is in thinking about business logic as a sort of ‘query for data and safely merge/sort away from the client to build interesting views’ type of controller. When you use a graph system with granular authorization, you can imagine safely pushing a lot of query specificity to the client, or validation logic to a thin lambda layer. I even do this for a ‘user achievements’ feature. Like…with JWT claims an OAuth client can very easily be prohibited from mutating data beyond a few well-defined relationships. It’s a pretty serious re-thinking of the client/server role, but it’s enabled by the semantics that define authorization…and with the drastically limited attack surface it often cuts out the need for an intermediate server altogether.

Not to say that you don't want to use servers to intermediate sensitive business logic, perhaps involving complex graph operations or operations on highly restricted subsets of data…but I don't see why that's made harder by the existence of a minimal query API. It feels like I just have solid building blocks. Again, I think most of these Dgraph semantics are just ‘what must exist to make the thing work.’

I believe you are noticing what I noticed as the number-one problem with Dgraph: a reasonable, scalable approach to backend security (this includes what is going on in the schema).

Here is one post I am eager to see the DGraph team expand upon:

While I disagree with you on one thing, I think your end result is in agreement with mine.

  • The schema needs to be separated from @auth / @custom / @lambda: keeping them all together makes it incredibly difficult to maintain any large project that scales… there is too much going on (and it doesn't even pretty-print)
  • Input validations are a must (see the link above). A simple fix for now would be a pre-hook, which would allow whatever you want… I also love the idea of something like Firestore rules in the future, as they do basic boolean checks without slowing down the query and are very, very powerful
  • Field validation rules: along with field validation comes a way to BLOCK certain fields from being added, changed, etc. depending on certain conditions (without creating a new mutation or type)… this is really a caveat of the latter

Which leads me to why I completely disagree with you on separating DQL and GraphQL (as far as the way GraphQL is now): it will be ERROR PRONE. Why mess with that at a lower level, as in your example, when you could achieve the same desired result at the upper level (in this theoretical future world of Dgraph)? The same result could be achieved by locking anyone (except admins, if that is what your rules say) out of posting to createdAt, for example, and adding that functionality in a pre-hook with a server timestamp. If you're an admin, you could, however, change it. A simple validation that locks that field would solve your problem.

I don't see any reason to modify the way GraphQL connects to DQL to accomplish your needs. I also agree with you that other languages should be added for things like lambdas, post-hooks, and pre-hooks (Python, PHP; probably not Perl, RIP). However, this would only be available when you don't use Dgraph Cloud, and when the Dgraph team has time to write packages for them (a working slash-dgraph-cli is a must first, for example!)…

Simpler is better. I, like yourself, love Dgraph. I foresee this as the number-one problem with Dgraph.

A lot of other issues are already in the works, but I believe they have not decided how to handle this problem quite yet… do a quick search on the forum and you will see many related posts.

J
