Go + GraphQL + Dgraph = Demo!


(Roman Sharkov) #1

Hey guys, I’m proud to announce the first release of the Go + GraphQL + Dgraph demo! :champagne::tada:

v1.0.0 demonstrates just the basic features (resolving GraphQL queries against Dgraph (naive resolvers, optimizations are WiP), performing mutating transactions, deleting nodes), but I’m planning to improve it further, showcasing and testing more Dgraph features such as geo and full-text search soon! :rocket:

Documentation is currently almost non-existent, but I’ll fix this as soon as I can!

I’d love to hear what you guys think about it, any feedback, PRs and issues are highly welcome! :heart:


(Francesc Campoy) #2

Hi there, Roman

Thanks for the demo, it looks great!
I’ll have a look tomorrow and send some feedback/PRs.


(Joschka Tillmanns) #3

Couple of questions:

  • You seem to resolve each type individually, wouldn’t it be easier to simply translate between GraphQL and GraphQL±?
  • Is this doing a request against Dgraph for each node in your GraphQL request tree?
  • If request / node is true, how did you solve the n+1 issue?
  • If you did solve the n+1 issue, how did you solve it on request nodes that require filtering? (what you would typically solve with @filter() using GraphQL±)

Thank you for your effort on the demo and I am excited about your answers!


(Roman Sharkov) #4

You seem to resolve each type individually, wouldn’t it be easier to simply translate between GraphQL and GraphQL±?

Easier? No, I don’t think so. I’d need to write a whole GraphQL parser & execution engine to be able to translate GQL directly to GQL±, which would also allow me to do authorization, validation, etc. It’s an interesting question, though, which I’d asked myself before writing this tech demo, yet I couldn’t find an answer.

Maybe you’ve got a simpler approach in mind that I’ve missed?


Is this doing a request against Dgraph for each node in your GraphQL request tree?

Not exactly (leaf data fields are fetched together with their parent node), but essentially yes. The resolvers are implemented naively so far: there’s no batching or caching involved yet. v1.0 was intended to just work; optimizations are necessary to make it production-ready.

query {
  users { // -> 1 roundtrip
    id
    displayName
    posts { // -> u roundtrips (one per user)
      id
      title
      contents
      reactions { // -> p roundtrips (one per post)
        id
        emotion
        message
        author { // -> r roundtrips (one per reaction)
          id
          displayName
        }
      }
    }
  }
}

This query would, indeed, result in 1+u+p+r database roundtrips where u is the number of users, p is the total number of posts and r is the total number of reactions. Assuming there are 100 users, each having 100 posts, each post having 100 reactions, this single query would invoke 1+100+100*100+100*100*100 = 1,010,101 database roundtrips, which, of course, is terribly inefficient!

There are 4 things we can/should do:

  • Allow only whitelisted, safe queries to be executed by clients.
  • Implement pagination such that requesting the entire lists (Query.users, User.posts, Post.reactions) isn’t allowed.
  • Introduce batching.
  • Introduce caching.

Batching & caching will cut down the number of requests significantly. Take the User.posts resolver for example, instead of performing an actual database request right in the resolver we should request it from a loader:

// Posts resolves User.posts
func (rsv *User) Posts(
	ctx context.Context,
) ([]*Post, error) {
	posts, err := rsv.root.loader.UserPosts(ctx, rsv.uid)
	if err != nil {
		return nil, err
	}
	if len(posts) < 1 {
		return nil, nil
	}

	resolvers := make([]*Post, len(posts))
	for i, post := range posts {
		resolvers[i] = &Post{
			root:      rsv.root,
			uid:       post.UID,
			id:        post.ID,
			creation:  post.Creation,
			title:     post.Title,
			contents:  post.Contents,
			authorUID: rsv.uid,
		}
	}
	return resolvers, nil
}

The loader can then do batching & caching internally.

Batching would accumulate the uids and fire a single batched query against the database when either the batch size or the timeout (~5-10ms) is reached. With batching enabled, the number of database roundtrips in the example above would be reduced to only 4!

Caching would eliminate redundant requests. Why load all posts of user x if we already did it before and have them in the cache? Caching, however, comes with some problems. It works nicely on a single machine, but when we start scaling out horizontally to more than one API server we’ll end up with a distributed cache invalidation problem whenever a user creates/removes a post. It can be addressed in a few ways:

  • TTL (time-to-live) = availability
  • Internal event broadcasting = consistency (but then there also are net splits we have to deal with)
  • Hybrid (TTL + event broadcasting)

If request / node is true, how did you solve the n+1 issue?

I didn’t, yet :smile:. The n+1 problem would be solved by the data loaders described above.


If you did solve the n+1 issue, how did you solve it on request nodes that require filtering? (what you would typically solve with @filter() using GraphQL±)

It depends on which filter you mean. For every static filter there’d be a dedicated loader. Let’s say we have two different lists:

type User {
  # all published posts
  posts: [Post!]!

  # all archived posts
  archivedPosts: [Post!]!
}

Those nodes would need to be resolved by separate loaders:

  • User.posts -> loader.UserPosts(userUID)
  • User.archivedPosts -> loader.UserArchivedPosts(userUID)

And this query would result in 3 database roundtrips:

query {
  user(id: "x") { // -> 1 roundtrip
    posts { // -> 1 roundtrip
      id
    }
    archivedPosts { // -> 1 roundtrip
      id
    }
  }
}

But I doubt you can do the same with dynamic filters like these:

type User {
  # allows arbitrary user-defined queries "-reactions:>5 -creation:yesterday"
  posts(query: String!): [Post!]!

  # allows to look for posts with a similar title
  postsWhereTitleLike(title: String!): [Post!]!
}

With dynamic filters based on arguments, batching is out of the question, I suppose; you could cache the results, though.
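Caching such dynamic filters could key the cache on the UID plus the normalized argument string, along these lines (a hypothetical sketch, not part of the demo):

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Post is a simplified stand-in for the demo's post model.
type Post struct{ UID, Title string }

// QueryCache memoizes filtered post lists per (userUID, query) pair.
// Batching isn't possible here, but repeats of an identical query are
// served from memory instead of hitting the database again.
type QueryCache struct {
	mu    sync.Mutex
	data  map[string][]*Post
	Calls int // counts simulated database hits
}

func NewQueryCache() *QueryCache {
	return &QueryCache{data: make(map[string][]*Post)}
}

// UserPosts resolves User.posts(query) with a per-argument cache key.
func (c *QueryCache) UserPosts(userUID, query string) []*Post {
	// Normalize the argument so trivially different spellings share a key.
	key := userUID + "\x00" + strings.Join(strings.Fields(query), " ")
	c.mu.Lock()
	defer c.mu.Unlock()
	if posts, ok := c.data[key]; ok {
		return posts
	}
	c.Calls++ // stand-in for one Dgraph query with a dynamic @filter
	posts := []*Post{{UID: userUID + ".p1", Title: "match for " + query}}
	c.data[key] = posts
	return posts
}

func main() {
	c := NewQueryCache()
	c.UserPosts("0x1", "-reactions:>5")
	c.UserPosts("0x1", "  -reactions:>5 ") // cache hit after normalization
	fmt.Println(c.Calls)                   // 1
}
```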


(Michael Compton) #5

Wow, nice example of GraphQL+Go+Dgraph.

We are currently starting to build features into Dgraph to make GraphQL a core capability of Dgraph. That’s on our roadmap for the year and we are starting to build out the features now, so watch out for those coming up.

The points above highlight some of the challenges with GraphQL, and indeed with GraphQL on a graph database. The underlying technology should give more support here, because Dgraph itself can remove those GraphQL challenges around data loading, under-/over-fetching, the N+1 problem, etc. You should be able to ask Dgraph more directly for what you want, and not have to write a complete system of resolvers for every piece and solve those challenges yourself.

As an example, in the past I (outside of Dgraph) built this, where you provide a GraphQL schema and it runs a GraphQL API on Dgraph: it accepts GraphQL queries, validates them against the schema, automatically translates incoming GraphQL queries into Dgraph queries and translates the results back out as GraphQL (@romshark it’s very much the kind of automatic translation tool you were wondering about). It has a way of supporting some of the more complex filters that @Joschka and @romshark are discussing. So you can request an author and their posts, or use a Dgraph-style filter in the GraphQL to query an author but restrict the posts to those that mention “GraphQL”, or filter, order and paginate.

Our aim is to make Dgraph assist you in writing GraphQL APIs. So for example, with features similar to what I pointed to above, rather than implementing and testing everything yourself, you’ll be able to push chunks of GraphQL into Dgraph (with filters, search, paging, etc.) and Dgraph helps resolve the GraphQL.

That way, you can focus on your app logic around the graph, and Dgraph will assist as much as it can in resolving the GraphQL queries. Once we have the initial layers up, it also opens Dgraph up to a whole world of GraphQL tooling including things like GraphQL schema stitching and delegation, so you’ll be open to, for example, use something like Apollo to create federated APIs where a Dgraph component is part of that … fun times :slight_smile:

That’s where we are heading. I thought it was really relevant to discuss here because the discussion is exactly around the kinds of issues we are trying to help with.


(Roman Sharkov) #6

Just posted v1.3.0 which introduces “GraphQL Shield”, an embedded GraphQL query whitelisting middleware that I’ve been writing for the past few days (I’ll probably move it to its own separate repository).

The shield prevents non-whitelisted queries from being executed, and it doesn’t even parse the GraphQL query, which makes it very fast! It’s based on a radix-tree index and normalizes incoming queries so that matching is whitespace-agnostic; all of the following queries are treated as identical:

query {
  users {
    id
  }
}
query { users { id } }
query\n{\n\tusers {\n\t\tid\n}\n}\n
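The normalization step can be sketched roughly like this (a simplification; the real middleware matches against a radix-tree index rather than building strings):

```go
package main

import (
	"fmt"
	"strings"
)

// normalize collapses insignificant whitespace so that differently
// formatted spellings of the same query map to one whitelist key.
// Spaces between identifiers are preserved, since there they are
// significant token separators.
func normalize(query string) string {
	// Collapse every whitespace run (spaces, tabs, newlines) to one space.
	s := strings.Join(strings.Fields(query), " ")
	// Drop spaces adjacent to punctuation, where they carry no meaning.
	for _, p := range []string{"{", "}", "(", ")", ":", ","} {
		s = strings.ReplaceAll(s, " "+p, p)
		s = strings.ReplaceAll(s, p+" ", p)
	}
	return s
}

func main() {
	a := normalize("query {\n  users {\n    id\n  }\n}")
	b := normalize("query { users { id } }")
	fmt.Println(a == b) // true
	fmt.Println(a)      // query{users{id}}
}
```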

Arguments are also checked against the expected parameters (simply by max-length).

I’ve quickly benchmarked it and it’s able to process and reject ~2 million non-whitelisted queries per second (it was a simplified benchmark, I’ll have to test it further).

It does have a few drawbacks, too. Since it’s not based on the GraphQL AST, it’s very strict about things like spaces between a node identifier and a block-opening character, but that’s the trade-off (performance over convenience).

The GraphQL Shield supports JSON-file persistency allowing you to configure it ahead of time. It also supports adding/removing queries dynamically at runtime (I’ll probably even add a very simple password protected admin dashboard for whitelist management). Persistency is done through the PersistencyManager interface which you can implement however you like (if you want to store the query whitelist in Dgraph for example).

P.S.
There are many ways of protecting a GraphQL API from malicious queries, but whitelisting is probably the easiest and fastest one. Query cost analysis is far more difficult to implement, while max-query-depth is leaky (what if a query grows “horizontally” while staying under the depth limit?). Whitelisting is relatively easy, predictable and fast.


(system) closed #7

This topic was automatically closed 41 days after the last reply. New replies are no longer allowed.