Supporting type and schema in Dgraph through GraphQL

Hey guys,

Here are the functionalities we are trying to implement:

Feature List

  • Have a type system for the kinds of datatypes supported by Dgraph (based primarily on GraphQL, which we can extend in future)
  • Have a schema specification for defining those types in a file that is initialized on application start (in future, think about dynamically adding to or modifying the schema)
  • Have a mechanism for clients to upload schema for their data
  • Parser for the schema file to identify/verify client types
  • Type verification while querying the database
  • Coercion of datatypes if required
  • Verification/coercion of datatypes obtained in query response
  • Modify protobuf response to conform to appropriate data types according to schema

This is the general outline. Please feel free to ask any questions or give suggestions on this.
(Edited based on suggestion)

4 Likes

This makes sense. In addition, we’d also want to modify our client-facing server code, so it can generate appropriate data types in JSON response, or protobuf response.

I was looking at GraphiQL ( GitHub - graphql/graphiql: GraphiQL & the GraphQL LSP Reference Ecosystem for building browser & IDE tools. ) and it requires a GraphQL compatible server to work with. What that means is explained in simple terms here (Needs a schema, type system to be specified for queries, mutations etc).

Facebook has GraphQL library in JS ( GitHub - graphql/graphql-js: A reference implementation of GraphQL for JavaScript ) but there are other projects on supporting this in go ( GitHub - graphql-go/graphql: An implementation of GraphQL for Go / Golang ). It’d be great if @akhiltak can have a look at this library and see if this would fit the needs of what we want to achieve here (specifying schemas, type systems, etc) which would also move us closer to supporting GraphiQL.

I had a look at graphql-go when I started thinking about GraphQL. It can do all the things you mention, but it handles the execution of the queries and the intermediate steps for you. That doesn’t fit our principle of minimizing network calls and is a slower approach, which is why I ended up writing the lexer/parser.

For GraphiQL, can’t it operate on simpler queries – the kind we can support today? Not all GraphQL queries have types and fragments.

Based on my reading about GraphiQL, its main advantages are its ability to show the overall schema (automatic documentation of which predicates each node type has), type checking in queries, autocompletion suggestions, the fill-leaves functionality, and schema introspection (which lets us get meta-information like the type and description of different predicates through queries). All of these would ideally be supported by a GraphQL server (like graphql-go), but we don’t support them yet. So I don’t think we can integrate GraphiQL with our current server; even if we could, it wouldn’t support any of the features it was actually designed for and would just act as an editing interface.

It would be lean, but that is the point of the bug that was filed, and the comment that is present in our roadmap issue. Even if we don’t currently support all the possible GraphQL features, this would ensure that we’re compatible with whatever we have implemented.

I went through graphql-go and graphql-js as part of my read-up for the type system implementation. They do a lot of what we are trying to do but, as Manish said, they handle the rest of the execution as well, which could restrict us in future (if we want to deviate in some part of the implementation). Even for the type system, the graphql-js implementation is much cleaner.

As for GraphiQL, in this talk (YouTube), it’s set up easily with a minimal schema. As soon as we have a basic schema and type system up and running, I think we should be good to go (although we might have to fix a few things here and there to adhere to the specification).

I will work on getting at least the basic schema and types out ASAP so that you can start tinkering with them using GraphiQL.

2 Likes

Also, I think GraphiQL could become an excellent testing tool not only for the type system but also for fragments, arguments, etc. Basically, for most of the GraphQL specification, we could test whether it works with our implementation or not.

1 Like

Agreed that GraphiQL would make a good testing setup for us to ensure compatibility with GraphQL.

Regarding type coercion through provided schema:

  • A big win for us would be to have type conversions on query responses, so that we can send correctly typed values to clients.

  • Along with this, the second step would be to include type verification and coercion at mutation level so that integer types don’t have strings stored in them (e.g.: age: “asdf”) (@mrjn please correct me if this is not what you meant by adding type coercion in mutations in our last discussion)

  • While trying to understand the code around this, I ran across FlatBuffers. It took some time for me to understand what’s going on under the hood; I got the general idea but didn’t want to spend more time on it, as its inner workings might not be very relevant to this implementation right now.

  • But I had a doubt about how FlatBuffers interpret strings and int32:

  • If a certain float64 value is stored as it is done now (interpreted as a string and stored in a FlatBuffer table’s byte slice),

  • then while fetching that value back, if it is interpreted as an integer, will there be an issue, or will we get the same float value back?

  • Basically, for any data type, if the type interpretations while storing and fetching are different, should it cause an issue?

  • My understanding is that we should have an issue because of the different allocation sizes (due to which the offsets in the vtable would be incorrect), but then again, I haven’t delved deep into the implementation, so I wanted to confirm.
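
The mutation-level verification in the second bullet could look something like this minimal Go sketch. `checkValue` is a hypothetical helper, not actual Dgraph code, and the type names are placeholders:

```go
package main

import (
	"fmt"
	"strconv"
)

// checkValue is a hypothetical mutation-time guard: verify (and coerce)
// a raw value against the type the schema declares for its predicate,
// so that e.g. age: "asdf" is rejected before anything is stored.
func checkValue(typ, raw string) (interface{}, error) {
	switch typ {
	case "int":
		return strconv.ParseInt(raw, 10, 64)
	case "float":
		return strconv.ParseFloat(raw, 64)
	case "bool":
		return strconv.ParseBool(raw)
	default: // string and unknown types pass through unchanged
		return raw, nil
	}
}

func main() {
	if _, err := checkValue("int", "asdf"); err != nil {
		fmt.Println("rejected: age cannot be \"asdf\"") // mutation fails early
	}
	v, _ := checkValue("int", "21")
	fmt.Println(v) // 21
}
```

The same helper doubles as the coercion step for well-formed values: "21" arrives as a string but is stored once it parses cleanly as an int64.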

We don’t store values as strings. We store them as bytes. We should continue to do so. The values would be reinterpreted at ToJSON or ToPB function, based on the schema.

So, there’s a clear disjoint between storage and schema. Storage considers everything as bytes; the schema is what converts them to types.

I apologise for not being very clear. I didn’t mean to indicate that we store them as strings. That would defeat the purpose of having FlatBuffers (FB) and incur additional cost in storing/fetching between memory and storage.

FB represents data in a format similar to how it is stored, which is why it’s a good choice here for reducing data serialisation time. (That’s why I explained it further in my question; I only meant figuratively that we interpret values as strings before moving them into FB, since there is no type associated with them.)

The storage is done by converting everything to byte slice using FB and then pushing them through DataStore (Store struct which has rocksdb details).

So, our process is: convert data to bytes using appropriate offsets in FB → store them as bytes → retrieve them and reconvert the bytes to data using offsets in FB. I was just wondering whether it will lead to inconsistency if we have type inference on only one side of this process.

But I understand I’ll get the answer to this question whenever I get into tinkering with FlatBuffers. For now, I’ll move forward with an abstract understanding of the process.

Thanks for bearing with me. My understanding will improve as I get more acquainted with the different modules.

By value, I meant the value in a Posting.
https://github.com/dgraph-io/dgraph/blob/master/posting/types.fbs#L5

Has there been any progress on the type system?

At the moment it seems that schema applies globally to all nodes, so effectively there is one global type?

What would be really nice is optional types. So you could specify schema/types for nodes (and sub-nodes) if you want them, but use a default Node type (which would work similarly to how the current global Dgraph schema works).

Reasons why types are useful:

  1. You can have the same field name represent different types of values, depending on the node type. E.g. title of a Post is different to a title for a User.

  2. You could request a list of a specific type, without having an ID (so give me first 10 nodes of type User), exploring data would then become easier too (as you wouldn’t have to always start with an ID/filter)

  3. You can specify a field to be indexed for a specific node type only.

  4. You can easily identify the type of a node and what it is trying to represent (without implementing a custom solution). This allows you to perform normalization on the client side.

  5. It prevents you from putting bad data into your database.

  6. It would fit much better with GraphQL (meaning it would be easier to create a GraphQL layer on top of Dgraph) and would make the transition to Dgraph easier.

  7. Your database becomes self-documenting (through introspection).

GraphQL

type User {
	title: String  # E.g. Mr, Mrs, Ms - don't index!
	name: String!
	age: Int!
	posts: [Post]
}

type Post {
	title: String! @index  # E.g. A Great Post - index!
	user: User!
	comments: [Comments]
}

type Comments {
	text: String!
	user: User!
}

Dgraph

title: String 					# Ambiguous 
name: String 
age: Int
# posts: [Post] 				# No equivalent
# comments: [Comments]  		# No equivalent
# user: User! 					# No equivalent

New Approach
You would be able to set up specific types and have them attached to the root query.

mutation {
	type Users {
		name: String
	}
	schema {
		users: [Users]
	}
}

query {
	users(first: 10) {
		name
	}
}

If a user wanted to keep the current Dgraph style, they could use Node. Node could be a default type, used without configuration but amended as required. Or it could be a base type, so all other types would extend from it and it could be used as a kind of Any:

mutation {
	type Node {
		title: String
		name: String 
		age: Int
	}
	type User extends Node {
		name: String @index
	}

	schema {
		nodes: [Node] # All nodes, regardless of type
		users: [User] # All nodes of type User
	}
}

query {
	customLookup1: nodes(id: $id) {
		title
	}
	nodes(first: 10) {
		title
	}
}

You could keep the current RDF type for set, which I really like (especially as you can do bulk transactions).

Not saying this is exactly the way you would want to do it, but wanted to provide some food for thought.

1 Like

There’s an easy way to use the type system: have a type edge, with the value as a string. We’ve used that for various things we’ve built with Dgraph. For example, if you need to specify that a node is a "User", you can add this edge: <uid> <type> "User" .

Then you can look for func: eq(type, "User") to find / iterate over users.

The GraphQL-like type system doesn’t have much value for us, because GraphQL’s type system is more in line with a SQL database, where each type is a table; different fields on different types correspond to different columns in those tables and hence can be treated differently.

In Dgraph, each predicate is global. So, if you have a “title” edge, the data type for that would remain one, irrespective of which node it’s attached to.

1 Like

Just found this and hoped it might supplement the discussion of options… it would be nice if it worked with Apollo…

1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.