Modify GraphQL Types implementation to support Storage Types

Hey @minions,

This thread came out of a discussion between @kostub and me. He is working on adding types to storage so that we can store values in binary format along with their type information.

He suggested we integrate the two type systems, and I concur: they have overlapping functionality, and keeping two separate systems would cost us unnecessary book-keeping.

The current GraphQL type implementation looks like this:

type Scalar struct {
	Name        string // name of scalar type
	Description string // short description
	ParseType   func(input []byte) (interface{}, error)  // parses []byte to appropriate type
}

In the above implementation, the ParseType func interprets the input bytes as a string (based on how we interpret/store strings in the flatbuffer) and tries to coerce it to the appropriate type.

Now, for storage types we need the following (the storage being done in binary format):

  • Parse to byte
  • Parse from byte
  • A byte representing the storage type

So, the proposal is to modify the scalar struct as follows:

type Scalar struct {
	Name        string                                  // name of scalar type
	Description string                                  // short description
	SType       byte                                    // storage type identifier
	FromBytes   func(input []byte) (interface{}, error) // parses []byte to correct type
	ToBytes     func(input []byte) ([]byte, error)      // converts to binary for storage
	FromString  func(input string) (interface{}, error) // for scalar coercion from flatbuffer
	ToString    func(input []byte) (string, error)      // useful when schema.type != storage.type
}
// func input/output values indicative, might change on actual implementation.
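To make the proposal concrete, here is a sketch of what one instance of the modified Scalar might look like. This is purely illustrative: the SType value 0x01, the 8-byte little-endian layout, and the assumption that ToBytes receives the flatbuffer string form are all placeholders, not decisions.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

// Scalar mirrors the proposed struct; signatures are indicative
// and might change on actual implementation.
type Scalar struct {
	Name        string
	Description string
	SType       byte
	FromBytes   func(input []byte) (interface{}, error)
	ToBytes     func(input []byte) ([]byte, error)
	FromString  func(input string) (interface{}, error)
	ToString    func(input []byte) (string, error)
}

func intFromBytes(input []byte) (interface{}, error) {
	if len(input) != 8 {
		return nil, fmt.Errorf("int: expected 8 bytes, got %d", len(input))
	}
	return int64(binary.LittleEndian.Uint64(input)), nil
}

// intScalar is a hypothetical instance for a 64-bit integer;
// SType 0x01 and the little-endian layout are placeholders.
var intScalar = Scalar{
	Name:        "int",
	Description: "64-bit signed integer",
	SType:       0x01,
	FromString: func(input string) (interface{}, error) {
		return strconv.ParseInt(input, 10, 64)
	},
	// ToBytes treats the input as the string form coming from
	// the flatbuffer and emits the 8-byte binary encoding.
	ToBytes: func(input []byte) ([]byte, error) {
		v, err := strconv.ParseInt(string(input), 10, 64)
		if err != nil {
			return nil, err
		}
		out := make([]byte, 8)
		binary.LittleEndian.PutUint64(out, uint64(v))
		return out, nil
	},
	FromBytes: intFromBytes,
	ToString: func(input []byte) (string, error) {
		v, err := intFromBytes(input)
		if err != nil {
			return "", err
		}
		return strconv.FormatInt(v.(int64), 10), nil
	},
}

func main() {
	b, _ := intScalar.ToBytes([]byte("42"))
	v, _ := intScalar.FromBytes(b)
	s, _ := intScalar.ToString(b)
	fmt.Println(v, s) // 42 42
}
```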

So, we will maintain separate types for schema and storage, but the underlying implementation will essentially be the same.

@kostub Please do add on any information that I might have missed.

Thoughts, questions and suggestions are welcome on this.

Hmm… I don’t fully understand the implications of this. Can you give examples of how this would work? From RDF to type system to FB to query.

Let me have a chat with @kostub and try to have a data flow diagram here.

This is for storing the type information and representing the data in binary rather than string format as discussed in: Adding value type to posting list

It would be better to use the same type system for schema and storage, otherwise converting between the two types will be cumbersome and confusing. So I plan on reusing what @akhiltak has started with scalar types for storage as well.

The reason I need to extend what he has is as follows:

  1. Initial Conversion:
    RDF -> Posting List

The RDF data is a string; the posting list is a byte array. The type can be specified either in the RDF or in the schema. The conversion will be:

string -> FromString -> native type -> ToByteArray -> byte array

2. The second conversion is at query time, when reading the posting list data and converting it to JSON. There are 2 situations here:
a. The storage type matches the schema type. In this case the conversion only involves converting from the byte array to the native type, i.e.
a. The storage type matches the schema type. In this case the conversion only involves converting from the byte array to the native type. i.e.

byte array -> FromByteArray -> native type

b. The second case is when the storage type does not match the schema type. In this case we have to do a cast from one type to the other. This process will be:

byte array -> FromByteArray(storage) -> native type -> ToString(storage) -> string -> FromString(schema) -> native type (schema)
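Under assumed codecs for an int storage type (the 8-byte little-endian layout and the function names here are illustrative, not the actual format), the three flows above might look like:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

// Hypothetical codecs for an int storage type; the 8-byte
// little-endian layout is an assumption, not the real encoding.
func intToBytes(s string) ([]byte, error) {
	v, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return nil, err
	}
	out := make([]byte, 8)
	binary.LittleEndian.PutUint64(out, uint64(v))
	return out, nil
}

func intFromBytes(b []byte) (int64, error) {
	if len(b) != 8 {
		return 0, fmt.Errorf("expected 8 bytes, got %d", len(b))
	}
	return int64(binary.LittleEndian.Uint64(b)), nil
}

func intToString(b []byte) (string, error) {
	v, err := intFromBytes(b)
	if err != nil {
		return "", err
	}
	return strconv.FormatInt(v, 10), nil
}

func main() {
	// 1. Initial conversion: RDF string -> posting-list byte array.
	stored, _ := intToBytes("42")

	// 2a. Storage type matches schema type: bytes -> native type.
	v, _ := intFromBytes(stored)
	fmt.Println(v) // 42

	// 2b. Storage type (int) differs from schema type (float):
	// bytes -> native(int) -> string -> native(float).
	s, _ := intToString(stored)
	f, _ := strconv.ParseFloat(s, 64)
	fmt.Println(f) // 42
}
```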

Hope this clears it up.


An example of the implementation can be found here:

How about Convert(byte array (/ Flatbuffers posting value), targetType)? In other words,

Byte array -> Native Type -> Target Type

My original suggestion was to avoid using strings as intermediaries for conversion. Though, I can see why that’d require on the order of N² conversion functions if we support N types. Having said that, if you can think of a good way to achieve that, it’d be great, because converting to a string and then parsing it back is expensive.

Note that we weren’t doing any conversion before; we would just return the bytes that had been given to us, so this would have a negative impact on performance.

We don’t have to implement all n² conversions. We can implement just the important ones, like int to float; the rest can use the string intermediary. It seems pointless to try optimizing conversions like DateTime to IP address.
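One way to structure this is a Convert function that tries a table of direct fast paths first and falls back to the string intermediary otherwise. Everything below (the typeID tag, the map layout, the function names) is a hypothetical sketch, not the planned API:

```go
package main

import (
	"fmt"
	"strconv"
)

// typeID is a placeholder for the one-byte storage type tag.
type typeID byte

const (
	tInt typeID = iota
	tFloat
	tDateTime
)

// directConv holds the few direct conversions worth implementing,
// e.g. int -> float; everything else uses the string fallback.
var directConv = map[[2]typeID]func(interface{}) (interface{}, error){
	{tInt, tFloat}: func(v interface{}) (interface{}, error) {
		return float64(v.(int64)), nil
	},
}

// toString / fromString stand in for each type's string codecs.
var toString = map[typeID]func(interface{}) (string, error){
	tInt: func(v interface{}) (string, error) {
		return strconv.FormatInt(v.(int64), 10), nil
	},
}
var fromString = map[typeID]func(string) (interface{}, error){
	tFloat: func(s string) (interface{}, error) {
		return strconv.ParseFloat(s, 64)
	},
}

// Convert tries a direct conversion first, then the string
// intermediary, and returns an error for unsupported pairs
// (so weird casts like DateTime -> IP just fail fast).
func Convert(v interface{}, from, to typeID) (interface{}, error) {
	if from == to {
		return v, nil
	}
	if f, ok := directConv[[2]typeID{from, to}]; ok {
		return f(v)
	}
	ts, ok1 := toString[from]
	fs, ok2 := fromString[to]
	if !ok1 || !ok2 {
		return nil, fmt.Errorf("cannot convert type %d to type %d", from, to)
	}
	s, err := ts(v)
	if err != nil {
		return nil, err
	}
	return fs(s)
}

func main() {
	f, _ := Convert(int64(3), tInt, tFloat)
	fmt.Println(f) // 3
}
```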
Kostub


And for these weird ones, we can just return an error. That way, we can save some more unnecessary processing. Seems like this solution holds water.
