This thread came out of a discussion between @kostub and me. He is working on adding types to storage so that we can store values in binary format along with their type information.
He suggested that we integrate the two type systems, and I concur: they have overlapping functionality, and keeping two separate systems would cost us unnecessary book-keeping.
The current GraphQL type implementation looks like this:
type Scalar struct {
    Name        string                                  // name of the scalar type
    Description string                                  // short description
    ParseType   func(input []byte) (interface{}, error) // parses []byte to the appropriate type
}
In the above implementation, the ParseType func interprets the bytes as containing a string (based on how we interpret/store strings in the flatbuffer) and tries to coerce it to the appropriate type.
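For example, the current ParseType for an Int scalar might look roughly like this (a sketch only; strconv stands in for whatever coercion the real implementation does):

    import "strconv"

    // parseInt interprets the stored bytes as a string and coerces it to int64.
    func parseInt(input []byte) (interface{}, error) {
        return strconv.ParseInt(string(input), 10, 64)
    }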
Now, for storage types we need the following (the storage being done in binary format):
Parse to bytes
Parse from bytes
A byte representing the storage type
So, the proposal is to modify the scalar struct as follows:
type Scalar struct {
    Name        string                                   // name of the scalar type
    Description string                                   // short description
    SType       byte                                     // storage type identifier
    FromBytes   func(input []byte) (interface{}, error)  // parses []byte to the correct native type
    ToBytes     func(input []byte) ([]byte, error)       // converts to binary for storage
    FromString  func(input string) (interface{}, error)  // for scalar coercion from the flatbuffer
    ToString    func(input []byte) (string, error)       // useful when schema.type != storage.type
}
// func input/output types are indicative and might change in the actual implementation.
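For illustration, here is a minimal sketch of what an Int instance of this struct could look like. Everything below is an assumption made for the example, not a settled decision: the 8-byte little-endian encoding, the SType value 0x01, and the intScalar name.

    import (
        "encoding/binary"
        "fmt"
        "strconv"
    )

    // intScalar is a purely illustrative Int scalar under the proposed struct.
    var intScalar = Scalar{
        Name:        "Int",
        Description: "64-bit signed integer",
        SType:       0x01, // hypothetical storage type identifier
        FromBytes: func(input []byte) (interface{}, error) {
            if len(input) != 8 {
                return nil, fmt.Errorf("Int: want 8 bytes, got %d", len(input))
            }
            return int64(binary.LittleEndian.Uint64(input)), nil
        },
        ToBytes: func(input []byte) ([]byte, error) {
            // input carries the textual form, as currently stored in the flatbuffer.
            v, err := strconv.ParseInt(string(input), 10, 64)
            if err != nil {
                return nil, err
            }
            out := make([]byte, 8)
            binary.LittleEndian.PutUint64(out, uint64(v))
            return out, nil
        },
        FromString: func(input string) (interface{}, error) {
            return strconv.ParseInt(input, 10, 64)
        },
        ToString: func(input []byte) (string, error) {
            if len(input) != 8 {
                return "", fmt.Errorf("Int: want 8 bytes, got %d", len(input))
            }
            return strconv.FormatInt(int64(binary.LittleEndian.Uint64(input)), 10), nil
        },
    }

With this, ToBytes([]byte("42")) produces the 8 bytes we store, and FromBytes on those bytes gives back int64(42).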
So we will maintain separate types for schema and storage, but in essence the underlying implementation will be the same.
@kostub Please add any information that I might have missed.
Thoughts, questions and suggestions are welcome on this.
It would be better to use the same type system for schema and storage; otherwise, converting between the two will be cumbersome and confusing. So I plan on reusing what @akhiltak has started with scalar types for storage as well.
The reason I need to extend what he has is as follows:
1. The initial conversion: RDF → Posting List
The RDF data is a string; the posting list is a byte array. The type can be specified either in the RDF or in the schema. The conversion will be:
string -> FromString -> native type -> ToByteArray -> byte array
(a sketch of both conversions follows this list)
2. The second conversion happens at query time, when reading the posting list data and converting it to JSON. There are two situations here:
a. The storage type matches the schema type. In this case the conversion only involves decoding the byte array into the native type, i.e.
byte array -> FromByteArray -> native type
b. The storage type does not match the schema type. In this case we have to cast from one type to the other. The process will be:
byte array -> FromByteArray(storage) -> native type -> ToString(storage) -> string -> FromString(schema) -> native type (schema)
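To make the two conversions concrete, here is a minimal sketch assuming the Scalar struct proposed above; writeValue and readValue are hypothetical helpers, and FromBytes/ToBytes stand in for FromByteArray/ToByteArray:

    // writeValue sketches conversion 1: the RDF value arrives as a string
    // and is encoded into the byte array we store in the posting list.
    func writeValue(rdfValue string, storage Scalar) ([]byte, error) {
        return storage.ToBytes([]byte(rdfValue))
    }

    // readValue sketches conversion 2: decoding posting list data at query time.
    func readValue(data []byte, storage, schema Scalar) (interface{}, error) {
        if storage.SType == schema.SType {
            // Case (a): types match, so just decode the bytes.
            return storage.FromBytes(data)
        }
        // Case (b): cast via the string intermediary.
        s, err := storage.ToString(data)
        if err != nil {
            return nil, err
        }
        return schema.FromString(s)
    }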
How about Convert(byte array (or Flatbuffers posting value), targetType)? In other words:
Byte array -> Native Type -> Target Type
My original suggestion was to avoid using strings as intermediaries for conversion, though I can see why that would cause NC2 (N choose 2) permutations if we support N types. Having said that, if you can think of a good way to achieve it, that would be great, because converting to a string and then parsing it back is expensive.
Note that we weren't doing that before: we would just return the bytes that had been given to us, so this would have a negative impact on performance.
We don’t have to implement all n^2 conversions. We can just implement the important ones, like int to float; the rest can use the string intermediary. It seems pointless to try optimizing conversions like DateTime to IP address.
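A minimal sketch of that hybrid, again assuming the Scalar struct above; Convert, castKey, fastCasts, and the SType constants are all hypothetical names for this example:

    // Illustrative storage type identifiers.
    const intSType, floatSType byte = 0x01, 0x02

    // castKey identifies a (from, to) pair of storage types.
    type castKey struct{ from, to byte }

    // fastCasts holds hand-written casts for the pairs worth optimizing.
    var fastCasts = map[castKey]func(interface{}) (interface{}, error){
        {intSType, floatSType}: func(v interface{}) (interface{}, error) {
            return float64(v.(int64)), nil // int -> float, no string round trip
        },
    }

    func Convert(data []byte, storage, schema Scalar) (interface{}, error) {
        if storage.SType == schema.SType {
            return storage.FromBytes(data) // no cast needed
        }
        if f, ok := fastCasts[castKey{storage.SType, schema.SType}]; ok {
            v, err := storage.FromBytes(data)
            if err != nil {
                return nil, err
            }
            return f(v) // direct native-to-native cast
        }
        // Fallback: the string intermediary covers every remaining pair.
        s, err := storage.ToString(data)
        if err != nil {
            return nil, err
        }
        return schema.FromString(s)
    }

Only the pairs in fastCasts pay for dedicated code; everything else, DateTime to IP address included, takes the slow but generic string path.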
Kostub