Note: Feature requests are judged based on user experience and modeled on Go Experience Reports. These reports should focus on the problems: they should not focus on and need not propose solutions.
What you wanted to do
Store binary data.
Examples:
A pandas dataframe
A vector that represents a prediction produced by an AI model
A file, such as a Python module
What you actually did
Stored the binary file in an object store service and saved its path.
Why that wasn’t great, with examples
This affects data management and schema design.
To retrieve a data frame, I need one API call to Dgraph to get the path of the file that stores it, and then another API call to the object store.
This was considered a few years ago, but it did not prove especially useful or popular. If that ticket becomes popular, it is likely to be accepted.
In general, you can send raw bytes as text to a predicate. I don't know if a scalar type just for that would be justified: a dedicated Bytes type would need to offer some functionality that strings cannot. What would that be?
I guess it is possible to save a bitmap under a string predicate.
It is less efficient, though, and limited by the maximum string size, so huge vectors like those produced by AI models probably won't work.
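A minimal sketch of the "raw bytes as text" workaround, assuming base64 as the text encoding (the thread does not say which encoding to use). It packs a small float32 vector into bytes, turns those bytes into a string that could be written to a string predicate, and decodes it back. No Dgraph client call is shown; the function names are hypothetical. Base64 output is 4 characters per 3 input bytes, so the roughly one-third size overhead is one concrete reason this is less efficient than a native binary type.

```python
import base64
import struct


def encode_for_string_predicate(data: bytes) -> str:
    """Encode arbitrary bytes as ASCII text so they can be sent as a string value."""
    return base64.b64encode(data).decode("ascii")


def decode_from_string_predicate(text: str) -> bytes:
    """Recover the original bytes from the base64 text."""
    return base64.b64decode(text.encode("ascii"))


# A small float32 vector packed into raw little-endian bytes (4 bytes per value).
vector = [0.25, -1.5, 3.0, 42.0]
raw = struct.pack(f"<{len(vector)}f", *vector)

# The string form is what would actually be stored under the string predicate.
text = encode_for_string_predicate(raw)
restored = struct.unpack(f"<{len(vector)}f", decode_from_string_predicate(text))

print(len(raw), len(text))  # 16 bytes of binary become 24 characters of text
print(restored)
```

For model-sized vectors the same overhead applies to every value, on top of any limit on string length.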
Another use case is the AI models themselves. In general, you can have a neural network that takes data and features and produces a model, which is usually a binary file.
Saving that model to a graph DB can help find connections between data features and versions of the model.
At the moment I don't have any other use cases for saving binary data in the DB, but once I do, I will post an update here.
Can you point to another DB that handles this case?
So, the binary is a product of the AI process? Maybe you could have a custom data model for that. For those binaries, you could compute a hash of the binary and save it in Dgraph as a reference for your AI application. The node with the hash would hold the path to the binary in some bucket so your application could download it (or access it, if it is local). The hash would also serve to check the integrity of the binary.
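The hash-plus-path idea can be sketched as follows. This is an illustration under stated assumptions: a local file stands in for the bucket, so no cloud SDK is assumed, and the node is a plain dict whose keys `sha256` and `path` are hypothetical predicate names rather than anything from Dgraph. Loading the binary is the same two-step lookup described at the start of the thread (read the node, then fetch the file), with the stored hash used to detect a mismatched or corrupted file.

```python
import hashlib
import tempfile
from pathlib import Path


def make_reference_node(blob_path: Path) -> dict:
    """Build what a graph node could hold: the binary's SHA-256 hash and its location."""
    digest = hashlib.sha256(blob_path.read_bytes()).hexdigest()
    # Hypothetical predicate names; adapt them to your own schema.
    return {"sha256": digest, "path": str(blob_path)}


def load_verified(node: dict) -> bytes:
    """Fetch the binary from the stored path and check it against the stored hash."""
    data = Path(node["path"]).read_bytes()
    if hashlib.sha256(data).hexdigest() != node["sha256"]:
        raise ValueError("binary does not match the hash stored in the graph")
    return data


with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "model.bin"
    model_file.write_bytes(b"\x00\x01 serialized model weights")
    node = make_reference_node(model_file)
    print(node["sha256"][:12], load_verified(node) == model_file.read_bytes())
```

Because the hash changes whenever the file changes, it can also act as a version identifier for the model node.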
In this case, you are talking about the AI model, right? I don't know much about it, but from the little I know, I think graph relations fit nicely.
People who have worked with binaries and BLOBs in SQL are probably the ones who could shed some light on this.
wrt AI models… I previously built such models for a living (I still enjoy building them as a hobby). You can just ravel() your tensors and store the flat array of [float] in your type. So you would have a schema that looks like this:
My experience showed that this wasn't great, so I ended up just using Badger directly.
There's also the tricky part where you don't store your floats as float64 but as float32 or float16. In those cases Dgraph's scalar types are not much use, and additional processing is needed.
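The flatten-then-store approach and the precision issue can be sketched without any database. To keep the example dependency-free it uses nested Python lists in place of NumPy arrays, so `ravel()` and `unravel()` here are hypothetical helpers that only mimic NumPy's `ravel()` and `reshape()`. The shape must be stored next to the flat `[float]` values to rebuild the tensor. Python floats are 64-bit, so float32 weights widen when read as floats and must be cast back; float16 has no `array` typecode, so `struct`'s `"e"` format is used for it.

```python
import struct
from array import array


def ravel(nested):
    """Flatten a nested list the way NumPy's ravel() flattens an array.

    Returns the flat list of floats plus the shape needed to rebuild it.
    Assumes a rectangular, non-empty nested list.
    """
    shape, level = [], nested
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0]
    flat = []

    def walk(node):
        if isinstance(node, list):
            for child in node:
                walk(child)
        else:
            flat.append(float(node))

    walk(nested)
    return flat, shape


def unravel(flat, shape):
    """Inverse of ravel(): rebuild the nested list from flat values and shape."""
    if len(shape) == 1:
        return list(flat)
    step = len(flat) // shape[0]
    return [unravel(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


tensor = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
flat, shape = ravel(tensor)  # flat could go in a [float] predicate, shape alongside it
assert unravel(flat, shape) == tensor

# float32 weights widen to float64 when read as Python floats, so the stored
# values differ from the original literals; cast back to float32 to restore.
weights32 = array("f", flat)
stored = list(weights32)  # e.g. 0.1 becomes 0.10000000149011612
restored32 = array("f", stored)
assert restored32 == weights32

# float16: pack and unpack with struct's half-precision "e" format.
half_bytes = struct.pack(f"<{len(flat)}e", *flat)
halves = struct.unpack(f"<{len(flat)}e", half_bytes)
```

Either way, the widening, casting, and shape bookkeeping happen in application code, which is the extra processing mentioned above.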
I read your answers and understand what to do.
I guess the binary scalar won't be necessary after all.
Probably the best approach is to store binary files in an object store and save the URL of their location.