Edit: After some discussions it became clear that the original idea to implement lambdas in other languages was not ideal. The following describes the new proposal:
Enhancement on Lambdas
* Use WASM to create Lambda Scripts (using wasmer-go).
* Allow one(?) Lambda Script per tenant.
* Support multiple lambda servers (internal + external):
  * `@lambda` → default lambda on alpha
  * `@lambda(url: xxx)` → run on an external lambda
  * Reference: Feature request: multiple lambda servers
* Optionally load the WASM script from an external server (since WASM binaries can become quite large).
* If WASM provides a good lambda experience, consider expanding WASM into:
  * tokenizer implementations
  * functions/graph algorithms
  * community sharing of said code artifacts
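A sketch of how the proposed directive could look in a GraphQL schema (the `url` argument is hypothetical syntax from this proposal, not an existing Dgraph feature, and the host name is made up):

```graphql
type Post {
  id: ID!
  title: String!
  # no url: runs on the default lambda on alpha
  summary: String @lambda
  # hypothetical url argument: runs on an external lambda server
  sentiment: String @lambda(url: "http://lambda.example.com:8686/graphql-worker")
}
```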
I have now started working on WASM support and am testing it with Go, Rust, and AssemblyScript.
Old Proposal
What I want to do
I would like more fine-grained control of my lambda functionality, and to program it in languages other than JavaScript.
What I did
I added a pull request that would allow developers to deploy lambda servers within docker containers on alpha. This means developers are able to use any programming language that supports web server development for their lambda server.
I have used Schartey/dgraph-lambda-go in a side project and it was so much nicer to work with than JavaScript. However, the most recent dgraph release makes it so that dgraph alphas spin up many instances of nodejs running the lambda code stored as blobs in Dgraph. This makes it… harder to use other lambda implementations. (I believe still possible by setting a remote lambda url and setting the number of lambda nodejs servers being exec’d to 0 and maybe ignoring the script being sent at runtime?)
Anyway, I would love some ability for dgraph to accept any language and still be native and fast - maybe some sort of optimized WASM situation? Red Panda (Kafka clone) has its built-in ETL compiled to WASM, according to the docs.
Thanks for the feedback!
First of all thank you for trying out dgraph-lambda-go, it means a lot to me to receive your opinion, especially if it’s positive.
I’m aware of the changes to dgraph - that alpha spins up lambdas - and integrated this approach into the workflow. In the current state, alpha would spin up a number of docker containers with your server running, just like it does with the JS instances. That is why I had to make some changes to the Dockerfile and add dind (Docker in Docker).
So you do not need a remote lambda for the docker container to run; it runs directly on the alpha instance.
On your note about WASM: I tried to implement this and got it working within Node, but I had issues compiling the WASM correctly, or rather supporting wasm files from different compilers. I was able to support Rust and AssemblyScript. Go is very focused on browser support and doesn’t work well with NodeJS. I didn’t test other languages and wasn’t sure whether WASM is mature enough. There is not a lot of documentation about WASM outside of browser usage.
But maybe my approach was wrong - I had never used WASM before. Since RedPanda is open source, I’ll take a look at how they did it.
Just retried WASM with wasmer-go and it just worked out of the box…
Much to the contrary in NodeJS…
So I will leave this up, because I still think using docker images has its charm, but I’m going to continue testing with WASM as well. @iluminae thanks for the indirect enlightenment
Edit: in response to the docker PR you contributed, I will say that, from an industry perspective, having the docker daemon and docker API manage containers is yesterday’s idea of container management. See kubernetes’ sunsetting of docker support and docker’s recent enterprise decisions.
Personally I run dgraph in kubernetes, and having an external, separately scalable and manageable service handle lambda execution seems idiomatic, whereas dgraph exec’ing nodejs N times does not. (But somehow wasmer does feel better.)
My main use of dgraph is not GraphQL at all, so I have very little skin in this game, but ideally I want the experience of loading JS in and having it execute (including with tenant support) - and I want to do it in any language I want, completely sandboxed.
Hello, I am also using my own lambda server in go (written long ago). For the next project I would love to try Schartey/dgraph-lambda-example as well, as it is really nice, especially parsing the existing schema file. (I would like some small changes related to snake case / camel case function naming, and splitting the generated functions into many files instead of one big one - especially if you have many lambda queries/mutations.)
The problem I see with the proposal is that while it is OK for dgraph to be embedded in a docker image (like the official dgraph/dgraph), dgraph should not embed any knowledge of docker, as that creates a circular dependency. It should be perfectly legal to run dgraph with lambda without any docker infrastructure at all.
I agree that deploying dgraph and lambda servers should be orchestrated by an external entity chosen by the user, be it kubernetes, docker/swarm, etc., and dgraph should not try to deploy the lambda server itself (unless it were lazily deployed on demand and shut down when no longer used?). I like my lambda server in a different docker container/image than dgraph alpha. I suspect the decision to have dgraph alpha deploy a lambda server was made to keep the deployment process easy (the lambda server scales with dgraph alpha) and to “guarantee” that the lambda server is accessible locally (localhost instead of network access), so latency is minimal. Even then, I would like the option to have the lambda binary and flags configurable (so you could either deploy your local lambda binary or run a script deploying your docker image).
I wish dgraph alpha developed a smart protocol for discovering/configuring available lambda servers (I would like to have many lambda servers, as mentioned in Feature request: multiple lambda servers, and also to use other scripting languages, e.g. lua instead of javascript), so no nodejs or javascript would be hardcoded.
Yea, the more I think about it, I would like it to be WASM, stored in dgraph and executed by dgraph à la wasmer, maybe with a configurable semaphore for protection.
No orchestration overhead when not used, and it can scale with the dgraph server. It can run JavaScript, Go, C, Rust, etc. Dgraph already uses jemalloc, so it’s no big deal to use cgo to talk to wasmer.
Thanks guys for your insight.
I had similar thoughts to yours, and now I see that maybe I jumped the gun and should have put more thought into this.
I wanted to build upon existing workflows and provide more flexibility, but with the current setup it’s not too easy.
> The problem I see with the proposal is that while it is OK for dgraph to be embedded in a docker image (like the official dgraph/dgraph), dgraph should not embed any knowledge of docker, as that creates a circular dependency. It should be perfectly legal to run dgraph with lambda without any docker infrastructure at all.
The way I built it, the dependency is not really circular - the docker within alpha would be completely independent of the outside world. Yet I came to the conclusion that the whole idea of deploying lambdas directly on alpha is a bit weird: managing containers is the task of an external orchestration unit, not of the alpha server.
Why do I think it’s weird? It leads to alpha and lambda fighting over resources. If you run heavy tasks on lambda, you would have to scale the whole alpha just to give lambda more resources.
> Personally I run dgraph in kubernetes, and having an external, separately scalable and manageable service handle lambda execution seems idiomatic, whereas dgraph exec’ing nodejs N times does not. (But somehow wasmer does feel better.)
I think this sums it up.
So in the end there are only two choices:
* Run Lambda Scripts directly on Alpha and keep tasks lightweight, ideally with WASM.
* Run an external lambda server and deploy it as close to the alphas as possible, using external orchestration like Kubernetes.

Ideally we could do both and decide which lambda servers take care of which tasks.
So I have already started working on a WASM integration and will continue to do so.
Guys, I want to thank you again for your thoughts - it really helps me a lot! I learn a lot through these discussions, and it makes it easier to see actual needs.
One more thing: currently I run my CI pipeline so that I can upgrade the lambda server without bringing down the dgraph server. For logic stored as a script, it would be possible to upload a new script to dgraph / the lambda server (I am not sure if the official lambda server works like that). So while designing your WASM solution, please allow upgrading the lambda code without bringing the dgraph server down.
Yea, obviously we still need to support storing and executing a different lambda per tenant, which leads you to say ‘store it in Dgraph’ - but I wonder if that will reach its limit with giant WASM binaries stored in badger? Maybe we need to support loading WASM via an object storage URL or from local storage - just an off-hand thought, not sure if it is a real issue or not.
Super low latency, since the lambda server runs directly next to alpha.
Tbh, I never thought about whether that is such a concern for devs, though. Is it? I guess for cloud users it’s better to have it right there. Devs that use kubernetes could probably orchestrate everything so alpha and the lambdas are close to each other.
Edit: For the team this also probably simplifies deployment as they have one less deployment to take care of.
I really like this idea. If there are things that can easily be done in DQL, then I use a custom mutation. There are lambdas I use as workarounds for things that should be in DQL, such as string concatenation, and those should live really close to the alpha as they are used a lot. And then there are other, lesser-used lambdas for occasional tasks; these are used a tenth of a percent as often as the above lambdas, so they can live outside of alpha. A bit more admin management for advanced users, but it creates a better, more efficient system.
I am thinking, if loading in WASM and executing it like native code works well… maybe that could be the future of custom tokenizers, custom DQL functions, custom data types… What a cool draw that would be as an extensibility feature on this system.
Custom tokenizers are doable right now but not fun, since you have to build a Go plugin shared object, which has the big restriction that it must have 100% the same imports/versions to be loadable into dgraph :/. Something sandboxed may be better.
Yes!! And then an open-source package manager of sorts for us to share these types of scripts between users. One person creates a concat function, and then any other user could import that plugin/package (whatever it’s called).
I would reword these - really it was a prognostication of expanding this WASM environment into a second generation of dgraph plugins. Unrelated to lambda, being able to script pieces of dgraph would make it incredibly valuable.
Try:
If WASM provides a good lambda experience, consider expanding WASM into:
* tokenizer implementations
* functions/graph algorithms
* community sharing of said code artifacts