Develop Lambdas with any programming language

Edit: After some discussion it became clear that the original idea (deploying lambda servers as Docker containers on alpha) was not ideal. The following describes the new proposal:

Enhancement on Lambdas

  • Use WASM to create Lambda Scripts (using wasmer-go); see the host-side sketch after this list.
  • Allow one? Lambda Script per tenant.
  • Support multiple lambda servers (internal + external)
    @lambda → default lambda on alpha
    @lambda(url: xxx) → run on external lambda
    Reference: Feature request: multiple lambda servers
  • Optionally load the WASM script from an external server (since WASM binaries can become quite large)
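
A minimal host-side sketch of loading and calling a WASM module with wasmer-go, assuming a module that exports a sum function (the file name and export are placeholders, and modules that need WASI or host imports would require extra setup):

    package main

    import (
        "fmt"
        "os"

        wasmer "github.com/wasmerio/wasmer-go/wasmer"
    )

    func main() {
        // Read a compiled lambda script; the file name is a placeholder.
        wasmBytes, err := os.ReadFile("lambda.wasm")
        if err != nil {
            panic(err)
        }

        // Set up the wasmer runtime and compile the module.
        engine := wasmer.NewEngine()
        store := wasmer.NewStore(engine)
        module, err := wasmer.NewModule(store, wasmBytes)
        if err != nil {
            panic(err)
        }

        // Instantiate with an empty import object; a real lambda host would
        // register host functions (and WASI, if needed) here.
        instance, err := wasmer.NewInstance(module, wasmer.NewImportObject())
        if err != nil {
            panic(err)
        }

        // Call an exported function; "sum" is just an illustrative export.
        sum, err := instance.Exports.GetFunction("sum")
        if err != nil {
            panic(err)
        }
        result, _ := sum(5, 37)
        fmt.Println(result) // 42
    }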

If WASM provides a good lambda experience, consider expanding WASM into:

  • tokenizer implementations
  • functions/graph algorithms
  • community sharing of said code artifacts

I have now started working on WASM support and testing for Go, Rust and AssemblyScript.
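
For the guest side, here is a sketch of what a Go lambda script could look like when compiled with TinyGo (assumed here because the standard Go toolchain targets the browser and expects wasm_exec.js); the exported sum function mirrors the host sketch above:

    package main

    // Built with something like:
    //   tinygo build -o lambda.wasm -target=wasi .
    // (the exact target flag depends on the TinyGo version; a WASI-capable
    // setup may then be needed on the wasmer side)

    //export sum
    func sum(a, b int32) int32 {
        return a + b
    }

    // TinyGo still requires a main function even if the module is only used
    // for its exported functions.
    func main() {}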

Old Proposal

What I want to do

I would like more fine-grained control over my lambda functionality and the ability to program it in languages other than JavaScript.

What I did

I opened a pull request that allows developers to deploy lambda servers as Docker containers on alpha. This means developers can use any programming language that supports web server development to build a lambda server.

Example usage:

dgraph alpha --my=alpha:7080 --zero=zero:5080 --security "whitelist=0.0.0.0/0" --lambda "docker-image=ghcr.io/schartey/dgraph-lambda-example:main; docker-registry=https://ghcr.io/;"

So I added docker parameters and the alpha automatically pulls the image and starts a container.
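
To illustrate what such a containerized lambda server could look like in Go, here is a bare-bones sketch (not the example project linked further below; the /graphql-worker route, the request fields, and port 8686 are assumptions about the lambda wire protocol):

    package main

    import (
        "encoding/json"
        "log"
        "net/http"
    )

    // lambdaRequest is an assumed shape of the payload alpha sends to the
    // lambda server; the field names are illustrative only.
    type lambdaRequest struct {
        Resolver string          `json:"resolver"`
        Args     json.RawMessage `json:"args"`
        Parents  json.RawMessage `json:"parents"`
    }

    func main() {
        http.HandleFunc("/graphql-worker", func(w http.ResponseWriter, r *http.Request) {
            var req lambdaRequest
            if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }
            // Dispatch on the resolver name and return the resolved value as JSON.
            switch req.Resolver {
            case "Query.hello":
                json.NewEncoder(w).Encode("world")
            default:
                http.Error(w, "unknown resolver", http.StatusNotFound)
            }
        })
        log.Fatal(http.ListenAndServe(":8686", nil))
    }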

How it works:

  • Rebased the Dockerfile on a dind Ubuntu image, so we can start Docker containers on alpha.
  • Added multiple flags to lambda: docker-image, docker-user, docker-password, docker-registry
  • If the docker-image flag is set, alpha tries to download the Docker image and runs it as a container instead of the Node.js process (sketched below)
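
A hypothetical sketch of the pull-and-run step using the Docker Go SDK (this is not the PR's actual code; signatures vary slightly between SDK versions, and the wiring of the flags is omitted):

    package main

    import (
        "context"
        "io"
        "os"

        "github.com/docker/docker/api/types"
        "github.com/docker/docker/api/types/container"
        "github.com/docker/docker/client"
    )

    // runLambdaImage pulls the image named by the docker-image flag and starts
    // it as a container, taking the place of the Node.js lambda process.
    func runLambdaImage(ctx context.Context, image string) error {
        // Talk to the local Docker daemon (dind inside the alpha container).
        cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
        if err != nil {
            return err
        }

        reader, err := cli.ImagePull(ctx, image, types.ImagePullOptions{})
        if err != nil {
            return err
        }
        io.Copy(os.Stdout, reader) // stream pull progress
        reader.Close()

        resp, err := cli.ContainerCreate(ctx, &container.Config{Image: image}, nil, nil, nil, "")
        if err != nil {
            return err
        }
        return cli.ContainerStart(ctx, resp.ID, types.ContainerStartOptions{})
    }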

What I need:

Opinions and other feedback are welcome. Are there any issues with this idea? If feedback is positive, I will continue development.

What’s missing:

  • admin api update
    Allow update of docker image through admin api
  • Refactoring
  • Tests

I also provided an example project of what such a lambda server could look like in Go:

Use Cases:

  • Develop in accustomed programming language
  • Use any library without workarounds or webpack
  • Run custom graph algorithms next to the db

I have used Schartey/dgraph-lambda-go in a side project and it was so much nicer to work with than JavaScript. However, the most recent Dgraph release makes it so that Dgraph alphas spin up many instances of Node.js running the lambda code stored as blobs in Dgraph. This makes it… harder to use other lambda implementations. (I believe it is still possible by setting a remote lambda URL, setting the number of lambda Node.js servers being exec’d to 0, and maybe ignoring the script being sent at runtime?)

Anyway, I would love some ability for dgraph to accept any language and still be native and fast, maybe some sort of optimized WASM situation? Red Panda (a Kafka clone) has its built-in ETL compiled to WASM, according to the docs.

Thanks for the feedback!
First of all thank you for trying out dgraph-lambda-go, it means a lot to me to receive your opinion, especially if it’s positive.

I’m aware of the changes to dgraph where alpha spins up the lambdas itself, and I integrated this approach into the workflow. In the current state, alpha would spin up the configured number of Docker containers with your server running, just like it does with the JS instances. That is why I had to make some changes to the Dockerfile and add dind (Docker in Docker).
So you do not need the remote lambda for the Docker container to run; it runs directly on the alpha instance.

On your note about WASM: I tried to implement this and got it working within Node, but I had issues compiling the WASM correctly, or rather supporting wasm files from different compilers. I was able to support Rust and AssemblyScript. Go’s WASM target is very focused on browser support and doesn’t work well with NodeJS. I didn’t test other languages and wasn’t too sure whether WASM is mature enough; there is not a lot of documentation about WASM outside of browser usage.
But maybe my approach was wrong; I had never used WASM before. Since RedPanda is open source, I’ll take a look at how they did it.


Just retried WASM with wasmer-go and it worked out of the box… :upside_down_face:
Quite the contrary of my experience in NodeJS…

So I will leave this up, because I still think using Docker images has its charm, but I’m going to continue testing with WASM as well. @iluminae thanks for the indirect enlightenment :rofl:

Cool! Hey it’s nice to see what’s out there.

Edit: in response to the Docker PR you contributed, I will say that from an industry perspective, having the Docker daemon and Docker API manage containers is yesterday’s idea of container management. See Kubernetes’ sunsetting of Docker support and Docker’s recent enterprise decisions.

Personally I run dgraph in Kubernetes, and having an external, separately scalable and manageable service handle lambda execution seemed idiomatic, where dgraph exec’ing Node.js N times does not. (But somehow wasmer does feel better.)

My main use of dgraph is not GraphQL at all, so I have very little skin in this game, but ideally I want the experience of loading JS in and having it execute (including with tenant support), and I want to do it in any language I want, completely sandboxed.


Hello, I am also using my own lambda server in Go (written long ago). For the next project I would love to try Schartey/dgraph-lambda-example as well, as it is really nice, especially the parsing of the existing schema file (I would like some small changes related to snake case / camel case function naming, and splitting the generated functions into many files instead of one big one, especially if you have many lambda queries/mutations).

The problem I see with the proposal is that while it is OK for dgraph to be embedded in a Docker image (like the official dgraph/dgraph), dgraph itself should not embed any knowledge of Docker, as that creates circular dependencies. It should be perfectly legal to run dgraph with lambda without any Docker infrastructure at all.

I agree that deploying dgraph and lambda servers should be orchestrated by an external entity (chosen by the user), be it Kubernetes, Docker/Swarm etc., and dgraph should not try to deploy the lambda server (unless it would be lazily deployed on demand and shut down when no longer used?). I like my lambda server in a different Docker container/image than dgraph alpha. I suspect that the decision to make dgraph alpha deploy a lambda server was made to make the deployment process easy (the lambda server scales with dgraph alpha) and to “guarantee” that the lambda server is accessible locally (localhost instead of network access) so the latency is minimal. Even then I would like the option to have the lambda binary and flags configurable (so you could either deploy your local lambda binary or run a script deploying your Docker image).
I wish that dgraph alpha developed a smart protocol for discovering/configuring available lambda servers (I would like to have many lambda servers, as mentioned in Feature request: multiple lambda servers, and also to use other scripting languages, e.g. Lua instead of JavaScript) so that no Node.js or JavaScript would be hardcoded.

Yeah, the more I think about it, I would like it to be WASM, stored in dgraph and executed by dgraph via wasmer, maybe with a configurable semaphore for protection.

No orchestration overhead when not used, and it can scale with the dgraph server. It can run JavaScript, Go, C, Rust, etc. Dgraph already uses jemalloc, so it is no big deal to use cgo to talk to wasmer.
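
A tiny sketch of the “configurable semaphore” idea, using a buffered channel to cap concurrent WASM executions (the names and the way the limit is configured are made up for illustration):

    package main

    // wasmLimiter caps how many lambda scripts run concurrently so that WASM
    // execution cannot starve the alpha process. The limit would come from a
    // hypothetical configuration flag.
    type wasmLimiter struct {
        slots chan struct{}
    }

    func newWasmLimiter(limit int) *wasmLimiter {
        return &wasmLimiter{slots: make(chan struct{}, limit)}
    }

    // run executes fn once a slot is free and releases the slot afterwards.
    func (l *wasmLimiter) run(fn func() error) error {
        l.slots <- struct{}{}        // acquire
        defer func() { <-l.slots }() // release
        return fn()
    }

Each lambda invocation would then be wrapped in limiter.run, so at most the configured number of WASM instances execute at the same time.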

Thanks guys for your insight.
I had similar thoughts to yours, and now I see that maybe I jumped the gun and should have put more thought into this.
I wanted to build upon existing workflows and provide more flexibility, but with the current setup it’s not easy.

The problem I see with the proposal is that while it is OK for dgraph to be embedded in a Docker image (like the official dgraph/dgraph), dgraph itself should not embed any knowledge of Docker, as that creates circular dependencies. It should be perfectly legal to run dgraph with lambda without any Docker infrastructure at all.

The way I implemented it, the dependency is not really circular; the Docker setup within alpha would be completely independent of the outside world. Yet I came to the conclusion that the whole idea of deploying lambdas directly on alpha is a bit weird, and that managing containers is the task of an external orchestration unit, not of the alpha server.

Why do I think it’s weird? It leads to alpha and lambda fighting for resources. If you do work-heavy tasks in lambda, you would have to scale the whole alpha just to give lambda more resources.

Personally I run dgraph in Kubernetes, and having an external, separately scalable and manageable service handle lambda execution seemed idiomatic, where dgraph exec’ing Node.js N times does not. (But somehow wasmer does feel better.)

I think this sums it up.

So in the end there are only two choices:

  • Run Lambda Scripts directly on Alpha and keep tasks lightweight, ideally with WASM.
  • Run external lambda server and deploy it as close to alphas as possible using external orchestration like Kubernetes.

Ideally we could do both and decide which lambda servers take care of which tasks.
So I already started and will continue work on a WASM integration.

Guys I want to thank you again for your thoughts, it really helps me a lot! I learn a lot through these discussions and it’s easier to see actual needs.


One more thing: currently I run my CI pipeline so that I can upgrade the lambda server without bringing down the dgraph server. For logic stored as a script, it would be possible to upload a new script to dgraph / the lambda server (I am not sure if the official lambda server works like that). So while designing your WASM solution, please allow upgrading the lambda code without bringing the dgraph server down.

Yeah, obviously we still need to support storing and executing a different lambda per tenant, which leads you to say “store it in Dgraph”, but I wonder if that will reach its limit with giant WASM binaries stored in Badger? Maybe we need to support loading WASM via an object storage URL or storing it locally. Just an off-hand thought; I am not sure if it is a real issue or not.

Just to be clear: with the new update, does it no longer work to have your own lambda service and tell dgraph where it can find that service?

It is still possible. But if you don’t specify an external lambda service, alpha will start up lambda services next to itself on its own.

@miko with the new release you can update the script using the admin api if you don’t use external lambdas.

So what can I do by allowing WASM/docker containers for the dgraph-internal lambda service which I can’t do with an external lambda service already?

Super low latency, since the lambda server runs directly next to alpha.
To be honest, I never thought about whether that is such a concern for devs. Is it? I guess for cloud users it’s better to have it right there. Devs that use Kubernetes could probably orchestrate everything so alpha and the lambdas are close to each other.

Edit: For the team this also probably simplifies deployment as they have one less deployment to take care of.

I really like this idea. If there are things that can easily be done in DQL, then I use a custom mutation. There are lambdas I use as workarounds for things that should be in DQL, such as string concatenation, and those should live really close to the alpha as they are used a lot. And then there are other, lesser-used lambdas that handle occasional tasks; these are used a tenth of a percent as often as the lambdas above, so they can live outside of alpha. It is a bit more admin management for advanced users, but it creates a better, more efficient system.

I am thinking, if loading in WASM and executing it near-natively works well… maybe that could be the future of custom tokenizers, custom DQL functions, custom data types… What a cool draw that would be as an extensibility feature for this system.

Custom tokenizers right now are doable but not fun, since you have to build a Go plugin shared object, which has the big restriction that it must be built with exactly the same imports/versions to be loadable into dgraph :/. Something sandboxed may be better.
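
For contrast, here is a rough sketch of what such a plugin-based tokenizer looks like today, loosely following the shape described in the Dgraph custom-tokenizer docs (treat the exact interface and identifier byte as assumptions); it has to be built with go build -buildmode=plugin against exactly the same dependency versions as the dgraph binary:

    package main

    // Built as a shared object, e.g.:
    //   go build -buildmode=plugin -o lower.so lower.go
    // and loaded into alpha via its custom tokenizer plugin flag.

    import "strings"

    // Tokenizer is the symbol dgraph looks up in the plugin.
    func Tokenizer() interface{} { return LowerTokenizer{} }

    type LowerTokenizer struct{}

    func (LowerTokenizer) Name() string     { return "lower" }
    func (LowerTokenizer) Type() string     { return "string" }
    func (LowerTokenizer) Identifier() byte { return 0xe0 }

    // Tokens lower-cases the value and emits it as a single index token.
    func (LowerTokenizer) Tokens(value interface{}) ([]string, error) {
        s, _ := value.(string)
        return []string{strings.ToLower(s)}, nil
    }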


Yes!! And then a package manager of sorts, open source, for us to share these types of scripts between users too. One person creates a concat function, and then any other user can import that plugin/package (whatever it is called).


I wanted to summarize the discussion real quick to make a basic feature request out of it.

Enhancement on Lambdas

  • Use WASM to create Lambda Scripts. (Using wasmer-go)
  • Allow one? Lambda Script per tenant.
  • Support multiple lambda servers (internal + external)
    @lambda → default lambda on alpha
    @lambda(url: xxx) → run on external lambda
    Reference: Feature request: multiple lambda servers
  • Optionally load WASM script from external server (since WASM can become quite large)

Nice to have

  • Allow implementation of custom tokenizers within lambda
  • Lambda Script Package Manager

Tell me if I missed something!

I would reword these. Really, it was a prognostication of expanding this WASM environment into a second generation of dgraph plugins. Unrelated to lambda, being able to script pieces of dgraph would make it incredibly valuable.

Try:

If WASM provides a good lambda experience, consider expanding WASM into:
* tokenizer implementations
* functions/graph algorithms
* community sharing of said code artifacts

Nice to have: reload lambda logic without restarting dgraph alpha