A recommended, simple, secure, upgradable, community-based approach to production
TLDR: I am really struggling to get a production-quality Dgraph instance set up as an app developer. It would be great if the Dgraph team could provide a simple (or as simple as possible), paint-by-numbers, scripted setup for running Dgraph in production. It should include security considerations, persistent data storage, backups, and logging, and provide upgrade scripts as Dgraph is updated. I see this as an intermediate step until there is a third party offering Dgraph as a service (my preference).
Failing that, some more guidance on the best direction to take would be great.
Background
I am working on a startup/sabbatical called Helperific. The "sabbatical" part is because I have some learning goals, like building it as a truly scalable system and exploring tech that is new to me, such as graph databases, Lambda, etc.
I chose Dgraph because it ticked all my boxes when I was researching which DB to use: scalable, distributed, a graph database (new to me), and open source (Apache). All awesome!
Getting started with Docker Compose, which I was unfamiliar with at the time, was easy!
My Ideal Setup
For my app I ideally want a high-availability, multi-region setup; this feels like something Dgraph is uniquely suited to and that would be hard with most other databases. My app is small, so six small t2.micro servers spread across 3 regions and different availability zones (2 zones in each region), plus 3 Zero nodes I guess, with 3 different Lambda deployments pointing at them, would be cool.
My Struggles
So I have been taking lots of courses to get my head around Docker, AWS networking, Kubernetes, and infrastructure-as-code tools such as Terraform.
I think I want to run Dgraph in a private VPC to secure it. As soon as you do this, though, you add complexity: even for a single-instance setup you need a jump box to use Docker Machine. So I question whether this is the best option.
I was also thinking of simply building preconfigured Docker images and using AWS ECS to deploy them to known IP addresses to get my six-node cluster. For upgrades I would build six new images, deploy those, and then move the data across, but I am just not sure this feels right.
Then I figured I should look more into Kubernetes, so I set up a Kubernetes cluster using kops. By default it sets itself up in a public subnet in just one region, and I am not sure whether I should be setting it up in a private subnet or whether the public subnet is fine, since Kubernetes manages its own network security. Also, what I have read online suggests Kubernetes is not a great fit for databases, but that might not apply to a distributed database like Dgraph.
As you can see, there are so many deployment options: single host, multi-host, Docker, Docker Swarm, Kubernetes. Within these there are HA and replicated-cluster variants. I am struggling to pick one.
Also, I have not even started trying to back up and upgrade to a newer version; so far I have just been blowing away the data.
My Real Need
For now I just need to get some kind of dev and prod box up that I can easily update as new versions of Dgraph are released, with a backup script running and some kind of monitoring, and then eventually transition to a hosted solution. It probably does not even need to be distributed, even though that would be much cooler!
My suggested solutions
While I am sure all the deployment options are valuable and the best option depends on your setup, I feel it would be very helpful if you could provide a “best for most people”, Wirecutter-style recommendation and step through that setup in granular detail, to bridge the gap until you have a hosted version.
Steps like: configure your VPC, create custom Docker images, allocate IP addresses, create a bastion box, add this recommended monitoring solution, add this script to back up to S3 periodically, run this script to upgrade. Paint by numbers.
I am happy to put some time into learning a recommended stack (Docker, Docker Swarm, Kubernetes, etc.). I would even consider moving from AWS to Google Cloud or another provider if that helps, as the rest of my app is easy to set up on any provider.
Maybe you could create a Udemy course based on these steps. If you choose Kubernetes, Docker, and Terraform as the preferred platform, you could list an existing course as a prerequisite, so you would not have to cover Docker/Kubernetes/Terraform basics, just the setup and management of Dgraph. You could either create the course yourself or, if you don’t have time, find an existing author to work with on it.
What do you think?
I would be happy to help with this in any way I can!