[Devops / K8s / Docs] Production checklist

Moved from GitHub dgraph/4379

Posted by hackintoshrao:

This requirement came up while writing the tutorial and blog post on running Dgraph on K8s.

We need a document containing a production checklist with suggestions for running Dgraph in production:

  • Recommended hardware for alphas and zeros. This helps us suggest the resources to be allocated for running an alpha or a zero in minimal mode and also for production workloads.

  • Recommended topology, including the number of alphas and zeros to run: 3 alphas vs. 5, 3 zeros vs. 1, and why one should run an odd number of alphas or zeros.

  • Recommended drives for VMs and K8s.

  • Recommendations on scheduling the database processes. This answers questions like whether an alpha or a zero should run on dedicated nodes.

  • Optimizing for resiliency: running more instances with modest hardware vs. running fewer instances with more resources.

  • Security best practices: TLS for clients and secure cluster-to-cluster communication.

  • Query and mutation best practices. For instance, running a has() query is costly. What performance implications should one be aware of when running different queries and mutations?

  • Configuring the database cache: how large should it be?

  • Recommended file system for production.

  • Monitoring and alerting.

  • When does throughput saturate? This helps decide when to scale vertically with more hardware resources (more CPUs and memory) vs. horizontally by adding a new node.

  • Load balancing practices across multiple alphas, including using the readiness endpoint so that requests are not sent to unhealthy instances.

  • Running clusters across availability zones.

  • Client best practices. Is there anything to be aware of when using the clients, such as connection pooling?

  • Setting the open file descriptor limit to a reasonable baseline.
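The odd-number question in the topology item comes down to majority-quorum arithmetic. The sketch below assumes the standard Raft rule (a group stays available only while a strict majority of its members is up), which applies to the zeros and to the alpha replicas within one group; the function names are hypothetical and only illustrate the math:

```python
# Illustrative sketch of Raft-style majority quorum arithmetic.
# quorum() and tolerated_failures() are hypothetical helpers, not Dgraph APIs.

def quorum(replicas: int) -> int:
    """Smallest strict majority of `replicas` voting members."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Members that can fail while a majority is still available."""
    return replicas - quorum(replicas)


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} replicas: quorum={quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Running it shows that 4 members tolerate the same single failure as 3, and 6 the same two failures as 5, so an even member adds cost and a larger quorum without adding fault tolerance.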

hackintoshrao commented:

Hey @danielmai,

Did you publish the production checklist you recently prepared?
Are there any plans to add it to the docs?

Sceat commented:

Very interested in this one; I just started using Dgraph on Kubernetes with 1 zero and 1 alpha so far!

dmitryyankowski commented:

Personally, I’m going to wait until @slotlocker2 comes out with a GKE Terraform config for Dgraph! (Terraform modules for Kubernetes - AWS EKS by slotlocker2 · Pull Request #5092 · dgraph-io/dgraph · GitHub) I’d like to have a production-quality example to go off of, one that even includes things like TLS.