Moved from GitHub dgraph/4379
Posted by hackintoshrao:
This requirement came up while writing the tutorial and blog on running Dgraph on K8s.
We need a production checklist document with recommendations for running Dgraph in production:
-
Recommended hardware for alphas and zeros. This helps us suggest the resources to be allocated for running an alpha or a zero in minimal mode and also for production workloads.
-
Recommended topology, with recommendations for the number of alphas and zeros to run. Running 3 alphas vs 5, running 3 zeros vs one, and why one should run an odd number of alphas or zeros.
-
Recommended drives for VMs and K8s.
-
Recommendations on scheduling the database processes. This answers questions like: should I run an alpha or a zero on dedicated nodes?
-
Optimizing for resiliency: running more instances on modest hardware vs running fewer instances with more resources each.
-
Security best practices: TLS for clients and secure cluster-to-cluster communication.
-
Query and mutation best practices. For instance, running a has() query is costly. What are the performance implications one needs to be aware of while running different queries and mutations?
-
Configuring the database cache: how much should be set?
-
Recommended file system for production.
-
Monitoring and alerting.
-
When does the throughput saturate? This is to help decide when to scale up vertically with more hardware resources (more CPUs, memory) vs adding a new node.
-
Load balancing practices across the alphas, including using the readiness endpoint so that requests are not sent to alphas that are unhealthy.
-
Running clusters across availability zones.
-
Client best practices. Is there something you need to be aware of when using the clients? Like connection pooling?
-
Setting the open file descriptors limit to a reasonable baseline
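
On the topology item above, the odd-number question comes down to majority quorum arithmetic. Here is a minimal sketch, assuming each zero or alpha group is a Raft group that needs a majority of its members to make progress. The function names are made up for illustration and are not part of Dgraph.

```python
def quorum(members: int) -> int:
    """Smallest majority of a group with the given number of members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


# Compare group sizes: an even size adds a member without adding fault tolerance.
for members in range(1, 6):
    print(
        f"{members} members: quorum {quorum(members)}, "
        f"tolerates {tolerated_failures(members)} failure(s)"
    )
```

Under that assumption, 3 and 4 members both tolerate one failure, while 5 tolerate two. Going from 3 to 4 raises the quorum size without improving resiliency, which is the usual argument for odd group sizes.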