At present, Zero has two kinds of open ports:
- HTTP (default 6080)
- gRPC (default 5080)
Over HTTP, Zero serves the following endpoints:
| # | Endpoint | Read/Write (R/W) | Is security-critical? (Y/N) | Currently available at Alpha /admin | Info |
|---|---|---|---|---|---|
| 1 | /health | R | N | Y (with actual health status) | Just acts as a ping response provider. No information is emitted from here. Used by Kubernetes liveness probes. |
| 2 | /state | R | Y | Y | Exposes membership information for all the Zeros and Alpha groups. Also tells which predicates are being served by which group. |
| 3 | /removeNode | W | Y | N | Used to remove a node from an Alpha group. Emits a success message or error. |
| 4 | /moveTablet | W | Y | N | Used to move a tablet from one group to another. Emits a success message or error. |
| 5 | /assign | W | Y | N | Used to lease UIDs and timestamps. Responds with the start and end IDs of the lease. |
| 6 | /enterpriseLicense | W | Y | N | Used to apply an enterprise license. Emits a success message or error. |
Except for /health, every endpoint either changes something in the system or emits information that may be critical from a security point of view. Only the system administrator is supposed to have access to these endpoints, so ideally Zero’s HTTP port should not be exposed to the public domain. Still, from a security perspective, an open HTTP port can turn out to be a risk. To overcome this problem, we are considering the following approaches:
1. Completely move the security-critical HTTP endpoints from Zero to Alpha’s GraphQL Admin endpoint (/admin). There they would be served in GraphQL over HTTP with proper access control mechanisms in place, as the /admin endpoint takes care of applying checks like IP whitelisting, Poor man’s Auth, and enterprise ACL. Zero would no longer serve them over its HTTP port; instead, it would serve them over its internal gRPC port, and Alpha’s GraphQL admin would act as a proxy for those gRPC calls. Zero’s internal gRPC port is not intended to be exposed to the public domain and should have mTLS in place.
Pros:
- All the access control mechanisms will be in place, including ACL.
- All the admin operations will be in one place.
Cons:
- Moving them to Alpha would imply removing them from Zero’s HTTP server, which would be a breaking change if we make it as part of a patch release for v20.07.
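As a rough illustration of the access checks approach 1 relies on, here is a minimal Go sketch of a wrapper that applies an IP allowlist and a shared-token check before a request reaches the handler that would proxy to Zero. It is not Alpha's actual implementation: the header name `X-Example-Auth-Token`, the `10.0.0.0/8` network, the token value, and the stand-in proxy handler are all assumptions made for the example.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
)

// guard wraps next with two checks: the caller's IP must fall inside one of
// the allowed networks, and the request must carry the shared token.
// The header name is a placeholder, not necessarily the one Alpha uses.
func guard(allowed []*net.IPNet, token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		host, _, err := net.SplitHostPort(r.RemoteAddr)
		ip := net.ParseIP(host)
		if err != nil || ip == nil || !ipAllowed(allowed, ip) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		got := r.Header.Get("X-Example-Auth-Token")
		// Constant-time comparison avoids leaking the token through timing.
		if subtle.ConstantTimeCompare([]byte(got), []byte(token)) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// ipAllowed reports whether ip belongs to any of the given networks.
func ipAllowed(nets []*net.IPNet, ip net.IP) bool {
	for _, n := range nets {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// tryRequest sends one in-process request through the guard and returns
// the resulting HTTP status code.
func tryRequest(remoteAddr, token string) int {
	_, internal, _ := net.ParseCIDR("10.0.0.0/8") // example allowlist
	// Stand-in for the handler that would forward the call to Zero over gRPC.
	proxy := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "forwarded to Zero")
	})
	h := guard([]*net.IPNet{internal}, "s3cret", proxy)

	req := httptest.NewRequest(http.MethodPost, "/admin", nil)
	req.RemoteAddr = remoteAddr
	req.Header.Set("X-Example-Auth-Token", token)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(tryRequest("10.1.2.3:5555", "s3cret"))    // allowed IP, correct token
	fmt.Println(tryRequest("10.1.2.3:5555", "wrong"))     // allowed IP, wrong token
	fmt.Println(tryRequest("203.0.113.9:5555", "s3cret")) // IP outside the allowlist
}
```

The point of the sketch is only that every admin operation passes through one choke point, which is what makes it possible to layer ACL on top later.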
2. Introduce one flag in Zero to enable/disable all security-critical HTTP endpoints at once.
Pros:
- Not a breaking change.
- Gives users an option to disable these HTTP endpoints, in case they can’t control port-level access in their environment.
Cons:
- If the endpoints are disabled, one would have to restart Zero to re-enable them before making any change to the configuration. That will cause downtime.
- If one doesn’t want to restart, then we will have to support the functionality of these HTTP endpoints over Zero’s internal gRPC. At present, not all of that functionality is available over Zero’s internal gRPC, and gRPC would be inconvenient to use from an operations perspective.
- Some users also want ACL checks to be enabled for these endpoints, which doesn’t seem possible with this approach.
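To make the flag idea in approach 2 concrete, here is a minimal Go sketch in which /health is always registered and the security-critical paths are registered only when a flag is set. The flag name `enable_admin_http` is hypothetical (it is not an existing Zero flag), and the handlers are placeholders rather than Zero's real ones.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newZeroMux builds a mux that always serves /health and registers the
// security-critical endpoints only when enableAdmin is true.
// The handlers are placeholders, not Dgraph's real implementations.
func newZeroMux(enableAdmin bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "OK")
	})
	if !enableAdmin {
		return mux
	}
	critical := []string{"/state", "/removeNode", "/moveTablet", "/assign", "/enterpriseLicense"}
	for _, path := range critical {
		mux.HandleFunc(path, func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, "placeholder")
		})
	}
	return mux
}

// statusOf sends an in-process GET for path to h and returns the status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	// Hypothetical flag name, used only for this illustration.
	enable := flag.Bool("enable_admin_http", false, "serve security-critical HTTP endpoints")
	flag.Parse()

	mux := newZeroMux(*enable)
	fmt.Println("/health:", statusOf(mux, "/health"))
	fmt.Println("/state:", statusOf(mux, "/state"))
}
```

Because the routes are decided once at startup, flipping the flag requires a restart, which is exactly the downtime concern listed above.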
3. Allow authenticating access to the HTTP port via mutual TLS (mTLS).
Pros:
- Not a breaking change; an enhancement instead.
- Provides a trusted layer of authentication.
Cons:
- Users would want to add or remove clients trusted for mTLS, so we would need to add an extra endpoint for that. That endpoint would need a single trusted, root-like client, which becomes a single point of failure in cases such as the client’s private key getting leaked. Also, Zero would have to be taken down if the root client’s public verification key is to be changed.
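For reference, requiring client certificates on an HTTP server in Go looks roughly like the sketch below, which uses only the standard `crypto/tls` and `crypto/x509` packages. The file names `ca.crt`, `node.crt`, and `node.key` are placeholders, and this is not how Zero currently wires its TLS options.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
)

// mtlsConfig returns a TLS config that rejects any client whose certificate
// is not signed by one of the CAs in caPEM.
func mtlsConfig(caPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in PEM data")
	}
	return &tls.Config{
		ClientCAs:  pool,
		ClientAuth: tls.RequireAndVerifyClientCert,
		MinVersion: tls.VersionTLS12,
	}, nil
}

func main() {
	// Placeholder file names; adjust to the actual certificate layout.
	caPEM, err := os.ReadFile("ca.crt")
	if err != nil {
		fmt.Println("reading CA:", err)
		return
	}
	cfg, err := mtlsConfig(caPEM)
	if err != nil {
		fmt.Println("building TLS config:", err)
		return
	}
	// 6080 is Zero's default HTTP port mentioned above.
	srv := &http.Server{Addr: ":6080", TLSConfig: cfg}
	fmt.Println(srv.ListenAndServeTLS("node.crt", "node.key"))
}
```

In this sketch the trusted CA pool is loaded once at startup, which illustrates why changing the trusted root would mean restarting the server unless extra reload machinery is added.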
4. A combination of approaches 1 and 2: do not remove the security-critical HTTP endpoints from Zero, but introduce a flag in Zero to enable/disable them, and also have them served by Alpha’s GraphQL admin.
Pros:
- Not a breaking change.
- Users get a choice to expose these endpoints via Alpha with enterprise ACL, via Zero, or both.
Cons:
- Port hardening with mTLS is still required for the Zero gRPC port.
Looking for any suggestions and ideas regarding this.
UPDATE (5 Oct 2020)
We are proceeding with the 4th approach at present. We still need to decide whether to fully close Zero’s HTTP port or just close the security-critical HTTP endpoints and not the HTTP port itself.
PR: feat(GraphQL): Zero HTTP endpoints are now available at GraphQL admin (GRAPHQL-1118), Pull Request #6649 by abhimanyusinghgaur in dgraph-io/dgraph
The following points should be noted if we are to close the HTTP port:
- /health would no longer work if the port is disabled by the flag. This means that if the Alphas are down, there is no way to know Zero’s health. Kubernetes liveness probes rely on /health for per-instance health checking, so we will need to find another way to do that.
- Any debugging or profiling data collection on Zero requires the HTTP port to be open, so it won’t work if the flag closes the port.
On the other hand, if we don’t close the port, but choose to close the security-critical HTTP endpoints, the following should be noted:
- An open HTTP port will always be reported as a security risk/misconfiguration by various security tools.
- Having the /debug/pprof endpoints accessible in production environments is a security risk. See: Acunetix vulnerabilities index, Your pprof is showing
UPDATE (12 Oct 2020)
We are not going to pursue this at present.
UPDATE (31 March 2021)
This change is now required for the Dgraph Cloud architecture, so we have gone ahead with the 4th approach and merged this change to master. It will be available in the v21.03 release.