Allow metrics endpoint to run insecure when using mTLS on Alpha

diggy · March 10, 2020, 6:23pm

Moved from GitHub dgraph/4910

Posted by fl-max:

Experience Report

Note: Feature requests are judged based on user experience and modeled on Go Experience Reports. These reports should focus on the problems: they should not focus on and need not propose solutions.

What you wanted to do

Be able to disable mTLS on health check and metrics endpoints when mTLS is used on Alpha nodes.

What you actually did

My deployment is on Kubernetes. For livenessProbe when using mTLS I’m forced to use exec.command instead of the standard http/https probe type:

             command:
                - /bin/sh
                - '-c'
                - "curl https://$MY_POD_NAME:8080/health?live=1 --http1.1 --cacert {{ .Values.alpha.config.tls_dir }}/ca.crt --cert {{ .Values.alpha.config.tls_dir }}/client.dgraphadmin.crt --key {{ .Values.alpha.config.tls_dir }}/client.dgraphadmin.key"

For prometheus metrics, /debug/prometheus_metrics, I had to first create a dgraph-tls secret with the needed Certs/Key and then tell the ServiceMonitor to auth with it:

      {{- if .Values.alpha.config.tls_dir }}
      scheme: https
      tlsConfig:
        insecureSkipVerify: true
        ca:
          secret:
            name: dgraph-tls
            key: "ca.crt"
        cert:
          secret:
            name: dgraph-tls
            key: "client.dgraphadmin.crt"
        keySecret:
          name: dgraph-tls
          key: "client.dgraphadmin.key"
      {{- end }}

Why that wasn’t great, with examples

I don’t think the health check and prometheus metrics endpoints need to be secured with mTLS and it adds a lot of overhead Kubernetes to make it all play nice. Zero is “open”; it would make sense to make Alpha the same. At the very least, make it a configurable option.

Any external references to support your case

diggy · April 20, 2020, 9:09pm

manishrjain commented :

Is that a common practice? Do you have examples from other DBs to back up the proposal?

diggy · April 20, 2020, 9:15pm

fl-max commented :

https://github.com/helm/charts/blob/master/stable/prometheus/templates/server-deployment.yaml#L111

diggy · April 21, 2020, 11:25pm

darkn3rd commented :

For the livenessProbe, could you just add scheme field set to HTTPS, so that the kubelet sends HTTPS request skipping the certificate verification?

diggy · May 12, 2020, 4:57pm

fl-max commented :

Also, Azure Load Balancers do not support health probe types other than TCP (See #76657 so you cannot front Alpha (w/mTLS) without getting spammed with http: TLS handshake error from <ip>:<port>: EOF messages in the logs.

diggy · June 10, 2020, 12:18am

danielmai commented :

For the health endpoint, Dgraph can’t have only health and metrics be served over HTTP while the rest of the endpoints are served over HTTPS with mTLS. That would require a separate port to be exposed for insecure traffic only. That would be incompatible with the existing ports setup.

For health checks, we can add to the Dgraph docs and K8s configs the way to set up health checks for liveness and readiness with exec command probes for client authentication via curl.

For metrics, Prometheus has tls_config options to collect metrics from an mTLS source.

diggy · June 10, 2020, 2:09pm

fl-max commented :

@danielmai Yea, we have Prometheus metrics configured with mTLS. It doesn’t break any “standard pattern” and works just fine. I would also add that Prometheus Metrics could be used in a malicious way so should be configured with mTLS along with the primary endpoint.

As for the Health Endpoint, using exec/curl, it introduces the following issues:

Breaks integration with Cloud LBs
Exec generally considered a last resort for HTTP based apps as it puts a higher load on the container (albeit small) and doesn’t bubble up the HTTP error code.
Exec has a propensity to create Zombie processes (See 10 Ways to Shoot Yourself in the Foot with Kubernetes [23:28] )
Creates a hard dependency on the curl package
Curl opens up other security issues as it allows an attacker to pull down arbitrary code/data if they gained access to the container

Ideally Dgraph would run as a non-root user, blocking anyone from being able to install packages (ie. curl), and only contain packages that are absolutely necessary to run Dgraph (Alpha).

artooro · September 4, 2020, 1:16pm

I was using a tcp check because of this issue, but the TLS handshake error in the logs isn’t great.

So just switched to the curl option. I think if you tweak the timeout to 10 seconds and have it run every 20 instead of 10 seconds, it should be a reasonable overhead, and the risk of zombie processes pretty low.
I guess we’ll find out.

aman-bansal · November 10, 2020, 11:28am

We have made /health available without tls because orchestrators like kubernetes doesn’t support mtls. It’s merged in master and it will be available with 20.11 release

Topic		Replies	Views
Enabling mTLS breaks Readiness/Liveness probes Issues untagged , charts	3	840	September 17, 2020
Dgraph TLS with the Dgraph helm charts Dgraph	4	1079	February 20, 2021
Ratel mTLS connection through ingress-nginx to Alpha Ratel kind:question	1	887	September 30, 2020
Dgraph live with tls error Dgraph dgraph	2	219	January 9, 2024
Better practice... Google TLS ingress to non-encrypted backend vs. TLS to Dgraph Users discussion	3	845	October 2, 2020

Allow metrics endpoint to run insecure when using mTLS on Alpha

Experience Report

What you wanted to do

What you actually did

Why that wasn’t great, with examples

Any external references to support your case

Related topics