Kubernetes and OpenShift have become the standard platforms for modern cloud-based application development. As infrastructure grows and clusters scale, development teams building and deploying workloads on accelerated release cycles use huge volumes of TLS and mTLS certificates.
Workload security is, of course, essential if infrastructure is to scale safely; however, as more companies run Kubernetes in production, they are seeing more frequent security vulnerabilities caused by workload misconfigurations. Containers are now the primary machine context, and containerization means application workloads are highly distributed across any number of nodes, any of which may be physically located in a different cloud environment.
The challenge for cloud-native platform teams is to manage a growing number of machine identities while maintaining a baseline level of container security. Machine identity management is therefore fundamental to modern container security. The most common example of a cloud-native machine identity is a digital certificate based on the X.509 standard.
These certificates, issued by a public or private certificate authority (CA), bind a public key to a workload's identity. During the TLS handshake they let applications authenticate one another and establish encrypted channels, so traffic can flow securely across cloud-native environments. In a Kubernetes context, X.509 certificates are essential for deploying workloads that run inside production clusters, yet certificates that are misconfigured and go unobserved inside the cluster are a primary cause of these security vulnerabilities.
Automate the Process
For developers, speed is everything, so creating the digital certificates needed to deploy workloads must be automated. In fast-moving, developer-led environments, where workloads protected by TLS and mTLS certificates ship on ever-faster release cycles, certificate management is near-impossible without automation.
On cloud-native infrastructure, developer workloads are in many cases now deployed multiple times an hour, and each workload must be deployed with certificates issued by a trusted private or public certificate authority. As developers lean more heavily on automation, their pipelines generate ever more certificates for workloads heading into production. As companies adopt more cloud-native infrastructure and development teams automate on faster release cycles, certificate management therefore becomes foundational to Kubernetes security.
Given this volume of certificates, it is fair to argue that certificate misconfiguration is likely the number-one security vulnerability for companies running Kubernetes in production. Because developer automation is the principal driver of certificate volume, visibility into and control over certificate configurations inside the cluster are essential if these companies are to eliminate misconfiguration as their primary security vulnerability.
The CNCF’s cert-manager project is an industry success story and a clear example of what happens when developers are the primary adopters of open source automation tooling. Now downloaded over one million times a day, cert-manager has quickly become the developers’ choice and the industry standard for cloud-native machine identity automation. Its success can be attributed to its ability to help developers move faster by removing the manual, time-consuming work of assigning a TLS certificate to a workload that is ready to deploy to a Kubernetes cluster.
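Part of what cert-manager takes off developers' hands is renewal: deciding when a certificate must be reissued before it expires. The Python sketch below illustrates that kind of scheduling decision; it is not cert-manager's code, and the function names are hypothetical. The fallback of renewing once one third of the lifetime remains is an assumption intended to mirror cert-manager's documented default for `renewBefore`, which should be checked against the version you run.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def renewal_time(
    not_before: datetime,
    not_after: datetime,
    renew_before: Optional[timedelta] = None,
) -> datetime:
    """Return the moment an automated controller should renew a certificate.

    Assumption: when no renew_before window is given, renew once one third
    of the certificate's lifetime remains.
    """
    lifetime = not_after - not_before
    if lifetime <= timedelta(0):
        raise ValueError("not_after must be later than not_before")
    if renew_before is None:
        renew_before = lifetime / 3
    if renew_before >= lifetime:
        raise ValueError("renew_before must be shorter than the certificate lifetime")
    return not_after - renew_before


def needs_renewal(
    now: datetime,
    not_before: datetime,
    not_after: datetime,
    renew_before: Optional[timedelta] = None,
) -> bool:
    """True once the renewal point is reached, so no human has to remember it."""
    return now >= renewal_time(not_before, not_after, renew_before)


# A 90-day certificate issued on 1 January 2024 expires on 31 March 2024.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)
print(renewal_time(issued, expires))  # 2024-03-01 00:00:00+00:00
```

A controller that runs this check on every reconcile loop replaces the calendar reminder a developer would otherwise need for every certificate.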
However, as teams automate machine identity management with open source tooling like cert-manager, poor visibility into and control over certificate configurations becomes even more apparent, especially in fast-growing cloud-native infrastructure.
Increasing Certificate Volume
As workload complexity and certificate volume increase, the potential for misconfiguration must be managed proactively to keep the platform free of inherent security vulnerabilities. Examples of areas where misconfiguration must be prevented include the configuration of web-facing TLS certificates; intermediate CAs, all of which must use certificates with a validated and auditable chain of trust; and the high volume of private certificates, which must be securely managed, observable, and correctly configured.
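To make the chain-of-trust requirement concrete, here is a minimal Python sketch that walks a leaf-first chain and reports issuer mismatches, non-CA issuers, expired certificates, and chains that do not end at a trusted root. `CertInfo` and `chain_problems` are hypothetical names, and the model is deliberately simplified: it compares names only and does not verify signatures, which any real validation (for example, RFC 5280 path validation in a TLS library) must do.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Sequence, Set


@dataclass(frozen=True)
class CertInfo:
    # Simplified, hypothetical view of an X.509 certificate. Real validation
    # must also verify each signature, which this sketch does not do.
    subject: str
    issuer: str
    not_after: datetime
    is_ca: bool


def chain_problems(
    chain: Sequence[CertInfo], trusted_roots: Set[str], now: datetime
) -> List[str]:
    """Return problems found in a leaf-first chain; an empty list means it looks sound."""
    if not chain:
        return ["empty chain"]
    problems: List[str] = []
    for cert in chain:
        if cert.not_after <= now:
            problems.append(f"{cert.subject}: expired")
    # Each certificate must be issued by the next one up, and that one must be a CA.
    for child, parent in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            problems.append(
                f"{child.subject}: issuer {child.issuer!r} does not match {parent.subject!r}"
            )
        if not parent.is_ca:
            problems.append(f"{parent.subject}: used as an issuer but is not a CA")
    # The top of the chain must be issued by (or be) a trusted root.
    if chain[-1].issuer not in trusted_roots:
        problems.append(f"{chain[-1].subject}: chains to untrusted issuer {chain[-1].issuer!r}")
    return problems


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
later = datetime(2025, 1, 1, tzinfo=timezone.utc)
leaf = CertInfo("api.example.com", "Example Intermediate CA", later, is_ca=False)
intermediate = CertInfo("Example Intermediate CA", "Example Root CA", later, is_ca=True)
print(chain_problems([leaf, intermediate], {"Example Root CA"}, now))  # []
```

Returning every problem, rather than stopping at the first, makes the output usable as an audit record of why a chain was rejected.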
The cert-manager project meets developers’ need for integrated tooling and automation, as its huge popularity shows. Proactively mitigating threats, however, requires deeper visibility into and control over certificate configurations, so that the security vulnerabilities that come from misconfiguration can be eliminated.
A control plane can give platform teams the ability to manage and analyze the status of certificate configurations. It can also give security teams the means to define policies that work alongside developer automation to mitigate risk.
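A security policy that "works with developer automation" can be pictured as a check that runs on every certificate request before it is approved. The sketch below is a hypothetical illustration in Python: the `CertificateRequest` fields are only loosely modeled on what a developer declares for cert-manager and are not its real schema, and the issuer name and rules in `Policy` are invented for the example.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import FrozenSet, List


@dataclass(frozen=True)
class CertificateRequest:
    # Hypothetical request fields, loosely modeled on what a developer
    # declares for cert-manager; not the real resource schema.
    issuer: str
    duration: timedelta
    dns_names: List[str]
    key_algorithm: str  # e.g. "RSA" or "ECDSA"
    key_size: int


@dataclass(frozen=True)
class Policy:
    # Hypothetical rules a security team might define once for all developers.
    allowed_issuers: FrozenSet[str]
    max_duration: timedelta
    # Suffixes include the leading dot so "evil-example.com" cannot match ".example.com".
    allowed_dns_suffixes: FrozenSet[str]
    min_rsa_key_size: int = 2048


def violations(req: CertificateRequest, policy: Policy) -> List[str]:
    """Return every policy rule the request breaks; an empty list means approve."""
    found: List[str] = []
    if req.issuer not in policy.allowed_issuers:
        found.append(f"issuer {req.issuer!r} is not approved")
    if req.duration > policy.max_duration:
        found.append(f"duration {req.duration} exceeds {policy.max_duration}")
    for name in req.dns_names:
        if not any(name.endswith(s) for s in policy.allowed_dns_suffixes):
            found.append(f"DNS name {name!r} is outside approved domains")
    if req.key_algorithm == "RSA" and req.key_size < policy.min_rsa_key_size:
        found.append(f"RSA key size {req.key_size} is below {policy.min_rsa_key_size}")
    return found


policy = Policy(
    allowed_issuers=frozenset({"internal-ca"}),
    max_duration=timedelta(days=90),
    allowed_dns_suffixes=frozenset({".svc.cluster.local"}),
)
request = CertificateRequest(
    "internal-ca", timedelta(days=365), ["payments.default.svc.cluster.local"], "RSA", 1024
)
for problem in violations(request, policy):
    print(problem)  # flags the 365-day duration and the 1024-bit RSA key
```

Because developers never call the policy directly, their automation is unchanged; requests that satisfy every rule pass through without friction.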
A machine identity control plane built around cert-manager will work in this way. Designed to be practical for both security and platform teams, it covers their key needs by proactively identifying misconfigurations so they can be remediated quickly. In this sense, it becomes a tool for threat detection and avoidance and an integral part of an enterprise’s security posture.
Designed to provide in-depth visibility into all machine identities across multi-cluster infrastructure, a control plane reports and analyzes certificate volume and monitors configurations. Its key value lies in two capabilities: enabling new clusters to be created consistently and securely, and raising alerts when misconfigurations are detected, with the relevant data relayed to SRE teams for fast remediation.
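The two capabilities above, cluster consistency and misconfiguration alerts, can be sketched as a single fleet scan. In this hypothetical Python example, a control plane is assumed to have already collected, per cluster, the set of configured issuers and an inventory of certificates; `CertRecord`, `Alert`, `scan_fleet`, the issuer names, and the 14-day warning window are all illustrative choices, not part of any real product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Set


@dataclass(frozen=True)
class CertRecord:
    # Hypothetical inventory record a control plane might collect per certificate.
    cluster: str
    namespace: str
    name: str
    not_after: datetime
    ready: bool


@dataclass(frozen=True)
class Alert:
    cluster: str
    message: str


def scan_fleet(
    issuers_by_cluster: Dict[str, Set[str]],
    certs: List[CertRecord],
    baseline_issuers: Set[str],
    now: datetime,
    expiry_warning: timedelta = timedelta(days=14),
) -> List[Alert]:
    alerts: List[Alert] = []
    # Consistency: every cluster should have the same baseline issuers configured.
    for cluster, issuers in sorted(issuers_by_cluster.items()):
        missing = baseline_issuers - issuers
        if missing:
            alerts.append(Alert(cluster, f"missing baseline issuers: {sorted(missing)}"))
    # Health: flag certificates that are not ready or are close to expiry.
    for c in certs:
        where = f"{c.namespace}/{c.name}"
        if not c.ready:
            alerts.append(Alert(c.cluster, f"{where} is not ready"))
        elif c.not_after - now <= expiry_warning:
            alerts.append(Alert(c.cluster, f"{where} expires {c.not_after:%Y-%m-%d}"))
    return alerts


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
issuers = {"prod-eu": {"internal-ca", "public-acme"}, "prod-us": {"public-acme"}}
certs = [
    CertRecord("prod-eu", "shop", "web-tls", now + timedelta(days=5), True),
    CertRecord("prod-us", "shop", "web-tls", now + timedelta(days=60), True),
]
alerts = scan_fleet(issuers, certs, {"internal-ca", "public-acme"}, now)
for alert in alerts:
    print(alert.cluster, alert.message)
```

Each `Alert` carries its cluster, which is the minimum context an SRE needs to route the problem to the right team.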
By reporting the volume of certificate activity within clusters, it can also analyze traffic patterns for both web-facing applications and internal workloads, including standard pod-to-pod and service mesh traffic.
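The volume-reporting side of this can be sketched as bucketing certificate issuance events by cluster, issuer kind, and hour, then flagging unusual bursts. The Python below is a hypothetical illustration: `IssuanceEvent`, the "public"/"private" split standing in for web-facing versus internal workloads, and the leave-one-out spike rule with a factor of 3 are all assumptions chosen for clarity. It counts issuance only and does not inspect network traffic.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

Bucket = Tuple[str, str, datetime]  # (cluster, issuer kind, hour)


@dataclass(frozen=True)
class IssuanceEvent:
    # Hypothetical event: one certificate issued in one cluster. Assumes
    # "public" issuers serve web-facing apps and "private" ones internal traffic.
    timestamp: datetime
    cluster: str
    issuer_kind: str


def hourly_volume(events: Iterable[IssuanceEvent]) -> Dict[Bucket, int]:
    """Count issuances per cluster, issuer kind, and hour."""
    counts: Counter = Counter()
    for e in events:
        hour = e.timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(e.cluster, e.issuer_kind, hour)] += 1
    return dict(counts)


def spikes(volume: Dict[Bucket, int], factor: float = 3.0) -> List[Bucket]:
    """Flag hours whose count exceeds `factor` times the mean of the other
    hours for the same cluster and issuer kind (a leave-one-out baseline)."""
    totals: Counter = Counter()
    hours: Counter = Counter()
    for (cluster, kind, _hour), n in volume.items():
        totals[(cluster, kind)] += n
        hours[(cluster, kind)] += 1
    flagged: List[Bucket] = []
    for key, n in volume.items():
        group = key[:2]
        if hours[group] < 2:
            continue  # nothing to compare against yet
        baseline = (totals[group] - n) / (hours[group] - 1)
        if n > factor * baseline:
            flagged.append(key)
    return sorted(flagged)


start = datetime(2024, 6, 1, 9, 0, tzinfo=timezone.utc)
events = [
    IssuanceEvent(start, "prod-eu", "private"),
    IssuanceEvent(start + timedelta(hours=1), "prod-eu", "private"),
]
# A burst of ten issuances in the 11:00 hour.
events += [
    IssuanceEvent(start + timedelta(hours=2, minutes=m), "prod-eu", "private")
    for m in range(10)
]
print(spikes(hourly_volume(events)))  # flags the prod-eu / private / 11:00 bucket
```

Excluding each hour from its own baseline keeps a single large burst from inflating the average it is compared against.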
With the overall cloud-native platform optimized for threat prevention, by eliminating security vulnerabilities and reducing outages caused by certificate misconfiguration, the company’s security audit capability is strengthened. Security teams can then apply consistent security policies to all developers, which also improves the developer experience and, in turn, developer productivity.