
Certificate Management at Scale: Let's Encrypt, ACME, and cert-manager

Automate TLS certificates with Let's Encrypt, ACME protocol, and cert-manager in Kubernetes. Covers HTTP-01, DNS-01, wildcards, private CAs, and expiry monitoring.

Abhishek Patel, 9 min read



TLS Certificates Expire Whether You're Ready or Not

Every production outage caused by an expired TLS certificate is entirely preventable. Yet it keeps happening -- to banks, airlines, and cloud providers. The fix isn't "remember to renew." The fix is automation. Let's Encrypt proved that certificate issuance can be free and automated. ACME (Automatic Certificate Management Environment) is the protocol that makes it work. And cert-manager brings that automation into Kubernetes, where it handles issuance, renewal, and rotation without human intervention.

If you're manually managing certificates in 2026, you're doing it wrong. This guide covers how ACME works, how to deploy cert-manager in Kubernetes, and how to monitor certificate expiry so you never wake up to a "Your connection is not private" page again.

What Is ACME?

Definition: ACME (Automatic Certificate Management Environment) is a protocol for automating the issuance, renewal, and revocation of TLS certificates. Defined in RFC 8555, it allows a client to prove domain ownership to a Certificate Authority (CA) through automated challenges, then receive a signed certificate -- all without human interaction.

Let's Encrypt popularized ACME by offering free, automated, 90-day certificates. But ACME isn't limited to Let's Encrypt. Other CAs like ZeroSSL, Google Trust Services, and Buypass support ACME. You can also run an internal ACME CA with step-ca for private PKI.

How ACME Domain Validation Works

Before a CA issues a certificate, it needs proof that you control the domain. ACME defines several challenge types:

HTTP-01 Challenge

  1. Your ACME client requests a certificate for example.com.
  2. The CA provides a random token.
  3. Your client serves the key authorization (the token joined with a thumbprint of your account key) at http://example.com/.well-known/acme-challenge/{token}.
  4. The CA's validation servers fetch that URL from multiple network vantage points.
  5. If the response matches the expected key authorization, the CA issues the certificate.

HTTP-01 is the simplest challenge type. It works for any publicly accessible web server. The limitations: validation always happens over port 80, it can't issue wildcard certificates, and your server must be reachable from the internet during validation.

DNS-01 Challenge

  1. Your ACME client requests a certificate for *.example.com.
  2. The CA provides a token value.
  3. Your client creates a TXT record at _acme-challenge.example.com containing a digest of the key authorization (the token combined with your account key thumbprint).
  4. The CA queries DNS for that TXT record.
  5. If the record matches, the CA issues the certificate.

With Let's Encrypt, DNS-01 is the only way to get wildcard certificates. It also works for servers that aren't publicly accessible, because the CA only needs to query DNS. The trade-off: it requires programmatic access to your DNS provider's API, and DNS propagation can add latency to the validation process.

| Challenge | Wildcard Support | Requires Public Access | Complexity | Best For |
| --- | --- | --- | --- | --- |
| HTTP-01 | No | Yes (port 80) | Low | Simple web servers, ingress controllers |
| DNS-01 | Yes | No | Medium | Wildcards, private infrastructure |
| TLS-ALPN-01 | No | Yes (port 443) | Medium | Environments where port 80 is blocked |

cert-manager: Automated Certificates in Kubernetes

cert-manager is the standard tool for managing TLS certificates in Kubernetes. It watches for Certificate resources and Ingress annotations, then uses ACME (or other issuers) to obtain and renew certificates automatically.

Installing cert-manager

# Install cert-manager with CRDs
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.16.0/cert-manager.yaml

# Verify the installation
kubectl get pods -n cert-manager
# cert-manager, cert-manager-cainjector, cert-manager-webhook should all be Running

Setting Up a Let's Encrypt Issuer

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx

Pro tip: Always test with the Let's Encrypt staging server first (https://acme-staging-v02.api.letsencrypt.org/directory). The staging server has generous rate limits and issues untrusted certificates for testing. The production server has strict rate limits -- 50 certificates per registered domain per week -- that can lock you out during debugging.
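
A staging issuer can mirror the production one, differing only in its name, account-key Secret, and server URL. The sketch below assumes the same nginx ingress class as the production issuer; the names letsencrypt-staging and letsencrypt-staging-key are illustrative choices, not required values.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging          # illustrative name
spec:
  acme:
    # Staging endpoint: untrusted certificates, generous rate limits
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-key  # keep the staging account key separate from production
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
```

Point issuerRef (or the cert-manager.io/cluster-issuer annotation) at letsencrypt-staging while debugging, and switch to letsencrypt-prod once issuance succeeds end to end.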

Requesting a Certificate

There are two ways to get certificates with cert-manager:

Option 1: Certificate resource

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com-tls
  namespace: production
spec:
  secretName: example-com-tls-secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - example.com
    - www.example.com

Option 2: Ingress annotation (simpler)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - example.com
        - www.example.com
      secretName: example-com-tls-secret
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-app
                port:
                  number: 80

cert-manager sees the annotation, creates a Certificate resource automatically, completes the ACME challenge, and stores the certificate in the specified Secret. By default it renews once two-thirds of the certificate's lifetime has elapsed, which for a 90-day Let's Encrypt certificate is about 30 days before expiry.
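
The renewal window can also be stated explicitly rather than left to the default. A minimal sketch, reusing the Certificate from Option 1 and assuming a 30-day window is what you want; renewBefore is a standard Certificate field, while the 720h value is only an example.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com-tls
  namespace: production
spec:
  secretName: example-com-tls-secret
  # Start renewing 720h (30 days) before the certificate's notAfter time
  renewBefore: 720h
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - example.com
    - www.example.com
```

Use either an explicit Certificate like this or the Ingress annotation for a given Secret, not both, so that only one resource owns it.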

DNS-01 for Wildcards with cert-manager

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-dns-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token

cert-manager supports DNS providers including Cloudflare, Route53, Google Cloud DNS, Azure DNS, and DigitalOcean. For unsupported providers, use a webhook solver.
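
To put the DNS-01 issuer to use, it needs the referenced API token Secret and a Certificate that requests the wildcard name. The sketch below assumes cert-manager runs in its default cert-manager namespace (where it looks up Secrets referenced by a ClusterIssuer) and that the Cloudflare token may edit DNS records for the zone. The Certificate and Secret names for the wildcard are illustrative.

```yaml
# API token referenced by the letsencrypt-dns ClusterIssuer.
# Assumption: cert-manager is installed in the cert-manager namespace.
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-token
  namespace: cert-manager
type: Opaque
stringData:
  api-token: "<your-cloudflare-api-token>"  # placeholder, never commit a real token
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-example-com        # illustrative name
  namespace: production
spec:
  secretName: wildcard-example-com-tls
  issuerRef:
    name: letsencrypt-dns
    kind: ClusterIssuer
  dnsNames:
    - "*.example.com"   # quoted because * has special meaning in YAML
    - example.com       # the wildcard does not cover the apex domain
```

One wildcard certificate like this can replace many per-subdomain certificates, which also keeps you further from the rate limits.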

Certificate Transparency Logs

Every publicly trusted certificate is logged in Certificate Transparency (CT) logs -- append-only, cryptographically verifiable ledgers. This means:

  • You can monitor CT logs to detect unauthorized certificates issued for your domains.
  • Anyone can see what certificates exist for your domain, including every subdomain named in them, so don't treat hostnames as secret.
  • Misissued certificates are detectable and attributable to the issuing CA.

Use tools like crt.sh or SSLMate's Certspotter to monitor certificates for your domains. If someone obtains a certificate for your domain without authorization, CT logs are how you'll find out.

Private CAs and Internal Certificates

Not every certificate needs to be publicly trusted. Internal services, development environments, and mTLS setups use private CAs. Options include:

| Tool | Type | Cost | Best For |
| --- | --- | --- | --- |
| step-ca (Smallstep) | Private ACME CA (OSS) | Free / commercial support | Internal ACME automation, short-lived certs |
| HashiCorp Vault PKI | Certificate engine | Free (OSS) / Enterprise | Dynamic certificates, Vault integration |
| AWS Private CA | Managed private CA | $400/mo per CA | AWS-native, compliance requirements |
| cfssl (Cloudflare) | PKI toolkit (OSS) | Free | Simple CA operations, signing |
| EJBCA | Enterprise CA (OSS/commercial) | Free / commercial | Full PKI lifecycle, compliance |

Watch out: AWS Private CA costs $400/month per CA. That's fine for enterprises but steep for startups. step-ca or Vault PKI covers the same core use cases for the cost of the infrastructure you run it on, though you take on operating and securing the CA yourself. Evaluate whether you need a managed service or can run your own.
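
Because step-ca speaks ACME, cert-manager can treat it like any other ACME server. The sketch below is an assumption-heavy illustration: the hostname step-ca.internal.example.com, the provisioner name acme in the directory path, and the issuer and Secret names are all hypothetical, and the caBundle value is a placeholder for your own root certificate.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-step-ca            # illustrative name
spec:
  acme:
    # Hypothetical step-ca host and ACME provisioner name ("acme")
    server: https://step-ca.internal.example.com/acme/acme/directory
    # Base64-encoded PEM of the private root, so cert-manager trusts the CA's own TLS endpoint
    caBundle: "<base64-encoded-root-ca-pem>"
    privateKeySecretRef:
      name: internal-step-ca-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
```

With HTTP-01 against a private CA, the CA only needs to reach the challenge URL inside your network, so nothing has to be exposed to the internet.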

Monitoring Certificate Expiry

Automated renewal should handle certificate rotation, but monitoring is your safety net. Here's a Prometheus-based approach:

# Prometheus rule to alert on certificates expiring within 14 days
groups:
  - name: certificate-expiry
    rules:
      - alert: CertificateExpiringSoon
        # Compare in seconds so that humanizeDuration in the summary renders the remaining time correctly
        expr: |
          certmanager_certificate_expiration_timestamp_seconds - time() < 14 * 86400
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Certificate {{ $labels.name }} expires in {{ $value | humanizeDuration }}"
          description: "Certificate in namespace {{ $labels.namespace }} is expiring soon. Check cert-manager logs for renewal issues."

cert-manager exports Prometheus metrics out of the box. Key metrics to monitor:

  • certmanager_certificate_expiration_timestamp_seconds -- When each certificate expires.
  • certmanager_certificate_ready_status -- Whether each certificate is in a Ready state.
  • certmanager_certificate_renewal_timestamp_seconds -- When cert-manager plans to renew each certificate.
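
The ready-status metric can back a second alert that fires well before expiry becomes the problem. A minimal sketch, assuming the metric's condition label carries the Ready condition value; the alert name, the 15-minute hold, and the severity are illustrative choices.

```yaml
groups:
  - name: certificate-readiness
    rules:
      - alert: CertificateNotReady
        # The metric exposes one series per condition value; this selects
        # certificates whose Ready condition is currently False.
        expr: certmanager_certificate_ready_status{condition="False"} == 1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Certificate {{ $labels.name }} in {{ $labels.namespace }} is not Ready"
          description: "Inspect the Certificate, CertificateRequest, Order, and Challenge resources to find the failing step."
```

Pairing this with the expiry alert covers both failure modes: a certificate that never issues, and one that issued once but has stopped renewing.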

Common Pitfalls

Let's Encrypt Rate Limits

Production rate limits are strict: 50 certificates per registered domain per week, 5 duplicate certificates (the same exact set of hostnames) per week, and 300 new orders per account per 3 hours. Test with the staging server. Use wildcard certificates to reduce the number of issuances. Persist and reuse issued certificates across deployments instead of requesting new ones on every deploy.

DNS Propagation Delays

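DNS-01 challenges can fail if the TXT record hasn't propagated to the authoritative nameservers that Let's Encrypt queries. Before asking the CA to validate, cert-manager runs its own propagation check. The dns01RecursiveNameservers Helm option (the --dns01-recursive-nameservers controller flag) sets which nameservers that check uses, which helps when the cluster's own DNS gives a stale or split-horizon view. Allow generous time for propagation before treating a pending challenge as failed.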
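
For installations managed with the cert-manager Helm chart, the nameserver setting might look like the fragment below. This is a sketch, not a recommendation: the public resolvers are example values, and if you installed from the static manifest the equivalent is to add the --dns01-recursive-nameservers and --dns01-recursive-nameservers-only flags to the controller's arguments.

```yaml
# values.yaml fragment for the cert-manager Helm chart
# Resolvers used for the DNS-01 propagation self-check (example values)
dns01RecursiveNameservers: "1.1.1.1:53,8.8.8.8:53"
# Only use the resolvers above instead of querying authoritative nameservers directly
dns01RecursiveNameserversOnly: true
```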

Stuck HTTP-01 Challenges

If cert-manager can't complete the ACME challenge (e.g., the ingress controller isn't routing /.well-known/acme-challenge/ correctly), the Certificate never becomes Ready and its Challenge stays pending. Work down the chain of Certificate, CertificateRequest, Order, and Challenge resources to find the step that is failing:

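# Walk the chain from the top-level resource down to the individual challenge
kubectl describe certificate example-com-tls -n production
kubectl describe certificaterequest -n production
kubectl describe order -n production
kubectl describe challenge -n production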

Frequently Asked Questions

How long are Let's Encrypt certificates valid?

Let's Encrypt certificates are valid for 90 days. This short validity period encourages automation and limits the impact of key compromise. By default, cert-manager renews them about 30 days before expiry, so a failed renewal leaves roughly 30 days of automatic retries before the certificate actually expires.

Can I get wildcard certificates with Let's Encrypt?

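Yes, but only through DNS-01 validation. HTTP-01 cannot validate wildcard names: serving a file from one host doesn't prove control over every possible subdomain, whereas a TXT record proves control of the DNS zone itself. You need programmatic access to your DNS provider's API. cert-manager ships DNS-01 solvers for several major providers and supports webhook solvers for the rest.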

What happens if cert-manager fails to renew a certificate?

The existing certificate continues to work until it expires. cert-manager retries renewal with exponential backoff. You should have Prometheus alerts on certmanager_certificate_expiration_timestamp_seconds to catch renewal failures with at least 14 days of lead time. Check Certificate, Order, and Challenge resources for error details.

Should I use HTTP-01 or DNS-01 validation?

Use HTTP-01 if your server is publicly accessible and you don't need wildcards -- it's simpler to set up. Use DNS-01 if you need wildcard certificates, your server isn't publicly accessible, or port 80 is blocked. DNS-01 requires more setup (DNS provider API credentials) but is more flexible.
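
You don't have to pick one challenge type per issuer. A single issuer can list several solvers and use selectors to route names between them. The sketch below assumes a hypothetical internal.example.com zone that isn't reachable from the internet; the issuer and Secret names are illustrative, and the Cloudflare token Secret is the one used by the DNS-01 issuer earlier.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-mixed           # illustrative name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-mixed-key
    solvers:
      # No selector: the fallback solver for any name not matched below
      - http01:
          ingress:
            ingressClassName: nginx
      # Names under this zone are validated over DNS-01 instead
      - selector:
          dnsZones:
            - internal.example.com  # hypothetical non-public zone
        dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token
```

Publicly reachable hosts keep the simpler HTTP-01 flow, and only the names that need DNS-01 depend on the DNS provider credentials.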

How do I handle certificates for internal services?

Internal services don't need publicly trusted certificates. Run a private CA using step-ca or Vault PKI. cert-manager can use these as issuers, giving you the same automated lifecycle management. Distribute your private CA's root certificate to all clients that need to trust these internal certificates.

What are Certificate Transparency logs?

CT logs are public, append-only ledgers that record every publicly trusted certificate. They allow domain owners to detect unauthorized certificates, browsers to verify that certificates have been logged, and researchers to audit CA behavior. Monitor CT logs for your domains using crt.sh or Certspotter to catch misissued certificates.

How much does automated certificate management cost?

Let's Encrypt and cert-manager are both free. Your costs are infrastructure (running cert-manager in Kubernetes) and DNS provider API access (which is typically included in your DNS hosting). The main paid option is AWS Private CA at $400/month for private certificates. For most public-facing services, the total cost of automated certificate management is effectively zero.

Automate Everything, Trust Nothing Manual

Certificate management is a solved problem in 2026. Let's Encrypt provides free certificates. ACME automates the validation and issuance process. cert-manager integrates this into Kubernetes with zero ongoing manual effort. The only remaining job is monitoring -- set up Prometheus alerts on expiry timestamps and renewal status, and you'll catch problems weeks before they become outages. If you're still manually renewing certificates, the next outage is just a matter of time.


Written by

Abhishek Patel

Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.
