Troubleshooting cert-manager in Kubernetes

Thanks to Let's Encrypt, TLS certificates are free. Using them is an obvious requirement for any Kubernetes deployment, especially when we expose our traffic externally.

In Kubernetes land we achieve this with cert-manager, and its documentation is excellent. Nevertheless, I encountered a few problems, most of them due to small misunderstandings.

ClusterIssuer vs Issuer

The key difference is that an Issuer is scoped to a single namespace, whereas a ClusterIssuer is cluster-wide. If your Ingress resource references an Issuer that belongs to another namespace, you’ll probably see your certificate stuck with READY set to False, like this:

NAME                  READY   SECRET                AGE
nginx-nvme0-net-tls   False   nginx-nvme0-net-tls   14d

It’s worth mentioning that this is not the only possible cause.
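The scoping rule can be sketched in a few lines of Python. This is only an illustration, not how cert-manager itself works: it assumes a recent cert-manager release that reads the cert-manager.io/issuer and cert-manager.io/cluster-issuer annotations, it ignores the issuer-kind and issuer-group annotations, and the issuer name "letsencrypt" is hypothetical.

```python
from typing import Optional, Set, Tuple

# Annotation keys read by cert-manager's Ingress integration
# (assumption: a recent release; older ones used a different prefix).
ISSUER_ANNOTATION = "cert-manager.io/issuer"
CLUSTER_ISSUER_ANNOTATION = "cert-manager.io/cluster-issuer"


def issuer_problem(
    ingress: dict,
    issuers: Set[Tuple[str, str]],   # (namespace, name) of every Issuer
    cluster_issuers: Set[str],       # name of every ClusterIssuer
) -> Optional[str]:
    """Return a description of the scoping problem, or None if there is none."""
    meta = ingress.get("metadata", {})
    annotations = meta.get("annotations", {})
    namespace = meta.get("namespace", "default")

    if CLUSTER_ISSUER_ANNOTATION in annotations:
        # A ClusterIssuer is global, so the Ingress namespace is irrelevant.
        name = annotations[CLUSTER_ISSUER_ANNOTATION]
        if name not in cluster_issuers:
            return f"ClusterIssuer {name!r} not found"
        return None

    if ISSUER_ANNOTATION in annotations:
        # An Issuer must live in the same namespace as the Ingress.
        name = annotations[ISSUER_ANNOTATION]
        if (namespace, name) in issuers:
            return None
        elsewhere = sorted(ns for ns, n in issuers if n == name)
        if elsewhere:
            return f"Issuer {name!r} exists in {elsewhere}, not in {namespace!r}"
        return f"Issuer {name!r} not found in {namespace!r}"

    return "no cert-manager issuer annotation on the Ingress"


# Hypothetical setup: the Ingress lives in "apps", the Issuer in "default".
ingress = {
    "metadata": {
        "namespace": "apps",
        "annotations": {ISSUER_ANNOTATION: "letsencrypt"},
    }
}
print(issuer_problem(ingress, {("default", "letsencrypt")}, set()))
```

Switching the annotation to cert-manager.io/cluster-issuer, and creating a ClusterIssuer with that name, makes the namespace check disappear entirely.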

Ingress without a DNS record

We fixed our first mistake and are now using the proper ClusterIssuer, yet we still see the same status. This is most probably because the DNS record for the domain you want the certificate for does not exist. You can verify this by checking the logs of the cert-manager pod.

$ kubectl logs -f pod/cert-manager-7747db9d88-kl7wr -n cert-manager

E0719 16:31:03.769101       1 sync.go:185] cert-manager/controller/challenges "msg"="propagation check failed" "error"="failed to perform self check GET request 'http://nginx.nvme0.net/.well-known/acme-challenge/nwq943WCLa0x8P22f3Vz4LPwn9BwDhv6qAHTazG0ic4': Get \"http://nginx.nvme0.net/.well-known/acme-challenge/nwq943WCLa0x8P22f3Vz4LPwn9BwDhv6qAHTazG0ic4\": dial tcp: lookup nginx.nvme0.net on 10.32.0.10:53: no such host" "dnsName"="nginx.nvme0.net" "resource_kind"="Challenge" "resource_name"="nginx-nvme0-net-tls-3020107151-2491637906-3946730861" "resource_namespace"="apps" "type"="http-01" 
I0719 16:31:03.769191       1 controller.go:147] cert-manager/controller/challenges "msg"="finished processing work item" "key"="apps/nginx-nvme0-net-tls-3020107151-2491637906-3946730861"
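Before digging through the logs, a quick lookup can tell you whether the record exists at all. The snippet below is a minimal sketch using Python's standard socket module. Keep in mind that it asks the resolver of the machine you run it on, not the cluster DNS at 10.32.0.10 seen in the log, so a positive answer here does not guarantee the self check will pass inside the cluster.

```python
import socket
from typing import Callable


def resolves(hostname: str, lookup: Callable = socket.getaddrinfo) -> bool:
    """Return True if hostname resolves to at least one address.

    The lookup parameter exists only so the resolver can be swapped out,
    for example in tests; by default the system resolver is used.
    """
    try:
        return len(lookup(hostname, 80)) > 0
    except socket.gaierror:
        # Raised by the resolver for "no such host" and similar failures.
        return False


if __name__ == "__main__":
    host = "nginx.nvme0.net"  # the dnsName from the log above
    print(host, "resolves" if resolves(host) else "does NOT resolve")
```

If the name does not resolve, create the DNS record first and give it time to propagate before expecting the http-01 challenge to succeed.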

Subject Alternative Name

The example on the cert-manager site is good, but it can be misleading. If an Ingress rule serves www.example.com, that hostname must also appear in the tls hosts list; otherwise the issued certificate will not cover it.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: example
  namespace: foo
spec:
  rules:
    - host: www.example.com
      http:
        paths:
          - backend:
              serviceName: exampleService
              servicePort: 80
            path: /
  tls:
    - hosts:
        - example.com   # www.example.com is missing here
      secretName: example-tls

To make the Ingress functional, the tls section should look like this:

  tls:
    - hosts:
        - www.example.com
        - example.com
      secretName: example-tls
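This mismatch is easy to miss by eye, so here is a small sketch that lists the rule hosts missing from the tls section. It takes the Ingress manifest as a plain Python dict (for instance, loaded from YAML) and compares hostnames literally; wildcard entries such as *.example.com are not expanded, which is a simplification of mine.

```python
from typing import List


def uncovered_hosts(ingress: dict) -> List[str]:
    """Return rule hosts that do not appear in any tls hosts list."""
    spec = ingress.get("spec", {})
    covered = {
        host
        for entry in spec.get("tls", [])
        for host in entry.get("hosts", [])
    }
    rule_hosts = [rule["host"] for rule in spec.get("rules", []) if "host" in rule]
    return [host for host in rule_hosts if host not in covered]


# The relevant part of the Ingress shown above, before the fix.
ingress = {
    "spec": {
        "rules": [{"host": "www.example.com"}],
        "tls": [{"hosts": ["example.com"], "secretName": "example-tls"}],
    }
}
print(uncovered_hosts(ingress))  # ['www.example.com']
```

An empty list means every host served by the Ingress is also requested in the certificate.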

TLS keys

When you delete a certificate, you must also delete the TLS secret that comes with it, because the CSR was created for a particular private key.

If the secret already exists when the certificate is recreated, its creation is skipped and the old private key is kept. If you see an error like the one below, you most likely forgot to remove the TLS secret:

$ kubectl logs cert-manager-7747db9d88-kl7wr -n cert-manager
I0701 21:02:25.808915       1 controller.go:141] cert-manager/controller/certificates "msg"="syncing item" "key"="default/nvme0-net" 
I0701 21:02:25.809220       1 sync.go:386] cert-manager/controller/certificates "msg"="validating existing CSR data" "related_resource_kind"="CertificateRequest" "related_resource_name"="nvme0-net-65926366" "related_resource_namespace"="default" "resource_kind"="Certificate" "resource_name"="nvme0-net" "resource_namespace"="default"
I0701 21:02:25.809448       1 sync.go:406] cert-manager/controller/certificates "msg"="stored private key is not valid for CSR stored on existing CertificateRequest, recreating CertificateRequest resource" "related_resource_kind"="CertificateRequest" "related_resource_name"="nvme0-net-65926366" "related_resource_namespace"="default" "resource_kind"="Certificate" "resource_name"="nvme0-net" "resource_namespace"="default"
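To avoid forgetting the second step, I find it handy to generate both delete commands together. The helper below is only a sketch: it prints the commands and runs nothing. The certificate name and namespace come from the log above, while the secret name "nvme0-net-tls" is hypothetical; the real one is whatever secretName your Certificate points to.

```python
from typing import List


def cleanup_commands(certificate: str, secret: str, namespace: str) -> List[List[str]]:
    """Build the kubectl commands that remove a Certificate and its TLS secret."""
    return [
        ["kubectl", "delete", "certificate", certificate, "-n", namespace],
        ["kubectl", "delete", "secret", secret, "-n", namespace],
    ]


# "nvme0-net-tls" is a hypothetical secret name used for illustration only.
for command in cleanup_commands("nvme0-net", "nvme0-net-tls", "default"):
    print(" ".join(command))
```

Review the printed commands before running them, since deleting the wrong secret removes a private key that is still in use.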

Conclusion

At the beginning it may feel overwhelming. Do not give up, and remember that we always have kubectl logs for debugging.