Stale container images can break TLS in quiet ways

A container image can look healthy and still be carrying old risk.

The app starts. The pod is ready. The dashboard is green. Nothing looks broken until the service tries to call an API and TLS fails.

Then the logs start saying things like:

x509: certificate signed by unknown authority

or:

SSL certificate problem: unable to get local issuer certificate

or, in Java:

PKIX path building failed

It is easy to blame the service on the other side. Maybe their certificate changed. Maybe the load balancer is wrong. Maybe DNS is pointing somewhere strange.

Sometimes that is true.

But sometimes the problem is sitting inside our own image.

Not the application code. Not the Kubernetes deployment. The image itself. More specifically, the CA certificates inside it.

Containers freeze more than code

I think this is one of the small container lessons that is easy to forget.

A container image is a snapshot. When we build it, we freeze a set of files, packages, libraries and certificates at that point in time. If the image is not rebuilt, those things do not quietly update themselves later.

That includes the trust store.

On Debian-based images, the ca-certificates package maintains the trusted certificates under /etc/ssl/certs and builds the combined bundle, ca-certificates.crt. Alpine has its own ca-certificates package. Other base images and runtimes have their own details.

The detail matters less than the habit.

If the image is old, the trust store may be old too.

Root certificates expire. Certificate authorities change chains. New roots get added. Old ones get removed. Vendors rotate intermediates. Internal platforms move from one private CA to another. The outside world keeps changing while the image stays exactly as it was.

That is when a container can behave like an old laptop nobody has patched.
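One way to see how old the trust material really is: load the bundle and list the roots that have already expired. This is a rough sketch in Python, not a finished tool. It assumes the Debian bundle path /etc/ssl/certs/ca-certificates.crt, and the function names are my own.

```python
import ssl
import time

# Assumption: Debian-style combined bundle. Other base images use other paths.
DEBIAN_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def load_roots(cafile=DEBIAN_BUNDLE):
    """Load a PEM bundle and return its CA certificates as a list of dicts."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=cafile)
    return ctx.get_ca_certs()


def common_name(cert):
    """Pull the commonName out of the nested subject tuples, if present."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return "<no common name>"


def expired_roots(certs, now=None):
    """Return the names of certificates whose notAfter date is in the past."""
    now = time.time() if now is None else now
    return [
        common_name(cert)
        for cert in certs
        if ssl.cert_time_to_seconds(cert["notAfter"]) < now
    ]


# Inside the container, something like:
#   print(expired_roots(load_roots()))
```

An empty list does not prove the bundle is current. It only rules out one kind of stale. A missing new root will not show up here.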

Why this can surprise teams

The confusing part is that the application may not have changed.

The same image worked last week. The same endpoint worked yesterday. The same code passed tests.

Then one morning an outbound call fails.

The team checks networking. The network looks fine. They check the remote certificate. It looks valid. They check recent application changes. There may not be any.

Meanwhile, the image may still have a CA bundle from months ago.

This is why I do not like treating container images as something we build once and forget. Immutability is useful, but it does not mean the contents remain safe or current. It only means they remain unchanged.

Sometimes unchanged is the problem.
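A simple way to make "unchanged" visible is to measure image age. The sketch below assumes you already have the creation timestamp, for example from docker image inspect --format '{{.Created}}'. The 30-day threshold and the function names are my own assumptions, not a recommendation from Docker.

```python
import re
from datetime import datetime, timezone

# Matches RFC 3339 style timestamps such as 2024-01-15T10:30:00.123456789Z
_CREATED = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_created(created):
    """Parse an image creation timestamp into a timezone-aware datetime."""
    match = _CREATED.match(created.strip())
    if match is None:
        raise ValueError(f"unrecognised timestamp: {created!r}")
    base, fraction, zone = match.groups()
    # Timestamps may carry nanoseconds; datetime only keeps microseconds.
    micros = (fraction or "")[:6].ljust(6, "0")
    offset = "+00:00" if zone == "Z" else zone
    return datetime.fromisoformat(f"{base}.{micros}{offset}")


def image_age_days(created, now=None):
    """Whole days between the image creation time and now."""
    now = now or datetime.now(timezone.utc)
    return (now - parse_created(created)).days


def is_stale(created, max_age_days=30, now=None):
    """True when the image is older than the (assumed) threshold."""
    return image_age_days(created, now) > max_age_days
```

A check like this says nothing about what is inside the image. It only tells you when to start asking.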

The Let’s Encrypt lesson

A good public example is the expiry of DST Root CA X3, the older root that Let’s Encrypt chains had relied on, on 30 September 2021.

Most modern systems handled the change because they had updated trust stores and understood the newer ISRG Root X1 chain. Older clients and devices had problems because they did not have the right trust material or handled the chain badly.

Containers can fall into the same pattern.

A stale image can behave like an old client. It may reject a certificate chain that a fresh system accepts without issue.

That is not just a certificate problem. It is an image lifecycle problem.

Pinning helps, but it needs ownership

There is also a trade-off with pinned base images.

Pinning a base image by digest is good for repeatability. It means you know exactly what you built from. That matters for supply chain control and auditability.

But pinned does not mean maintained.

If a Dockerfile points to a specific digest and nobody updates that digest, the service is now tied to that old base image forever. It will not benefit from refreshed packages or updated CA certificates until someone deliberately moves it forward.

So I do not see the choice as “pin or do not pin”.

The better question is: who owns the refresh?

If we pin, we need a routine to review and update the pin. If we use tags, we still need builds that pull fresh base images. In both cases, we need a process. Hope is not a patching strategy.
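Before anyone can own the refresh, it helps to know which base images are pinned at all. Here is a minimal sketch that reads Dockerfile text and reports, for each FROM line, whether the reference includes a sha256 digest. The function name is hypothetical, and the parsing is deliberately naive: it does not resolve ARG variables or tell a stage name from an image name.

```python
def base_images(dockerfile_text):
    """Return (image_reference, pinned_by_digest) for each FROM instruction."""
    results = []
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        if not parts or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64 that precede the image.
        refs = [part for part in parts[1:] if not part.startswith("--")]
        if not refs:
            continue
        image = refs[0]
        # A digest-pinned reference looks like name@sha256:<hex>.
        results.append((image, "@sha256:" in image))
    return results
```

The output is only an inventory. Pinned entries still need a date next to them, and unpinned entries still need a build that pulls.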

What I would check

If I were looking at a containerised service that suddenly started throwing TLS errors, I would ask a few simple questions.

When was this image last rebuilt?

Was the build using a fresh base image, or did it reuse a cached one?

Is the base image pinned to a digest?

If it is pinned, when was that digest last reviewed?

Is ca-certificates installed in the image?

Does the runtime use the OS trust store, or does it carry its own trust store?

Are there private CA certificates involved?

Are those private certificates copied into the image, mounted at runtime, or injected some other way?

Has anything changed in the external certificate chain?

Those questions are not glamorous. They are practical.

They slow the investigation down enough to avoid jumping straight to the wrong fix.
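For the trust store questions, a quick look at the environment can save time. This sketch reports which well-known CA override variables are set in the container, and where Python's own ssl module looks by default. The variable list is my selection and is not exhaustive. Java is left out because its trust store is usually set with the javax.net.ssl.trustStore system property rather than an environment variable.

```python
import os
import ssl

# Environment variables that commonly redirect or extend a trust store.
TRUST_ENV_VARS = {
    "SSL_CERT_FILE": "OpenSSL-based clients and Go",
    "SSL_CERT_DIR": "OpenSSL-based clients and Go",
    "REQUESTS_CA_BUNDLE": "Python requests",
    "CURL_CA_BUNDLE": "curl",
    "NODE_EXTRA_CA_CERTS": "Node.js",
}


def trust_overrides(env=None):
    """Return the override variables that are set to a non-empty value."""
    env = os.environ if env is None else env
    return {name: env[name] for name in TRUST_ENV_VARS if env.get(name)}


def python_default_paths():
    """Where this Python build's ssl module looks for CA material."""
    paths = ssl.get_default_verify_paths()
    return {"cafile": paths.cafile, "capath": paths.capath}
```

If an override points at a file that was baked into the image months ago, the image age question and the trust store question turn out to be the same question.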

The fix should not be disabling TLS checks

When TLS breaks, someone will eventually suggest turning verification off.

In curl, that is usually -k. In application code, it might be an option that sounds harmless in the moment. Something like “disable certificate validation” or “trust all certificates”.

That can help prove a theory during troubleshooting. It should not become the fix.

If the real issue is stale trust material, disabling verification trades an operations problem for a security problem. The service may start working again, but now it cannot properly tell who it is talking to.

That is not a good bargain.
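In Python terms, the difference looks roughly like this. The sketch keeps verification on and, if a private CA is involved, adds that CA to the context instead of switching checks off. The function name is my own, and any path passed as extra_ca_file is hypothetical.

```python
import ssl


def verifying_context(extra_ca_file=None):
    """Build a client context that verifies certificates and hostnames.

    extra_ca_file is an optional PEM file holding a private CA. It is added
    on top of the default trust store, not used instead of it.
    """
    ctx = ssl.create_default_context()
    if extra_ca_file:
        ctx.load_verify_locations(cafile=extra_ca_file)
    return ctx


# The shortcut to avoid looks like this. It makes the error go away
# by no longer checking who is on the other end:
#   ctx.check_hostname = False
#   ctx.verify_mode = ssl.CERT_NONE
```

If the fixed context still fails, that is useful information. It points back at the trust material, which is where the fix belongs.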

A better routine

The better answer is boring, which is usually a good sign.

Rebuild images regularly, even when application code has not changed.

Use build settings that pull fresh base images instead of relying on local cache. For Docker builds, that can mean using --pull in the pipeline.

Scan images for old packages and stale base layers.

If base images are pinned, use a tool or routine that raises update work, such as a pull request or a ticket, when the digest needs refreshing.

Add a small deployment smoke test for critical TLS paths.

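Know how each runtime handles certificates. Java, Go, Python, Node.js and .NET do not always behave the same way. Some use the OS bundle. Some use runtime-specific stores or libraries: Java normally reads its own cacerts keystore, and Node.js ships with a built-in CA list. Some can be changed with environment variables, such as SSL_CERT_FILE for Go and OpenSSL-based clients, or NODE_EXTRA_CA_CERTS for Node.js.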

Document how private CA certificates are added and rotated.

None of this needs to be complicated. It just needs to exist.
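The smoke test from the list above can be very small. This is a sketch of one, assuming the critical dependencies are plain TLS endpoints reachable from the container. The function name and the injectable connect parameter are my own choices. The parameter is there so the check can be exercised without a network.

```python
import socket
import ssl


def check_tls(host, port=443, timeout=5.0, context=None,
              connect=socket.create_connection):
    """Try a verified TLS handshake and return (ok, detail)."""
    ctx = context or ssl.create_default_context()
    try:
        with connect((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version() or "unknown"
    except ssl.SSLCertVerificationError as exc:
        # The case this post is about: the chain did not verify
        # against the trust store inside the image.
        return False, f"certificate verification failed: {exc.verify_message}"
    except (ssl.SSLError, OSError) as exc:
        return False, f"connection failed: {exc}"


# In a post-deploy step, with a hypothetical endpoint:
#   ok, detail = check_tls("api.example.com")
#   if not ok:
#       raise SystemExit(detail)
```

Separating verification failures from plain connection failures matters. One sends you to the network team. The other sends you back to the image.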

The thing I want to remember

Containers did not remove the need for operating system maintenance. They changed where that maintenance happens.

Instead of patching a long-running server in place, we rebuild and redeploy images. That only works if the rebuild actually happens.

A stale CA bundle is not the loudest container risk. It will not always show up in a dashboard. It may not look urgent until a certificate chain changes and production starts failing outbound calls.

But it is real.

Security is not only about the dramatic controls. It is also about the small routines that keep systems current enough to keep trusting the right things.

Rebuild the image.

Refresh the base.

Check the trust store.

Do not wait for TLS to remind you.

Notes I checked while writing

Docker describes images as immutable snapshots and recommends rebuilding images regularly with updated dependencies. Their build guidance also recommends using --pull to check for fresh base images during builds.

Docker’s guidance on digest pinning is useful too. Pinning improves repeatability, but it can also opt you out of automatic base image updates unless you have a process to update the digest.

The Debian update-ca-certificates man page explains how the system updates /etc/ssl/certs and generates the combined ca-certificates.crt bundle.

The Let’s Encrypt DST Root CA X3 expiry is a useful reminder that trust chains change, and older clients or systems without updated trust stores can fail even when the server certificate is valid.
