Why Let's Encrypt Auto-Renewal Fails Sometimes

Automating SSL certificate renewal with Let's Encrypt through an ACME client like Certbot takes a recurring chore off a system administrator's plate. However, even with automated setups, renewal failures and expired certificates do happen. I'll cover the most common pitfalls below and, crucially, how to prevent them.

Reason 1. Misconfigured Cron Jobs or Systemd Timers

Certbot relies on cron jobs or systemd timers (depending on your system) to handle automatic renewals. These are known to fail silently, either because they were never configured in the first place, or because they were accidentally modified by something like a system update.

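Solution: Periodically confirm that a renewal task is actually scheduled. Keep in mind that crontab -l only shows the current user's crontab, while packaged Certbot installs usually rely on a systemd timer or a file in /etc/cron.d (on Debian-based systems, for example):

systemctl list-timers --all | grep certbot
# or
sudo crontab -l | grep certbot
cat /etc/cron.d/certbot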

Note that some distributions, like Amazon Linux 2023, ship without a cron daemon, which will silently cause cron-based automated renewals to fail. I found this out the hard way.

You can install a cron daemon like cronie using the appropriate command for your distribution, then make sure the service is enabled and running:

sudo yum install cronie
sudo systemctl enable --now crond
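
To make the check above repeatable, here is a minimal sketch of two helper functions. The function names are my own, and the default cron paths are common locations rather than anything Certbot guarantees, so adjust them for your distribution.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Certbot.

# Print cron files that mention certbot. With no arguments, search some
# common system-wide locations (assumed defaults; adjust as needed).
# Returns non-zero when nothing is found.
find_certbot_cron() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/crontab /etc/cron.d /etc/cron.daily
    fi
    # -r recurse, -l list matching files only, -s silence missing-path errors
    matches=$(grep -rls certbot "$@" 2>/dev/null)
    [ -n "$matches" ] && printf '%s\n' "$matches"
}

# Succeed if any systemd timer with "certbot" in its name is known.
# This also matches the snap-installed timer name.
has_certbot_timer() {
    command -v systemctl >/dev/null 2>&1 || return 1
    systemctl list-timers --all --no-pager 2>/dev/null | grep -q certbot
}
```

Something like `has_certbot_timer || find_certbot_cron || echo "no renewal schedule found"` could then run from a periodic health check. Note that it only proves a schedule exists, not that the renewal itself succeeds.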

Reason 2. Permission Issues

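Certificate renewal requires Certbot to read and write its configuration, work, and log directories (by default /etc/letsencrypt, /var/lib/letsencrypt, and /var/log/letsencrypt). It's easy to inadvertently lock down one of these, or change its ownership, during day-to-day system administration.

Solution: Restore sane permissions on Certbot's directories. Avoid a blanket recursive chmod 755 here, because it would make your private keys readable by every user on the system. Keep the top-level directory traversable and the key-bearing directories restricted to root:

sudo chmod 755 /etc/letsencrypt/
sudo chmod 700 /etc/letsencrypt/live/ /etc/letsencrypt/archive/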
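
Since the private keys live under this tree, it is worth checking that none of them have ended up readable by other users. The following is a small sketch of such an audit. The function name is hypothetical, and it assumes Certbot's usual privkey*.pem file naming and default /etc/letsencrypt location.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of Certbot.

# Print private key files under the given directory (default:
# /etc/letsencrypt) that are readable by group or by other users.
find_exposed_keys() {
    dir=${1:-/etc/letsencrypt}
    # -perm -040: group-readable, -perm -004: world-readable
    find "$dir" -type f -name 'privkey*.pem' \
        \( -perm -040 -o -perm -004 \) 2>/dev/null
}
```

Run it as root (for example `sudo sh -c '. ./audit.sh; find_exposed_keys'`, where audit.sh is whatever file you saved the function in). Any path it prints is a candidate for `chmod 600`.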

Reason 3. Not Serving the acme-challenge File Over HTTP

When Let's Encrypt renews your SSL certificate using the HTTP-01 challenge, it verifies your control over the domain by making a regular HTTP request on port 80 for a token file under /.well-known/acme-challenge/ at the root of your domain. If your server isn't configured to serve plaintext files from that location, renewal will fail. (This applies to the more common HTTP-01 challenge method, not the DNS-01 challenge.)

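Solution: Make sure the directory from which you serve files (for example /var/www/html on many Apache installs) allows a .well-known/acme-challenge directory to be created while the server is running, and that your server will successfully serve a file from that directory over regular HTTP. If you generally redirect all HTTP traffic to HTTPS, as you should, the challenge usually still works, because Let's Encrypt's validator follows redirects to ports 80 and 443, provided the file is also served at the HTTPS URL. If your redirect rules are more elaborate, exempting that one path from the redirect is the simplest fix.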

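You can test your configuration using Certbot's --dry-run option, which performs a test renewal against Let's Encrypt's staging environment without saving any certificates: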

sudo certbot renew --dry-run
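
When a dry run fails and you want to isolate the web server from Certbot itself, a manual probe of the challenge path can help. This is a rough sketch, assuming the webroot method and that curl is installed. The function names and the probe file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Certbot.

# Create a throwaway file where HTTP-01 tokens would live under the
# given webroot, and print the URL path it should be served at.
make_probe() {
    webroot=$1
    name="probe-$$"
    mkdir -p "$webroot/.well-known/acme-challenge" || return 1
    printf 'ok\n' > "$webroot/.well-known/acme-challenge/$name" || return 1
    printf '/.well-known/acme-challenge/%s\n' "$name"
}

# Fetch the probe over plain HTTP, following redirects as the validator
# would, and succeed only if the body comes back intact.
probe_served() {
    domain=$1
    urlpath=$2
    [ "$(curl -fsSL --max-time 10 "http://$domain$urlpath")" = "ok" ]
}
```

For example, `p=$(make_probe /var/www/html) && probe_served example.com "$p"` should succeed from a machine outside your network. Remember to delete the probe file afterwards.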

Reason 4. Network Configuration Issues

Network misconfigurations, firewall rules, or server security groups blocking Let's Encrypt's validation requests will cause renewal attempts to fail. This is more likely to be an issue on a corporate or otherwise private network, but you won't be able to renew your certificates with the normal HTTP-01 challenge procedure if Let's Encrypt's servers can't reach yours.

Solution: Let's Encrypt does not publish a list of IP addresses for its validation servers, and those addresses can change at any time, so you have to configure your network to allow inbound HTTP traffic on port 80 from any address. If stricter rules are needed, consider using the DNS-based validation method (though this can be more difficult to automate). Again, a --dry-run can be used to verify whether an actual renewal can succeed.

Preventing Future Failures

Even the best-configured setups will experience occasional issues. Proactive certificate monitoring and alerting are crucial. Always use specialized SSL monitoring tools such as CertNotifier to receive alerts well before renewal issues affect your service. Don't let automated renewals lull you into a false sense of security.
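
If you want a basic safety net of your own alongside a monitoring service, the expiry check itself is simple. Here is a minimal sketch using openssl. The function names and the 21-day default threshold are my own choices, not a recommendation from Let's Encrypt.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Certbot.

# Succeed if the PEM certificate in $1 will still be valid $2 days from
# now (default 21 days, an arbitrary threshold chosen for this sketch).
cert_valid_for_days() {
    cert=$1
    days=${2:-21}
    openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null
}

# Print the certificate a live server presents on port 443, in PEM form,
# so that it can be saved to a file and checked with the function above.
fetch_cert() {
    host=$1
    openssl s_client -connect "$host:443" -servername "$host" \
        </dev/null 2>/dev/null | openssl x509
}
```

A daily cron job could run something like `fetch_cert example.com > /tmp/live.pem && cert_valid_for_days /tmp/live.pem 14 || echo "certificate expires soon"` and mail the output. Checking the certificate the server actually presents, rather than the file on disk, also catches the case where renewal succeeded but the web server was never reloaded.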