Fail2ban

Pulsed Media has operated dedicated servers and seedbox infrastructure since 2010 — thousands of servers over 16+ years. No production server runs fail2ban. Zero known SSH brute force compromises. This page explains why, with real data from production systems.

This article focuses on fail2ban's SSH jail — the most common deployment. fail2ban can also monitor HTTP, FTP, and mail authentication logs; those use cases have different threat profiles and are not covered here.

This article is part of a series on common sysadmin habits that waste time and create problems. See also: Why Blocking ICMP Echo Requests Breaks More Than It Fixes.

What fail2ban does

fail2ban watches auth.log for failed login attempts. When an IP exceeds a threshold (default: 5 failures), it adds a firewall rule to block that IP for a set duration (default: 10 minutes).

It is a Python daemon that runs continuously, requires configuration files (jail.local, filter.d/, action.d/), consumes 30–50 MB of RAM, and parses attacker-controlled log content.

The standard checklist

Every "SSH hardening" guide repeats the same steps:

Install fail2ban
Disable password authentication
Change the SSH port
Block ICMP echo requests

Four steps. Fifteen minutes. Four new failure modes. None of them address a real threat.

We call these "security vibes" — measures that produce the feeling of security without the substance.

Why fail2ban does not help

Botnets rotate IPs faster than fail2ban bans them

Modern SSH brute force comes from botnets with tens of thousands of source IPs. Each IP tries 2–3 passwords, then moves on. With a threshold of 5 failures, most botnet IPs never trigger a ban.

One week of auth.log data from a production mail relay:

68,327 failed SSH attempts
1,319 unique source IPs
462 IPs (35%) made 4 or fewer attempts — below any reasonable fail2ban threshold

fail2ban catches the top offenders (one IP tried 5,676 times). The distributed long tail flows through untouched.

fail2ban has its own overhead

For every IP it bans, fail2ban writes log entries: 5 "Found" lines (one per threshold attempt), a "Ban" line, and later an "Unban" line. Seven entries per ban cycle, roughly 750 bytes.

From real data (1,093 bans per week):

Source	Entries/week	Size/week
"Found" detection lines	5,465	~655 KB
Ban/Unban lines	2,186	~196 KB
Total fail2ban.log	~7,651	~850 KB/week

fail2ban also generates syslog entries for each firewall action, and its Python process sits in RAM on a server where every megabyte matters.

Firewall rule churn

Each ban inserts a firewall rule. Each unban removes it. With the default bantime of 600 seconds, only 10–25 rules exist at any moment — not a performance problem on its own.

But the churn is constant: 1,093 insertions and 1,093 deletions per week. On legacy iptables, each rule modification copies the full ruleset into kernel space under a global lock (xtables_mutex). On a mail relay handling thousands of concurrent SMTP connections, this lock contention at ban/unban boundaries creates latency spikes.

When someone extends bantime to 24 hours (a common "improvement"), rules accumulate — 1,093 bans per week with 24-hour expiry means ~156 concurrent active rules (one day's worth of the weekly total). Not catastrophic on its own, but each rule adds to the linear scan every incoming packet must traverse.

It is a service that can fail

fail2ban has had security vulnerabilities — including a 2021 command injection via crafted log content (mailutils action handler). It parses attacker-controlled input (auth.log content), which is a risky surface by design. When fail2ban crashes, protection drops to zero — and without separate monitoring for the fail2ban process itself, nobody notices.

It bans your own infrastructure

We discovered this firsthand in March 2026. A backup system needed restricted SSH access from a management server. The first connection attempt: "Connection refused."

fail2ban had banned the IP of our own production server. A failed key negotiation during setup triggered the threshold, and fail2ban blocked the IP — no distinction between a botnet and our own infrastructure.

Fifteen minutes of operator time went into diagnosing "Connection refused" on a system that had just been configured. The firewall rules were clean. SSH was listening. Everything looked correct from inside the server. The ban was invisible without checking fail2ban specifically.

This is not an edge case. This is the failure mode fail2ban is designed to produce. It blocks IPs without context — yours, your monitoring system's, your customer's. A pattern matcher reading auth.log cannot tell the difference.

The one-line fix

The actual problem fail2ban claims to solve is log growth. SSH brute force generates auth.log entries. On a small disk, those entries fill the disk. When the disk fills, other services die.

The fix is one line in your logrotate configuration:

maxsize 50M

auth.log is capped at 50 MB per rotation. With rotate 4, the maximum total is approximately 200 MB. The disk cannot fill from SSH logs. The problem is gone. No daemon, no configuration files, no Python dependency, no CVE surface, no firewall rules, no RAM, no failure modes, no risk of banning your own servers.

logrotate ships with every Linux system. It runs from cron as a stateless process. Nothing to maintain, nothing to monitor, nothing to break.

Scenario	Auth.log growth/week	Complexity	Failure modes
No fail2ban	20 MB	None	Log fills disk eventually
With fail2ban	~2 MB	High (daemon, config, rules)	Service crash, bans legitimate IPs, CVEs
logrotate maxsize 50M	Capped at 50 MB total	None	None

The brute force math

When you suggest removing fail2ban, the first question is always: "But what about the brute force itself?"

The assumption behind that question: SSH brute force is a real threat that requires active countermeasures. Check the assumption with arithmetic.

A 12-character alphanumeric password has a keyspace of 62 characters (a–z, A–Z, 0–9). Total combinations: 62¹² = 3.22 × 10²¹.

OpenSSH enforces a delay of approximately 3 seconds per authentication attempt through PAM's pam_faildelay, TCP handshake overhead, and key exchange time. The default MaxStartups setting (10:30:100) limits concurrent unauthenticated connections — but even granting an attacker 1,000 parallel connections:

3.06 × 10¹¹ years.

Three hundred billion years. The universe is 13.8 billion years old. You would need 22 universes to exhaust the keyspace. The median time — a 50/50 chance of guessing right — is 153 billion years, or 11 universes.

That calculation uses only plain alphanumeric characters. If the password contains extended characters or special characters (öäåüñ, @#$%^ and similar), the character set grows from 62 to ~224 printable Latin-1 characters. 224¹² = 1.18 × 10²⁸ — that is 3.66 million times larger than the alphanumeric keyspace. Common special characters like ! or $ appear in most brute force dictionaries — they expand the keyspace but attackers expect them. Extended characters like ö, ä, or å are a different story: most brute force tools never try non-ASCII characters at all. A single ö in your password likely puts it outside the attacker's entire search space.

This is not a threat. It is background noise. The auth.log entries it generates are the only actual problem — and logrotate maxsize handles those permanently.

SSH keys are even stronger (256-bit ed25519 keys have a keyspace that makes the password calculation look quaint), but the point is that even passwords are not breakable through brute force against OpenSSH's built-in rate limiting.

Credential stuffing is a different problem

The math above applies to random brute force against a strong, unique password. If you chose "Summer2024!" for SSH and it was leaked in a database breach, no keyspace mathematics will save you. Credential stuffing — trying known username/password pairs from breached databases — is a real threat.

The defense is not fail2ban. An attacker doing credential stuffing has your actual password and logs in successfully on the first attempt — zero failed logins, zero threshold breaches. fail2ban never triggers. The ban mechanism is bypassed entirely by the nature of the attack.

The defense is a strong, unique password that does not appear in any breach database. fail2ban cannot help with strong passwords (unnecessary) and does not help with credential stuffing (the attacker never fails enough to trigger a ban).

The entire security argument in this article depends on strong credentials. If your SSH password is "Summer2024!" or reused from another service, the brute force math does not apply — your password is already in attacker dictionaries. Strong, unique, random passwords are the foundation. Everything else follows from that.

What about key-only SSH?

"Disable password authentication" is the second item on every hardening checklist. It has locked operators out of their own servers.

Against online brute force through OpenSSH's rate limiting specifically, a 12-character random password (71.5 bits of entropy) and a 4096-bit RSA key (~140 bits) are both unguessable — the rate limiting makes even 71.5 bits cosmologically impossible to exhaust.

SSH keys do provide security benefits that passwords cannot: they are immune to phishing (no one can trick you into typing a key), immune to credential stuffing (key material does not appear in breach databases), and immune to offline cracking if the server's /etc/shadow is compromised. These are real advantages that matter for some environments.

Key-only SSH costs you when you lose your key and cannot access the server, when a new team member needs emergency access, or when you are working from a device without your key.

For a small team managing their own infrastructure, key-only SSH is a tradeoff with genuine benefits on both sides. Evaluate it for your situation rather than applying it because a checklist told you to.

A real cascade failure

Here is what happened to our mail relay infrastructure:

SSH brute force generated auth.log entries on a 10 GB server
Logrotate tried to compress auth.log — compression failed because the disk was too tight for the temporary file
After the failed compression, logrotate could not complete its rotation cycle
auth.log grew unbounded to 8.6 GB on a 10 GB disk
Disk filled to 100% — the mail server (exim4) could not write to its spool
exim4 died silently with no alert
DNS round-robin hid the dead backend — the remaining relays absorbed load
When the last surviving relay was disabled for a separate issue, all outbound email stopped

The cascade ran undetected for years. Logrotate handled auth.log normally for a long time — the server was commissioned around 2016. At some point, disk usage from other services tightened enough that logrotate's compression step failed (a 0-byte .gz file is the forensic signature), and from that moment auth.log grew unbounded. The failure was invisible because the operator's own email kept working through a surviving relay on a different destination domain.

The fix was not fail2ban. fail2ban would have reduced auth.log growth from ~20 MB/week to ~2 MB/week — but the disk was already too tight for logrotate to compress existing logs. The cascade started at step 2 (compression failure), not step 1 (log growth). And once rotation breaks, even 2 MB/week grows unbounded on a 10 GB disk. It just takes longer to reach the same result.

maxsize 50M prevents the cascade at step 2: it rotates by size, not just by schedule. Compressing a 50 MB file requires roughly 50 MB of temporary disk space — a bounded, predictable amount. The unbounded growth that filled 8.6 GB of a 10 GB disk cannot occur because the file is rotated before it grows past the cap.

The cargo cult

Richard Feynman described cargo cult science in his 1974 Caltech commencement address. During World War II, populations in Melanesia observed American military bases receiving supply drops. After the war, they built bamboo control towers, carved wooden headphones, lit fires along dirt runways, and marched in formation. The rituals were precise. But no planes landed.

The form was perfect. The substance was absent.

fail2ban reproduces the visible form of security — a daemon running, IPs being blocked, logs showing "Ban" entries. It looks like protection. The metrics confirm activity. But SSH brute force against a strong credential was never a threat. The planes were never going to land.

Installing fail2ban because "that's what you do on a server" follows the same pattern as building bamboo headphones because "that's what the Americans wore in the tower." Both adopt the form of a successful system without understanding why it worked.

The mechanism that secures SSH is mathematics. 62¹² combinations at 3 seconds each. No daemon improves on arithmetic.

What actually secures SSH

After 16 years of operating thousands of servers:

Strong passwords (12+ characters, random) — not breakable through OpenSSH rate limiting
logrotate maxsize — caps log growth permanently, zero complexity
Patching — keep OpenSSH updated for actual vulnerability fixes
Network architecture — restrict SSH access to known IPs via firewall when the server does not need public SSH

Four things. Three of which are already the default on a well-maintained server.

Before installing any security tool, ask one question: what specific threat does this counter, and is that threat real in my environment?

SSH brute force against a 12-character random credential is not a real threat. Every tool designed to counter it — fail2ban, port knocking, changed SSH ports, ICMP echo blocking — is solving a problem that does not exist. And each one has configuration complexity, failure modes, and operational surprises (like banning your own infrastructure at the worst moment).

At Pulsed Media

On every Pulsed Media server, the SSH security model is:

OpenSSH with default settings and strong passwords
logrotate with maxsize directive
Regular patching

No fail2ban. No changed ports. No ICMP blocking. Thousands of servers over 16+ years. Zero known SSH compromises from brute force.

PM manages its infrastructure with PMSS (open source at https://github.com/MagnaCapax/PMSS), running from its own datacenters in Finland. The complexity budget goes into things that matter — faster storage, better network, reliable service — not into security theater that creates more problems than it solves.

Every hour spent configuring fail2ban, debugging why it banned your own monitoring server, and maintaining its jail files is an hour not spent on patching, monitoring, or improving the actual service. The math on that tradeoff is as simple as the brute force math: zero benefit, non-zero cost.

PM's seedbox plans include SSH access on all tiers, including the permanent free tier.