Web Flaws and Vulnerabilities

Handle the ReDoS (Regular Expression Denial of Service), or Evil Regex


You know I spend my time dissecting vulnerabilities, whether they’re simple or require a twisted mind to cause harm. Today, we’re talking about a sneaky attack—a simple coding mistake. Buckle up, because we’re diving into ReDoS, the nightmare of regular expressions (Regex) for developers.

Anatomy of Regular Expression Denial of Service (ReDoS)

But what is ReDoS, really?

When you install a security plugin like SecuPress, you expect it to block attacks, hackers, brute force attempts, and prevent account theft. But sometimes, the attack comes from a place many developers consider harmless: **regular expressions**, or **Regex**.
ReDoS (Regular Expression Denial of Service) is a denial-of-service attack that exploits a weakness of most Regex implementations: on certain inputs, matching time grows very quickly, often exponentially, with the input size. All an attacker needs to do is send a carefully crafted input (and it can be so simple…) to make your system spiral out of control and freeze for a very long time.

Imagine a simple piece of text in your code bringing any server to its knees. Unthinkable? Or not? Read on…

The Naive Engine: The Root of Evil

To understand why a regular expression can become a hacking weapon, we need to dive a little into the Regex engine.
Most regex engines today use a so-called “naive” algorithm. This algorithm builds a Non-Deterministic Finite Automaton (NFA), which is a finite state machine where, for each state/input symbol pair, there can be multiple possible next states. (Still with me?)

Because several next states may be possible, the engine has to pick an order: to find a match, it tries the possible paths one by one, if necessary, until a match is found or all paths have failed.

The feature that causes this chaos is backtracking. If the input doesn’t match, the engine backtracks to previous positions where it could have taken a different path. It retries, over and over, until all possible paths have been explored.

Take the classic example: the Regex pattern ^(a+)+$. For the input aaaaX, there are 16 possible paths. So far, so good. But for the input aaaaaaaaaaaaaaaaX (16 ‘a’s), there are now 65,536 possible paths! And guess what? The number doubles with each additional ‘a’. It’s exponential!
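To see the doubling for yourself, here is a minimal sketch. It is a toy model of backtracking, not the real PCRE engine, and the function name count_dead_ends is mine. It counts how many times a backtracking matcher for ^(a+)+$ reaches the end of a group and then fails on the trailing X. It returns 2^n - 1, so 15 and 65,535 for the two inputs above: one less than the figures quoted, which depend on how you count paths, but with the same doubling.

```php
<?php
// Hypothetical helper: counts the dead ends hit while matching ^(a+)+$
// against $n 'a' characters followed by a non-matching 'X'.
function count_dead_ends(int $n): int {
    $deadEnds = 0;
    // Try every possible length for the next (a+) group.
    for ($len = 1; $len <= $n; $len++) {
        // After this group the engine tests '$', which always fails here.
        $deadEnds++;
        // It then tries to repeat the group on the remaining characters.
        $deadEnds += count_dead_ends($n - $len);
    }
    return $deadEnds;
}

foreach ([4, 8, 16] as $n) {
    echo $n . ' => ' . count_dead_ends($n) . PHP_EOL;
}
// prints:
// 4 => 15
// 8 => 255
// 16 => 65535
```

Each extra 'a' roughly doubles the work, which is why a few dozen characters are enough to stall a real engine.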

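Not every Regex algorithm is naive, and efficient ones do exist. But most engines also have to handle “expanded” Regexes with special additions, such as back-references, that can't always be matched efficiently. So the naive algorithm is often used anyway, even when your pattern contains none of these additions, and that's what creates the problem.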

Recognizing the “Evil Regex”

A regular expression is called an Evil Regex if it can get stuck on a specially crafted input.
Evil Regexes usually contain these elements, often combined:

  1. A grouping with repetition.
  2. Inside the repeated group, you’ll find either:
    1. Another repetition.
    2. An overlapping alternation.

Here are some Evil Regex examples to avoid like the plague:

  • (a+)+$
  • ([a-zA-Z]+)*$
  • (a|aa)+$
  • (a|a?)+$

These patterns are sensitive to inputs like aaaaaaaaaaaaaaaaaaaaaaa! (the minimum length may vary). I’ve already seen email validation expressions in online repositories that contained Evil Regexes, like the one from RegExLib.

Risks and ReDoS Injection

If you use regular expressions to validate data on the client side, an attacker can assume the same vulnerable Regex is used on the server side. All they need to do is send a well-crafted input to crash your server.

The web is full of regular expressions, in every layer: browsers, Web Application Firewalls (WAF), databases, and web servers. An attacker can target any application by attempting an Evil Regex.

And beware, it can go even further with ReDoS Injection. If a Regex itself is affected by user input, the attacker can directly inject an Evil Regex.

Imagine a scenario where you check if the username is contained in the password to flag a weak password. If you create a new Regex based on that username, and the attacker enters an Evil Regex as the username and an explosive string as the password, your program—and your server—will freeze.
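One way out of that scenario, as a minimal sketch: never compile the username as a pattern. The two helper names below are hypothetical. The first uses a plain substring search; the second shows preg_quote() for cases where a regex really is unavoidable, so the user input is matched literally.

```php
<?php
// Hypothetical helper: flags passwords that contain the username.
function password_contains_username(string $username, string $password): bool {
    if ($username === '') {
        return false; // An empty username would otherwise "match" every password
    }
    // Case-insensitive substring search: the username is never treated as a regex.
    return stripos($password, $username) !== false;
}

// Hypothetical helper: builds a pattern that matches the username literally.
function username_pattern(string $username): string {
    // preg_quote() escapes regex metacharacters and the '/' delimiter.
    return '/' . preg_quote($username, '/') . '/i';
}

var_dump(password_contains_username('bob', 'xxBob123'));  // bool(true)
var_dump(password_contains_username('(a+)+$', 'aaaaaa!')); // bool(false)
```

With either approach, an "Evil Regex" typed as a username is just an odd-looking string.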

Protect Your PHP Applications from ReDoS Attacks

As a webmaster, you can counter these attacks by configuring two essential PHP settings: pcre.backtrack_limit and pcre.recursion_limit.

pcre.backtrack_limit limits the number of backtracking steps the Regex engine can perform. A value that’s too high allows ReDoS attacks to slow down or crash your server. The default value is 1 million, but it’s recommended to reduce it to 500,000 or even 100,000 in production.

pcre.recursion_limit limits the recursion depth in Regex evaluation, preventing stack overflows. The default value is 100,000, but you can reduce it to 500 or even 200 to block excessive recursive patterns.

Example of a secure configuration for WordPress in your wp-config.php or in an init hook:

// For a production site with a sensitive plugin
if ( defined( 'WP_ENVIRONMENT_TYPE' ) && WP_ENVIRONMENT_TYPE === 'production' ) {
    ini_set( 'pcre.backtrack_limit', '500000' ); // Half the default of 1,000,000
    ini_set( 'pcre.recursion_limit', '500' ); // Far below the default of 100,000
    ini_set( 'max_execution_time', '30' ); // Limits global execution time
}

These values are a compromise between security and compatibility. They block most ReDoS attacks while allowing legitimate Regexes to function. However, reducing them too much may cause some complex Regexes to fail. Always test your applications after making changes.

How to Test the Impact on Your Application?

  1. Modify the values we just discussed, then enable debug mode in WordPress with define( 'WP_DEBUG', true ); and define( 'WP_DEBUG_LOG', true ); so that errors are written to wp-content/debug.log.
  2. Test all your plugin features:
    1. Forms (emails, URLs, phones).
    2. Content parsing (shortcodes, tags).
    3. Data validation.
  3. Now check the logs for errors like:
    1. PHP Warning: preg_match(): Compilation failed: regular expression is too large at offset...
    2. PHP Fatal error: Maximum execution time of 30 seconds exceeded
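
One caveat for step 3: when a pattern hits pcre.backtrack_limit, preg_match() simply returns false without writing a warning, so the log alone may not show it. Here is a minimal sketch of how to surface those failures while testing. The function name match_or_reject is hypothetical; preg_last_error() and the PREG_* constants are standard PHP.

```php
<?php
// Hypothetical helper: runs a match and logs PCRE failures that stay silent otherwise.
function match_or_reject(string $pattern, string $subject): bool {
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // Typical codes: PREG_BACKTRACK_LIMIT_ERROR, PREG_RECURSION_LIMIT_ERROR,
        // PREG_JIT_STACKLIMIT_ERROR.
        error_log('Regex aborted, preg_last_error() = ' . preg_last_error() . " (pattern: {$pattern})");
        return false; // Reject the input rather than guess
    }
    return $result === 1;
}
```

While testing, any "Regex aborted" line in the log points to a legitimate Regex that the new limits are breaking, or to a pattern that deserves a rewrite.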

Alternative Solutions to Protect Against ReDoS

If you can’t reduce pcre.backtrack_limit without breaking your application, use these other strategies:

Replace Vulnerable Regexes with Parsers

For emails: Use filter_var($email, FILTER_VALIDATE_EMAIL) instead of a regex.
For URLs: Use wp_http_validate_url() or filter_var($url, FILTER_VALIDATE_URL).
For content parsing: Use libraries like PHP DOM or HTML Purifier, or the WP HTTP API.
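
As a quick sketch of the first two replacements (the wrapper names is_valid_email and is_valid_url are hypothetical, filter_var() and its flags are standard PHP):

```php
<?php
// Hypothetical wrappers around PHP's built-in validators.
function is_valid_email(string $email): bool {
    // filter_var() returns the filtered value on success and false on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

function is_valid_url(string $url): bool {
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

var_dump(is_valid_email('user@example.com'));         // bool(true)
var_dump(is_valid_email('aaaaaaaaaaaaaaaaaaaaaaa!')); // bool(false)
var_dump(is_valid_url('https://example.com/path'));   // bool(true)
```

Inside WordPress, wp_http_validate_url() adds extra checks on top of this, so prefer it when it is available.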

Use Atomic or Possessive Regex

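Atomic groups ((?>...)) and possessive quantifiers (*+, ++, ?+) never give back what they have matched, which prevents too much backtracking:

// Before (vulnerable: nested repetition, anchored at the end)
preg_match('/^(a+)+$/', $input);

// After (atomic, more secure: the inner a+ cannot be re-split)
preg_match('/^(?>a+)+$/', $input);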
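
Before shipping a rewrite like this, check that it still accepts and rejects the same inputs. A minimal sketch follows; the variable names are mine, and the possessive variant is an equivalent alternative I added for comparison.

```php
<?php
// Three spellings of "one or more 'a', nothing else".
$vulnerable = '/^(a+)+$/';    // nested repetition, can backtrack heavily
$atomic     = '/^(?>a+)+$/';  // atomic inner group
$possessive = '/^(a++)+$/';   // possessive inner quantifier

// Short inputs only: the goal is to compare results, not to trigger a ReDoS.
foreach (['a', 'aaaa', 'aaaaX', ''] as $input) {
    printf(
        "%-6s vulnerable=%d atomic=%d possessive=%d\n",
        var_export($input, true),
        preg_match($vulnerable, $input),
        preg_match($atomic, $input),
        preg_match($possessive, $input)
    );
}
```

All three patterns agree on these inputs. The difference only shows on long failing inputs, where the first one is the only one that keeps re-splitting the run of 'a' characters.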

Limit the Size of User Inputs

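// Treat a missing or non-string parameter as empty instead of raising a warning
$name = isset( $_GET['name'] ) && is_string( $_GET['name'] ) ? wp_unslash( $_GET['name'] ) : '';
if ( mb_strlen( $name ) > 255 ) {
    wp_die( 'Name too long.', 403 );
}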

Use a Secure Wrapper for Regexes

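function safe_preg_match($pattern, $subject, &$matches = [], $timeout_ms = 50) {
    $start   = microtime(true);
    $result  = preg_match($pattern, $subject, $matches);
    $elapsed = (microtime(true) - $start) * 1000;

    // preg_match() returns false when a PCRE limit (e.g. pcre.backtrack_limit) is hit
    if ($result === false) {
        error_log('Regex aborted, preg_last_error() = ' . preg_last_error() . " (pattern: {$pattern})");
        return false;
    }

    // This check runs after the match has finished: it flags slow patterns, it cannot interrupt them
    if ($elapsed > $timeout_ms) {
        error_log("Possible ReDoS: regex took {$elapsed}ms (pattern: {$pattern})");
        return false; // Treat the input as rejected
    }
    return $result;
}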

Fix: Act Before the Heart Attack

The fundamental mistake is to let code become a code editor or to treat user data as if it’s innocent. Any data coming from a user must be treated as malicious—that’s Security 101, and I’ve said it a thousand times.

Regarding regular expressions: check your patterns! If you see nested repetitions or overlapping alternations, there’s a good chance you have a ticking time bomb in your code.

As a developer, you should read the OWASP documentation on ReDoS, use tools to detect these vulnerabilities (like SecuPress 2.5?!), and above all, don’t copy/paste Regexes found online (yes, even from Stack Overflow or ChatGPT) without understanding them, especially if they contain parts that OWASP has flagged as problematic. It’s always copy/think/paste.

Here are some of the Evil Regex patterns I look for. Search your own code for these shapes and fix your vulnerabilities:

(a+)+
(a|aa)+
(.*a){x} // If x > 10
([a-zA-Z]+)*
(a|a?)+
^([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.)+[a-zA-Z]{2,6}$ // email validation
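
A first-pass scan for the simplest of these shapes can be automated. The sketch below is a rough heuristic of my own, with a hypothetical function name. It only flags a group that ends with a quantifier and is itself repeated with + or *. It misses overlapping alternations such as (a|aa)+, counted repetitions such as (.*a){x}, and the email pattern, so treat a negative result as "not proven safe".

```php
<?php
// Hypothetical heuristic: flags "(...+)+"-style nested repetition in a pattern string.
function looks_like_nested_repetition(string $pattern): bool {
    // A parenthesised group with no inner parentheses, ending in + * or ?,
    // immediately followed by + or *.
    return preg_match('/\([^()]*[+*?]\)[+*]/', $pattern) === 1;
}

foreach (['(a+)+$', '([a-zA-Z]+)*$', '(a|a?)+$', '(a|aa)+$', '^\d{3}-\d{4}$'] as $candidate) {
    echo $candidate . ' => ' . (looks_like_nested_repetition($candidate) ? 'suspicious' : 'not flagged') . PHP_EOL;
}
```

The first three candidates are flagged and the last two are not. The fourth one is a real Evil Regex that this heuristic misses, so it does not replace a manual review or a dedicated tool.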

Getting back to WordPress, PHP evolution is here to help. For example, PHP 8 removed functions like create_function, which were often used by old malware that I still have in my files.

Using up-to-date PHP versions is a way to get rid of known vulnerabilities. Similarly, to prevent injections, like the SQLi seen in the example of a custom shortcode from a previous article, the fix is very simple.

Whether it’s SQL injection, XSS, or ReDoS, security comes down to distrusting user input and using dedicated tools.


Everyone is counting on you.
