The Regular expression Denial of Service (ReDoS) is a class of denial of service attacks that exploit vulnerable regular expressions (regexes) whose execution time can be superlinearly related to input sizes. A common approach of defending ReDoS attacks is to repair the vulnerable regexes. Techniques have been recently proposed to synthesize repaired regexes using program-by-example (PBE) techniques. However, there is no guarantee that a synthesized regex is semantically equivalent or similar to its original regex. Further, the invulnerability of the repaired regex cannot be guaranteed.
To address the challenges, we propose RegexScalpel, an automatic regex repair framework that adopts a localize-andfix strategy. RegexScalpel first localizes the vulnerabilities by leveraging fine-grained vulnerability patterns proposed by us to analyze their vulnerable patterns, the source (i.e., the pathological sub-regexes), and the root causes (e.g., the overlapping sub-regexes). Then, RegexScalpel targets to fix the pathological sub-regexes according to our predefined repair patterns and the localized vulnerability information. Furthermore, our repair patterns ensure that the repair regexes are semantically either equivalent to or similar to the original ones. Our iterative repair methods also guarantee the invulnerability of the repaired regexes. With an experiment on a total number of 448 vulnerable regexes, we demonstrate that RegexScalpel can outperform all existing automatic regexes fixing techniques by fixing 348 more regexes than the best existing work. Also, we adopted RegexScalpel to detect ten popular projects including Python and NLTK, and revealed 16 vulnerable regexes. We then applied RegexScalpel to successfully repair all of them, and these repairs were merged into the later release by the maintainers, resulting in 8 confirmed CVEs.