5 Evaluation

In this section, we evaluate RegexScalpel on a wide range of ReDoS-vulnerable regexes collected from the SOLA-DA benchmark and real-world CVEs. We implemented RegexScalpel in Java, and conducted experiments on a machine with 16 cores Intel Xeon CPU E5620 @ 2.40GHz with 12MB Cache, 24GB RAM, running Windows 10. For all the experiments described below, we set the repair time budget for each vulnerability to 5 minutes and the small quantifier nμ to 500. Our evaluation aims to answer the following four research questions (RQs):

RQ1. Can RegexScalpel outperform state-of-the-art regex defense techniques? We evaluate the effectiveness of RegexScalpel on ReDoS-vulnerable regexes from the SOLA-DA benchmark and ReDoS-related CVEs, comparing with five state-of-the-art ReDoS defense techniques. These techniques vary from regex engine substitution, input length limit and regex repair.
RQ2. Can RegexScalpel outperform handcrafted defense actions? We compare the repaired regexes synthesized by RegexScalpel with the defense actions handcrafted by project maintainers.
RQ3. Can RegexScalpel detect new ReDoS vulnerabilities and synthesize repairs of usefulness to maintainers? We explore the usefulness of RegexScalpel on popular real-world projects in the detection of new ReDoS vulnerabilities and the synthesis of valid repairs useful to the project maintainers.

5.1 Evaluation Setup

5.1.1 Evaluation Datasets

Our evaluation was conducted on the ReDoS-vulnerable regexes collected from two widely-used sources: (i) the SOLA-DA benchmark and (ii) real-world CVEs. The statistics are listed in Table 9. The first benchmark SOLA-DA was constructed by Staicu and Pradel. It consists of 34 ReDoS-vulnerable regexes, and is often used for research on finding, fixing, and mitigating ReDoS vulnerabilities. For the second benchmark, we collected ReDoS-vulnerabilities in the last three years from widely-used libraries with Common Vulnerabilities and Exposures (CVE) identifiers. Collected vulnerabilities without clear descriptions, live links, or test cases were discarded, resulting in 70 CVEs in total. Moreover, one CVE may contain more than one vulnerable regex. Finally, we extracted 414 ReDoS-vulnerable regexes from these 70 CVEs.The evaluation dataset is shown below.

Repair ReDoS Benchmarks

5.1.2 Evaluation Approaches

To answer RQ1, we selected three state-of-the-art tools belonging to three paradigms, i.e., regex engine substitution (RE2), input length restriction (LLI), and regex repair (FlashRegex). We implemented three variants of input length restriction to limit the length of the input to 100, 500, and 5000, which are denoted as LLI(100), LLI(500), and LLI(5000)), respectively. Since RE2 and LLI do not defend ReDoS vulnerabilities by regex repair, we use the term “defense” instead of “repair” when referring to the evaluation results in RQ1. To answer RQ2, we compared RegexScalpel with the project maintainers.

5.1.3 Evaluation Metrics

A defense is considered successful if it (i) passes all the given test cases, and (ii) is free from ReDoS attack. The success defense rate is calculated by dividing the number of successful defenses by the total number of vulnerable regexes under defense.

5.2 RQ1: Comparing State-of-the-art

Overall Results. We evaluated the effectiveness of RegexScalpel on 448 ReDoS-vulnerable regexes comparing with different defense tools, i.e., RE2, LLI with different input length limits, and FlashRegex. Table 10 shows the number (ratio) of regexes that have been defended successfully. The detailed results can be found in Table 20 and Table 21. We can see that across the two benchmarks, the effectiveness of RegexScalpel outperforms existing works, defending over 98% vulnerable regexes, over four times of the best results (FlashRegex with 21.20%) achieved by baselines.

In the following, we further investigated the reason why existing works performed unsatisfactorily, and thus revealed the advantage of RegexScalpel from two perspectives.

Defense Capabilities on Vulnerable Pattern SLQ . We analyzed the distribution of four vulnerable patterns over all vulnerable regexes, and observed that the pattern SLQ is the major cause of the vulnerability, accounting for 85.29% (29 / 34) and 75.12% (311 / 414) ReDoS vulnerabilities in the two benchmarks, respectively. We further analyzed the results of the tools for their defending vulnerabilities with respect to SLQ pattern. The statistics is shown in Table 11. We note that the replacement of regex engines (i.e., RE2) cannot help vulnerable regexes with SLQ pattern to defend attacks in both benchmarks. By limiting input length (i.e., LLI), the success defense rate ranges from 5.79% to 89.66%, which is still not satisfactory. In addition, FlashRegex, the state-of-the-art regex defense tool, cannot defend regexes with SLQ vulnerable pattern, resulting in 0% success defense ratio in both benchmarks. In contrast, RegexScalpel can successfully defend 99.41% (338 over 340) regexes with SLQ pattern, making them free from ReDoS attacks.

The reasons behind the different effectiveness on regexes with SLQ vulnerable pattern are explained as follow. The design of RE2 and FlashRegex do not consider the SLQ pattern, leaving them incapable of defending vulnerable regexes with such vulnerable patterns. On the other hand, works that limit the input length (LLI) are agnostic to the vulnerable pattern. Yet, though LLI can defend 26 over 29 vulnerabilities in the first benchmark, falls short in the second benchmark, with only at most 5.79% (18 / 311) vulnerable regexes that can be free of attack. The differences between the two benchmarks could be attributed to the reason that there are only 7 (20.58%) regexes with multiple vulnerabilities in the first benchmark, while there are 191 (46.14%) such regexes in the second benchmark (see Table 15 for details). In other words, though LLI is agnostic to the vulnerable patterns, it is less incapable of handling regexes with multiple vulnerabilities.

Defense Capabilities on Lookarounds and Backreferences. The versatility of the rich extended features is also a reason that sets existing tools back. Among the extended features, we observed two features (i.e., lookarounds and backreferences) are more dominant than other features. So we investigated the effectiveness on regexes with these two features, as shown in Table 12. There are 7 and 32 regexes using lookarounds or backreferences in the two benchmarks, respectively. Existing tools can only defend at most 7 (17.95%) of them, while our RegexScalpel can successfully defend 92.31% (36 over 39). The table shows the existing tools rarely consider handling vulnerable regexes using lookarounds or backreferences, resulting in the unsatisfactory defense capabilities for these sorts of vulnerabilities.

Summary to RQ1: RegexScalpel can effectively defend 98.88% of vulnerable regexes, compared with 21.20% achieved by the best work. The high defense capability of RegexScalpel benefits from the comprehensive consideration of all four vulnerable patterns. In addition, the design of RegexScalpel is agnostic to the extended features like lookarounds and backreferences, making RegexScalpel can defend vulnerable regexes regardless of their features.

5.3 RQ2: Comparing Maintainers’ Repairs

Overall Results. Apart from the automatic defense tools shown above, we compared the repairs made by RegexScalpel with the defending actions taken by the project maintainers. Defending actions can be regex engine substitution, input length limit, regex repair, or code logic modification. We showed the statistics of maintainers’ defense strategies in Table 13. Among four strategies, regex repair is mostly used by maintainers, accounting for the highest proportion, as high as 92.19%, followed by code logic modification (5.13%). Also note that there are 8 outstanding vulnerable regexes (1.79%) whose defense actions have not been determined. To answer RQ2, we evaluated RegexScalpel on the 413 vulnerable regexes that were fixed by the maintainers using regex repair.

Table 14 shows that the repairs handcrafted by project maintainers can successfully defend 77.23% (319 / 413) vulnerable regexes as compared with 99.03% (409 / 413) by the ones generated using RegexScalpel. In the following, we discuss our observations from two perspectives, multiple vulnerabilities in one regex and SLQ -pattern regexes.

Multiple Vulnerabilities in One Regex. On examining the vulnerable regexes in the benchmarks, we observed there are vulnerable regexes that contain more than one vulnerability. We conducted further analysis on the number of such regexes that can be successfully repaired by the maintainers and by RegexScalpel, respectively. As shown in Table 15, there are 7 and 191 vulnerable regexes containing more than one vulnerability in the two benchmarks. Maintainers can only repair 57.58% of such regexes, with 14.29% and 59.16% success rates for the benchmarks, respectively. In comparison, RegexScalpel can successfully repair 97.98% of these vulnerable regexes automatically.

ReDoS-vulnerable Regexes with Pattern SLQ . Among the 413 vulnerable regexes that were repaired by maintainers, we made comparison over the ones with SLQ pattern. We found 318 such vulnerable regexes from both benchmarks. Table 16 shows the comparison results. Maintainers can successfully repair 72.22% and 76.33% of the regexes with SLQ in the two benchmarks, respectively. In comparison, RegexScalpel can achieve 100% and 99.33% in the two benchmarks, respectively. The results reveal that fixing regexes with SLQ pattern manually is challenging.

Summary to RQ2: Among the 413 repaired vulnerable regexes handcrafted by the maintainers, only 319 (77.23%) are ReDoS free. In RegexScalpel outperforms manual fixing, successfully repairs 409 (99.03%) of the 413 regexes. In particular, RegexScalpel excels at repairing regexes with multiple vulnerabilities and regexes with SLQ vulnerable pattern.

5.4 RQ3: Usefulness to Maintainers

We explored whether RegexScalpel can detect new ReDoS vulnerabilities in popular real-world projects and synthesize repairs of usefulness to project maintainers. We applied RegexScalpel to ten popular projects including Python and NLTK. The results are shown in Table 17. Although our methodology in this paper focuses on regex repair, the modules in RegexScalpel can also be applied to detecting ReDoS vulnerabilities. In total, RegexScalpel reported 16 new ReDoS regexes from ten projects. We applied RegexScalpel to repair these vulnerable regexes, and reported the repairs to the concerned project maintainers. All the 16 repairs were accepted by the maintainers and merged into subsequent project releases. At the time of submission, 8 CVEs concerning 8 projects and 13 of the 16 vulnerable regexes were assigned. The results demonstrate the usefulness of RegexScalpel to project maintainers in defending ReDoS attacks.

We take case #1 (i.e., the Python project) in Table 17 as an example to demonstrate the process of applying RegexScalpel in practice. The case concerns a vulnerable regex (?:.*,)*[ \t]*([^ \t]+)[ \t]+realm=(["\']?)([^"\']*)\2 used by Python built-in module urllib. The regex contains six vulnerabilities. Processing of the regex can be computationally expensive and is therefore vulnerable to ReDoS attacks. The Python maintainers replaced the pathological sub-regex (?:.*,)* with the safe one (?:^|,) to get the patch (?:^|,)[ \t]*([^ \t]+)[ \t]+realm=(["\']?)([^"\']*)\2. The patch fixed the first four vulnerabilities related to the pathological sub-regex (?:.*,)* in Table 18, and prevented the pathological sub-regex [ \t]* from being used as a starting sub-regex, which in turn fixed the fifth vulnerability. However, it cannot fix the sixth vulnerability, which is related to the pathological sub-regex [^ \t]+. RegexScalpel analyzed the vulnerability information of the patch (SLQ 3, (?:^|,)[ \t]*([^ \t]+), [,, ([^ \t]+)]). It leveraged the pattern τ38 in Table 8 to synthesize a repaired regex (?:^|,)[ \t]*([^ \t,]+)[ \t]+realm=(["\']?)([^"\' ]*)\2 based on the patch provided by the maintainers. In addition, RegexScalpel leveraged the repair patterns τ19 in Table 7 and τ32 in Table 8 to repair the initial regex and synthesize another patch ^(?:[^,]*,)*[ \t]*([^ \t]+)[ \t]+realm=(["\']?)([^"\']*)\2. The two synthesized patches RegexScalpel passed all test cases, and the first one is committed to the maintainers. Finally, the maintainers confirmed and released it in Python 3.6-3.10, and appreciated our help.

Summary to RQ3: RegexScalpel is useful to synthesize non-trivial and effective repairs for new vulnerable regexes found in popular real-world projects.

Page updated

Google Sites

Report abuse