Introduction
We present an empirical study that answers the following research question: "How do state-of-the-art anti-phishing solutions perform when accessing CAPTCHA-cloaked websites?"
Hardening Framework: Cloaken
We designed a website-hardening framework, Cloaken, that automatically equips a website with CAPTCHA-cloaking functionality. Given the source code of a website, Cloaken instruments the webpage with code that generates a CAPTCHA challenge.
To collect experiment data, Cloaken generates unique, never-before-seen URLs (a random UUID subdomain under a fixed registered domain) and automatically submits them to VirusTotal. Each URL points to an instance of a web page that contains nothing other than an instrumented CAPTCHA challenge. The URLs are also stored in a database that tracks the (URL, CAPTCHA type) mapping and the (URL, visited, solved) states. When the page is visited, a JavaScript event is emitted to the server. When the CAPTCHA is solved, the page sends a POST request to the server containing the visitor’s challenge token as proof; the token can be further verified by querying reCAPTCHA’s verification endpoint. Our database records show that the submitted URLs reach the “visited” state but not the “solved” state.
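The URL generation and state tracking described above can be sketched as follows. This is a minimal illustration, not Cloaken's actual implementation: the domain `example.com`, the table layout, and the helper names (`new_url`, `register`, `mark`, `status`) are all assumptions introduced here, and SQLite stands in for whatever database the framework uses.

```python
import sqlite3
import uuid

# Placeholder domain (assumption); the study's registered domain is not stated.
REGISTERED_DOMAIN = "example.com"


def new_url(domain=REGISTERED_DOMAIN):
    """Return a fresh URL whose subdomain is a random UUID."""
    return f"https://{uuid.uuid4()}.{domain}/"


def open_db(path=":memory:"):
    """Open the tracking database and create the (hypothetical) pages table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " url TEXT PRIMARY KEY,"
        " captcha_type TEXT NOT NULL,"
        " visited INTEGER NOT NULL DEFAULT 0,"
        " solved INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def register(db, captcha_type):
    """Create a new URL and record its (URL, CAPTCHA type) mapping."""
    url = new_url()
    db.execute(
        "INSERT INTO pages (url, captcha_type) VALUES (?, ?)",
        (url, captcha_type),
    )
    db.commit()
    return url


def mark(db, url, state):
    """Set the 'visited' or 'solved' flag for a URL."""
    if state not in ("visited", "solved"):
        raise ValueError(f"unknown state: {state}")
    # The column name is checked against a fixed whitelist above.
    db.execute(f"UPDATE pages SET {state} = 1 WHERE url = ?", (url,))
    db.commit()


def status(db, url):
    """Return (captcha_type, visited, solved) for a URL, or None if unknown."""
    return db.execute(
        "SELECT captcha_type, visited, solved FROM pages WHERE url = ?",
        (url,),
    ).fetchone()
```

In such a design, the handler for the page-load event would call `mark(db, url, "visited")`, and the handler for the POST request would call `mark(db, url, "solved")` only after the challenge token has been verified server-side.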
VirusTotal provides an API to query the analysis report of each submitted URL. The report lists the verdict of each phishing detector and the overall result (the number of phishing, malicious, suspicious, clean, and unrated verdicts). We use this information to determine which crawlers have analyzed the URL.
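As an illustration of how such a report might be processed, the sketch below tallies per-engine verdicts from a report shaped like the VirusTotal v3 URL object, where `data.attributes.last_analysis_results` maps each engine name to an entry with a `result` field. The helper names (`vt_url_id`, `summarize`) and the sample report are hypothetical, and treating every engine whose verdict is not "unrated" as having analyzed the URL is an assumption made here, not a rule stated by the study.

```python
import base64
from collections import Counter


def vt_url_id(url):
    """URL identifier for the v3 API: unpadded URL-safe base64 of the URL."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def summarize(report):
    """Return (verdict counts, engines with a verdict other than 'unrated')."""
    results = report["data"]["attributes"]["last_analysis_results"]
    verdicts = {engine: entry["result"] for engine, entry in results.items()}
    counts = Counter(verdicts.values())
    # Assumption: an engine that reports anything but "unrated" has analyzed the URL.
    rated = sorted(e for e, v in verdicts.items() if v != "unrated")
    return counts, rated


# Hand-written sample report (not real data); engine names are made up.
sample_report = {
    "data": {
        "attributes": {
            "last_analysis_results": {
                "EngineA": {"category": "harmless", "result": "clean"},
                "EngineB": {"category": "malicious", "result": "phishing"},
                "EngineC": {"category": "undetected", "result": "unrated"},
            }
        }
    }
}

counts, rated = summarize(sample_report)
```

A report for a submitted URL would be fetched with a GET request to `https://www.virustotal.com/api/v3/urls/{id}`, where `{id}` is the value returned by `vt_url_id` and the API key is passed in the `x-apikey` header; the network call is omitted here so the sketch runs offline.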
Figure 1a: Randomly generated URL with an instrumented CAPTCHA challenge.
Figure 1b: VirusTotal analysis report of a submitted URL.
Experiment Setup
Phishing Detectors. We select all 90 industrial phishing detection engines included in VirusTotal and 2 academic phishing detectors, Phishpedia and PhishIntention, for a total of 92 phishing detectors.
CAPTCHA Types. We select four types of CAPTCHAs in this study: reCAPTCHA v2, hCAPTCHA, GeeTest Slide, and Rotation.
Evaluation. We call a pair (d, t) a configuration, where d is a phishing detector and t is a CAPTCHA type. For each configuration, we generate k webpages with unique URLs, equip each with a random CAPTCHA of type t, and submit them to d for analysis.
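The enumeration of configurations can be sketched as a Cartesian product. The function names and the placeholder detector names below are hypothetical, and the sketch assumes every detector is paired with every CAPTCHA type; under that assumption, 92 detectors and 4 CAPTCHA types yield 92 × 4 = 368 configurations and 368k submitted pages.

```python
from itertools import product

CAPTCHA_TYPES = ["reCAPTCHA v2", "hCAPTCHA", "GeeTest Slide", "Rotation"]


def configurations(detectors, captcha_types=CAPTCHA_TYPES):
    """All (d, t) pairs of detector and CAPTCHA type."""
    return list(product(detectors, captcha_types))


def submission_plan(detectors, k, captcha_types=CAPTCHA_TYPES):
    """One (d, t, page_index) task per page: k pages for each configuration."""
    return [
        (d, t, i)
        for d, t in configurations(detectors, captcha_types)
        for i in range(k)
    ]


# Placeholder detector names (assumption); the real engine list comes from VirusTotal.
detectors = [f"detector-{n}" for n in range(92)]
plan = submission_plan(detectors, k=3)
```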