Datasets

We release the datasets used to train and test both the detection and recognition models.

CAPTCHA Detection Dataset

Usage: Training and testing the CAPTCHA detection model.
Contents: 19,680 webpage screenshots: 10,680 with annotated CAPTCHA bounding boxes and 9,000 without annotations (negative examples). Sourced from the Alexa top-1 million websites and from synthetic data generation.

Download Dataset
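
After downloading, it can be useful to confirm that the positive/negative split matches the counts listed above. The following is a minimal sketch only: the annotation format is not documented on this page, so the mapping from file name to a list of boxes, the box layout, and the file names are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def count_split(annotations: Dict[str, List[Box]]) -> Tuple[int, int]:
    """Return (positives, negatives): screenshots with and without boxes."""
    positives = sum(1 for boxes in annotations.values() if boxes)
    negatives = len(annotations) - positives
    return positives, negatives


# Tiny made-up sample; file names are illustrative only.
sample = {
    "page_0001.png": [(120, 340, 420, 420)],
    "page_0002.png": [],  # negative example: no CAPTCHA on the page
    "page_0003.png": [(10, 10, 200, 90), (300, 50, 480, 130)],
}

positives, negatives = count_split(sample)
print(positives, negatives)
```

Run over the full dataset (once its annotations are loaded into a comparable structure), the result should be 10,680 positives and 9,000 negatives if the download is complete.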

CAPTCHA Recognition Dataset

Usage: Training and testing the CAPTCHA recognition model.
Contents: 6,612 CAPTCHA images distributed across 38 classes. Sourced by scraping demo websites, using official API keys provided by vendors, and collecting community-contributed datasets.

Download Dataset
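
A quick per-class tally helps verify the class count and spot class imbalance before training. The sketch below assumes, hypothetically, that images are stored one folder per class (for example `class_a/img_001.png`); the actual archive layout is not described on this page and may differ.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable


def images_per_class(paths: Iterable[str]) -> Counter:
    """Count images per class, taking the parent folder name as the class."""
    return Counter(PurePosixPath(p).parent.name for p in paths)


# Made-up relative paths; class and file names are illustrative only.
sample_paths = [
    "class_a/img_001.png",
    "class_a/img_002.png",
    "class_b/img_001.png",
]

counts = images_per_class(sample_paths)
print(len(counts), sum(counts.values()))
```

On the full dataset, the two printed numbers should be 38 classes and 6,612 images, matching the description above.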

CAPTCHA Open-set Dataset

Usage: Open-set testing on Phishdecloaker.
Contents: 1,500 webpage screenshots, all of which have annotated CAPTCHA classes spanning 15 unseen categories. Synthetically generated.

Download Dataset
