Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages (USENIX Security 2021)
In this work, we propose an explainable phishing identification system, Phishpedia, which (1) achieves both high identification accuracy and low runtime overhead, (2) provides causal visual annotation on the phishing webpage screenshot, and (3) does not require training on any phishing samples. Phishpedia infers the intended brand from the webpage screenshot of a URL, and reports phishing when the domain of the intended brand does not match the landing domain of the URL.
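The final decision step described above (comparing the intended brand's domain with the landing domain) can be illustrated with a minimal sketch. It is not the Phishpedia implementation: the brand table `BRAND_DOMAINS`, the function names, and the suffix-based domain match are assumptions made for illustration, and the visual brand recognition step is taken as an already computed input.

```python
from urllib.parse import urlsplit

# Hypothetical brand -> legitimate domains table (illustrative values only).
BRAND_DOMAINS = {
    "paypal": {"paypal.com"},
    "microsoft": {"microsoft.com", "live.com"},
}


def landing_host(url):
    """Return the lower-cased hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def is_brand_domain(host, domains):
    """True if host equals a legitimate domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def report_phishing(url, predicted_brand):
    """Flag a page whose inferred brand does not own the landing domain.

    predicted_brand stands in for the output of the visual recognition
    step; None means no known brand was recognised on the screenshot.
    """
    if predicted_brand is None:
        return False  # no brand intention inferred, nothing to compare
    domains = BRAND_DOMAINS.get(predicted_brand, set())
    if not domains:
        return False  # brand not in the reference table
    return not is_brand_domain(landing_host(url), domains)
```

For example, `report_phishing("https://paypal.com.secure-login.example/signin", "paypal")` flags the page, while the same brand on `https://www.paypal.com/signin` is not flagged. A production check would likely compare registered domains using a public suffix list rather than a plain suffix match.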
Phishpedia significantly outperforms baseline identification approaches (URLNet, StackModel, PhishCatcher, EMD, PhishZoo, and LogoSENSE) with respect to identification accuracy and runtime overhead. We deployed Phishpedia on newly emerging domains fed from the CertStream service and discovered 1704 phishing websites (including 1133 new zero-day phishing websites) within one month, significantly outperforming existing solutions.
Phishing Discovery Results
Each DATABASE folder comes with a readme.csv that maps each phishing URL to the folder path containing its screenshot (shot.png).
The database presents the results of the phishing discovery experiment. It contains the real phishing webpages found by EMD, PhishCatcher, PhishZoo, StackModel, URLNet, and Phishpedia.
For each tool, we provide the HTML of the webpage, its URL, and a screenshot of the webpage.
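The lookup that readme.csv supports can be sketched as follows. The column names `url` and `folder` are assumptions, since the actual header of readme.csv is not documented here; adjust them to match the file. The function name `index_screenshots` is likewise hypothetical.

```python
import csv
from pathlib import Path


def index_screenshots(csv_lines, database_dir, url_col="url", folder_col="folder"):
    """Map each phishing URL to the path of its screenshot (shot.png).

    csv_lines is any iterable of CSV text lines, such as an open file.
    url_col and folder_col are assumed column names, not confirmed ones.
    """
    root = Path(database_dir)
    index = {}
    for row in csv.DictReader(csv_lines):
        index[row[url_col]] = root / row[folder_col] / "shot.png"
    return index
```

With a real database folder, this could be called as `index_screenshots(open("DATABASE/readme.csv", newline="", encoding="utf-8"), "DATABASE")`, where `DATABASE` stands for the folder of one tool.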
Please find the code for all baselines here: https://drive.google.com/drive/folders/1YpKR_Nye4E11FCbPbePAAJG4UcqkIsfZ?usp=sharing