Phishpedia

Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages (USENIX Security 2021)

In this work, we propose an explainable phishing identification system, Phishpedia, which (1) achieves both high identification accuracy and low runtime overhead, (2) provides causal visual annotation on the phishing webpage screenshot, and (3) does not require training on any phishing samples. Phishpedia infers the intended brand from the webpage screenshot of a URL and reports phishing when the intended brand's domain does not match the landing domain of the URL.
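The final decision rule described above (compare the intended brand's domain with the landing domain) can be sketched as follows. This is a minimal illustration, not Phishpedia's code: the `BRAND_DOMAINS` mapping and the helper names are hypothetical, and the domain extraction naively keeps the last two host labels instead of consulting a public-suffix list.

```python
from typing import Dict, List, Optional
from urllib.parse import urlparse


def registered_domain(url: str) -> str:
    """Naive registered-domain extraction: keep the last two host labels.

    Simplification (assumption): multi-part suffixes such as .co.uk
    would need a public-suffix list to be handled correctly.
    """
    host = urlparse(url).hostname or ""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_phishing(intended_brand: Optional[str],
                brand_domains: Dict[str, List[str]],
                url: str) -> bool:
    """Report phishing when a brand is inferred but the landing domain is not one of its domains."""
    if intended_brand is None:
        return False  # no brand identified from the screenshot: nothing to misalign with
    allowed = {d.lower() for d in brand_domains.get(intended_brand, [])}
    return registered_domain(url) not in allowed


# Hypothetical reference mapping from brand to its legitimate domains.
BRAND_DOMAINS = {"PayPal": ["paypal.com"]}

print(is_phishing("PayPal", BRAND_DOMAINS, "https://www.paypal.com/signin"))
print(is_phishing("PayPal", BRAND_DOMAINS, "http://paypal-secure.example.net/x"))
print(is_phishing(None, BRAND_DOMAINS, "http://paypal-secure.example.net/x"))
```

A page that looks like PayPal but lands on `example.net` is flagged, while the same look on `paypal.com` is not; a page with no recognized brand is never flagged by this rule.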

Phishpedia significantly outperforms baseline identification approaches (URLNet, StackModel, PhishCatcher, EMD, PhishZoo, and LogoSENSE) in both identification accuracy and runtime overhead. We deployed Phishpedia on newly emerging domains fed from the CertStream service and discovered 1704 phishing websites (including 1133 new zero-day phishing websites) within one month, significantly outperforming existing solutions.

See our GitHub repository and paper for details.

Overview

Input: A URL and its screenshot
Output: Phish/Benign, Phishing target

Step 1: Feed the screenshot into the deep object detection model to get the predicted logos and inputs (the inputs are not used for later prediction, only for explanation)
Step 2: Feed the predicted logos into the deep Siamese model
- If Siamese reports no target, return Benign, None
- Else Siamese reports a target; return Phish, Phishing target
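The Step 2 decision can be sketched as a nearest-reference search over logo embeddings. This is a simplified illustration under stated assumptions: Step 1 (object detection) and the Siamese encoder are not shown, so `logo_embs` stands in for the embeddings of the detected logos; the `reference` brand list, the cosine-similarity matching, and the `0.8` threshold are hypothetical choices, not values taken from Phishpedia. A fuller pipeline would also check the reported target against the landing domain, as described in the introduction.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def siamese_match(logo_embs: List[Vector],
                  reference: Dict[str, List[Vector]],
                  threshold: float = 0.8) -> Optional[str]:
    """Return the brand whose reference logo is most similar, if it reaches the threshold."""
    best_brand, best_sim = None, threshold
    for emb in logo_embs:
        for brand, ref_embs in reference.items():
            for ref in ref_embs:
                sim = cosine(emb, ref)
                if sim >= best_sim:
                    best_brand, best_sim = brand, sim
    return best_brand


def classify(logo_embs: List[Vector],
             reference: Dict[str, List[Vector]],
             threshold: float = 0.8) -> Tuple[str, Optional[str]]:
    """No target -> ("Benign", None); a target -> ("Phish", target)."""
    target = siamese_match(logo_embs, reference, threshold)
    if target is None:
        return "Benign", None
    return "Phish", target


# Hypothetical toy reference embeddings, one logo per brand.
reference = {"PayPal": [[1.0, 0.0, 0.0]], "Microsoft": [[0.0, 1.0, 0.0]]}
print(classify([[0.9, 0.1, 0.0]], reference))
print(classify([[0.0, 0.0, 1.0]], reference))
```

The threshold trades recall for precision: raising it makes the Siamese step report fewer targets, so more pages fall through to Benign.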

Phishing example

Phishing Discovery Results

Each DATABASE folder comes with a readme.csv that helps the user match each phishing URL with the folder path that contains its screenshot (shot.png).
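Looking up a screenshot from readme.csv can be sketched as below. The column names `url` and `folder` are assumptions (check the header of the actual readme.csv and pass the right names), and `base_dir` is a hypothetical parameter for wherever the downloaded DATABASE folder lives; only the screenshot file name shot.png comes from this README.

```python
import csv
import io
from pathlib import Path
from typing import Dict, TextIO


def load_screenshot_index(csv_file: TextIO,
                          base_dir: str = ".",
                          url_col: str = "url",
                          folder_col: str = "folder") -> Dict[str, Path]:
    """Map each phishing URL to <base_dir>/<folder>/shot.png using readme.csv rows.

    url_col and folder_col are assumed column names, not confirmed ones.
    """
    index: Dict[str, Path] = {}
    for row in csv.DictReader(csv_file):
        index[row[url_col].strip()] = Path(base_dir) / row[folder_col].strip() / "shot.png"
    return index


# Demo with an in-memory CSV; with the real data, open readme.csv with newline="".
sample = io.StringIO("url,folder\nhttp://phish.example/login,example_folder_1\n")
index = load_screenshot_index(sample)
print(index["http://phish.example/login"])
```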

Link to download: https://drive.google.com/drive/folders/1X1xP0jiOfR7DcT3Mba-m0OdZ4gyowHhc?usp=sharing

Database README

The database presents the results of the phishing discovery experiment. The folder contains the real phishing websites found by EMD, PhishCatcher, PhishZoo, StackModel, URLNet, and Phishpedia.