In this work, we propose Phishpedia, an explainable phishing identification system that (1) achieves both high identification accuracy and low runtime overhead, (2) provides causal visual annotation on the phishing webpage screenshot, and (3) does not require training on any phishing samples. Phishpedia infers the intended brand from the webpage screenshot of a URL, and reports phishing when the domain of the intended brand does not match the landing domain of the URL.
Phishpedia significantly outperforms baseline identification approaches (URLNet, StackModel, PhishCatcher, EMD, PhishZoo, and LogoSENSE) with respect to identification accuracy and runtime overhead. We deployed Phishpedia on newly emerging domains fed from the CertStream service and discovered 1704 phishing websites (including 1133 new zero-day phishing websites) within one month, significantly outperforming existing solutions.
See our GitHub repository and paper for details.
Input: A URL and its screenshot
Output: Phish/Benign, Phishing target
Step 1: Enter Deep Object Detection Model, get predicted logos and inputs (inputs are not used for later prediction, just for explanation)
Step 2: Enter Deep Siamese Model
If Siamese reports no target, Return Benign, None
Else, Siamese reports a target, Return Phish, Phishing target
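The two steps above can be sketched as a small Python function. This is only an illustrative sketch, not the repository's actual code: `identify`, `BrandMatch`, `detect_logos`, and `match_brand` are hypothetical names, the two models are passed in as plain callables, and the domain comparison (exact host or subdomain of a brand domain) is an assumption about how "alignment" between the intended brand and the landing domain could be checked.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple
from urllib.parse import urlparse

Box = Tuple[int, int, int, int]  # hypothetical logo box: (x1, y1, x2, y2)


@dataclass
class BrandMatch:
    """Hypothetical result of the Siamese step: a brand and its legitimate domains."""
    brand: str
    domains: List[str]


def landing_domain(url: str) -> str:
    """Return the lower-cased host name of the URL (empty string if absent)."""
    return (urlparse(url).hostname or "").lower()


def domain_matches(host: str, brand_domains: List[str]) -> bool:
    """Assumed alignment rule: host equals a brand domain or is a subdomain of it."""
    return any(host == d or host.endswith("." + d) for d in brand_domains)


def identify(
    url: str,
    screenshot: Any,
    detect_logos: Callable[[Any], List[Box]],
    match_brand: Callable[[Any, Box], Optional[BrandMatch]],
) -> Tuple[str, Optional[str]]:
    """Return ("Phish", brand) or ("Benign", None) for a URL and its screenshot."""
    # Step 1: the object detection model proposes logo regions.
    for box in detect_logos(screenshot):
        # Step 2: the Siamese model tries to match the logo to a protected brand.
        match = match_brand(screenshot, box)
        if match is None:
            continue  # no target reported for this logo
        # A brand logo on the brand's own domain is not phishing.
        if domain_matches(landing_domain(url), match.domains):
            return ("Benign", None)
        return ("Phish", match.brand)
    # No logo was matched to any protected brand.
    return ("Benign", None)
```

With stub models that always report a PayPal logo, `identify("https://paypal.com.evil.example/login", ...)` yields `("Phish", "PayPal")`, while `identify("https://www.paypal.com/signin", ...)` yields `("Benign", None)`.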
Each DATABASE folder comes with a readme.csv that maps each phishing URL to the folder path containing its screenshot (shot.png).
Link to download: https://drive.google.com/drive/folders/1X1xP0jiOfR7DcT3Mba-m0OdZ4gyowHhc?usp=sharing
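As a rough illustration of how readme.csv could be used to locate a screenshot, the sketch below looks up a URL and builds the path to its shot.png. The column layout of readme.csv is not documented here, so the column names are required arguments rather than guessed defaults, and the function name `find_screenshot` is hypothetical. It also assumes the folder paths in the CSV are relative to the directory that holds readme.csv.

```python
import csv
from pathlib import Path
from typing import Optional


def find_screenshot(
    readme_csv: str, url: str, url_column: str, folder_column: str
) -> Optional[Path]:
    """Return the path to shot.png for `url`, or None if the URL is not listed.

    `url_column` and `folder_column` must be the header names actually used
    in readme.csv; check the file, since they are not assumed here.
    """
    readme_path = Path(readme_csv)
    with readme_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Missing cells come back as None, so normalise to empty strings.
            listed_url = (row.get(url_column) or "").strip()
            folder = (row.get(folder_column) or "").strip()
            if listed_url == url and folder:
                # Assumption: folder paths are relative to readme.csv's directory.
                return readme_path.parent / folder / "shot.png"
    return None
```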
The database presents the results of the phishing discovery experiment. The folder contains the real phishing webpages found by EMD, PhishCatcher, PhishZoo, StackModel, URLNet, and Phishpedia.
For each tool, we show the HTML webpage, the URL, and the screenshot of each webpage.
We list a few phishing webpages found by Phishpedia here.
Please find the code for all baselines here: https://drive.google.com/drive/folders/1YpKR_Nye4E11FCbPbePAAJG4UcqkIsfZ?usp=sharing
EMD (general experiment, phishing discovery experiment)
PhishZoo (general experiment, phishing discovery experiment)
LogoSENSE (general experiment, phishing discovery experiment)
StackModel (phishing discovery experiment)
URLNet (phishing discovery experiment)
181 protected brands, Link to download: https://drive.google.com/file/d/1zxvXFKpLx816VfaGFISL6tod-zSEc6hY/view?usp=sharing
29496 phishing sites, Link to download: https://drive.google.com/file/d/12ypEMPRQ43zGRqHGut0Esq2z5en0DH4g/view?usp=sharing
Link to download: https://drive.google.com/file/d/1EJnx9oX9wQieF7UPQJeTVg850nZsuxTi/view?usp=sharing
30649 benign sites, Link to download: https://drive.google.com/file/d/1yORUeSrF5vGcgxYrsCoqXcpOUHt-iHq_/view?usp=sharing
30649 benign dataset with ground-truth logo labels, Link to download:
https://drive.google.com/file/d/1yORUeSrF5vGcgxYrsCoqXcpOUHt-iHq_/view?usp=sharing
https://drive.google.com/file/d/1bH3Yp6K1B37B_sS_MNMz7yvYcOhOu-J8/view?usp=sharing
https://drive.google.com/file/d/1u56I0IHBgM9glNJl2wcLfaihp1L_U7eD/view?usp=sharing