Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision Based Approach
Introduction
In this work, we propose an explainable phishing identification system, PhishIntention, which (1) leverages abstract layout and logo semantics to better understand the intention of a website (brand intention + credential-requiring intention), and (2) combines static and dynamic analysis to verify the website's credential-requiring intention.
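As a rough illustration of how the two intentions could be combined into a verdict, here is a minimal sketch. It is not the PhishIntention implementation: the `PageAnalysis` fields, the `brand_domains` lookup, and the domain-mismatch rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class PageAnalysis:
    """Hypothetical summary of what was inferred for one webpage."""
    url_domain: str                # domain the page was served from
    reported_brand: Optional[str]  # brand intention (e.g. from the logo), if any
    static_credential_form: bool   # credential form found on the page as loaded
    dynamic_credential_form: bool  # credential form reached after interacting with the page


def has_credential_intention(page: PageAnalysis) -> bool:
    # Static and dynamic evidence are combined: either one is enough.
    return page.static_credential_form or page.dynamic_credential_form


def looks_like_phishing(page: PageAnalysis,
                        brand_domains: Dict[str, Set[str]]) -> bool:
    # Assumed rule: flag a page only if it shows a brand intention,
    # asks for credentials, and is not hosted on a domain known for that brand.
    if page.reported_brand is None:
        return False
    if not has_credential_intention(page):
        return False
    legitimate = brand_domains.get(page.reported_brand, set())
    return page.url_domain not in legitimate
```

For example, a page showing an "ExampleBank" logo that only reveals a login form after a click, served from a domain not listed for that brand, would be flagged by this sketch.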
PhishIntention significantly outperforms baseline identification approaches (EMD, PhishZoo, VisualPhishNet, and Phishpedia) with respect to identification accuracy. We deployed PhishIntention on newly emerging domains fed from the CertStream service and discovered 1942 phishing websites (including 1368 new zero-day phishing websites) within two months, significantly outperforming existing solutions.
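To give an idea of how a feed of new domains might be prepared for such a deployment, the sketch below pulls candidate domains out of a single CertStream-style message. The message layout (`message_type`, `data.leaf_cert.all_domains`) is an assumption about the feed format, and `extract_domains` is a hypothetical helper, not part of this repository.

```python
from typing import Any, Dict, List


def extract_domains(message: Dict[str, Any]) -> List[str]:
    """Return the unique domains named in one certificate-update message."""
    # Ignore heartbeats and any other non-certificate messages.
    if message.get("message_type") != "certificate_update":
        return []
    leaf_cert = message.get("data", {}).get("leaf_cert", {})
    domains: List[str] = []
    seen = set()
    for name in leaf_cert.get("all_domains", []):
        name = name.lower()
        # Wildcard entries such as "*.example.com" cannot be crawled directly.
        if name.startswith("*."):
            name = name[2:]
        if name and name not in seen:
            seen.add(name)
            domains.append(name)
    return domains
```

Each returned domain could then be crawled and passed to the detector.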
Framework overview
The database contains the real phishing websites discovered in our phishing discovery experiment.
For EMD, PhishZoo, and VisualPhishNet, we sample the top 1K reported phishing websites and label the real phishing ones.
For Phishpedia and PhishIntention, we provide the full set of real phishing websites.
For each webpage, we provide the HTML code, the URL, and a screenshot.
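A record with these three parts could be loaded along the following lines. This is only a sketch: the file names `info.txt`, `html.txt`, and `shot.png` and the one-folder-per-webpage layout are assumptions, so adjust them to the actual layout of the released data.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PhishingSample:
    url: str               # URL of the webpage
    html: str              # HTML code of the webpage
    screenshot_path: Path  # path to the screenshot image


def load_sample(folder: Path) -> PhishingSample:
    """Load one webpage record from a folder (file names are hypothetical)."""
    url = (folder / "info.txt").read_text(encoding="utf-8").strip()
    # Crawled HTML may contain invalid bytes, so decode leniently.
    html = (folder / "html.txt").read_text(encoding="utf-8", errors="replace")
    return PhishingSample(url=url, html=html, screenshot_path=folder / "shot.png")
```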
Baseline Approaches
Please find the code for all baselines here
Dataset
Please find the details here