Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision Based Approach




Introduction

In this work, we propose PhishIntention, an explainable phishing identification system that (1) leverages abstract layout and logo semantics to better understand the intention of a website (brand intention plus credential-requiring intention), and (2) combines static and dynamic approaches to verify a website's credential-requiring intention.
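
The way the two intentions combine can be sketched as follows. This is a minimal illustration, not the PhishIntention implementation: `PageSnapshot`, `is_phishing`, the three detector callables, and the `legitimate_domains` mapping are all hypothetical names introduced here. The domain comparison step and the order of the static and dynamic checks are assumptions about how such a pipeline might be wired.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Set
from urllib.parse import urlparse


@dataclass
class PageSnapshot:
    """Hypothetical container for what was captured from one webpage."""
    url: str
    html: str
    screenshot_path: str


def is_phishing(
    page: PageSnapshot,
    detect_brand: Callable[[PageSnapshot], Optional[str]],
    requires_credentials_static: Callable[[PageSnapshot], bool],
    requires_credentials_dynamic: Callable[[PageSnapshot], bool],
    legitimate_domains: Mapping[str, Set[str]],
) -> bool:
    """Flag a page only if it shows a brand intention AND a credential-requiring intention."""
    # Brand intention: which brand does the page present itself as (e.g. via its logo)?
    brand = detect_brand(page)
    if brand is None:
        return False

    # Assumption: a page served from one of the brand's own domains is benign.
    host = (urlparse(page.url).hostname or "").lower()
    allowed = legitimate_domains.get(brand, set())
    if any(host == d or host.endswith("." + d) for d in allowed):
        return False

    # Credential-requiring intention: try the static check first,
    # and fall back to the dynamic check only if the static one finds nothing.
    if requires_credentials_static(page):
        return True
    return requires_credentials_dynamic(page)
```

Passing the detectors in as callables keeps the sketch independent of any particular vision model or crawler; in practice each callable would wrap whatever layout, logo, or interaction analysis is actually used.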

PhishIntention significantly outperforms baseline identification approaches (EMD, PhishZoo, VisualPhishNet, and Phishpedia) in identification accuracy. We deployed PhishIntention on emerging new domains fed from the CertStream service and discovered 1942 phishing websites (including 1368 new zero-day phishing websites) within two months, significantly outperforming existing solutions.

Framework overview

Phishing Discovery Results: Link to download

The database presents the real phishing webpages discovered in the phishing discovery experiment.
For EMD, PhishZoo, and VisualPhishNet, we sample the top 1K reported phishing webpages and label the real phishing among them.
For Phishpedia and PhishIntention, we have the full set of real phishing webpages.
For each webpage, we show the HTML code, the URL, and a screenshot.

Baseline Approaches

Please find the code for all baselines here

Dataset

Please find the details here