Complementary Experiment
Evaluating benign webpages with Google/Facebook/LinkedIn logos:
We additionally collected 131562 webpages from CertStream (listed here, see 131562 Certstream urls). Of these, 46 are webpages with Google/Facebook/LinkedIn logos, 4 of which are real phishing pages (download here).
Recall of the phishing discovery experiment:
We sampled 1489 URLs. Manual evaluation found no phishing among them (download here: dataset_no_phishing).
We then use PhishCatcher to report 1489 URLs (download here: new_data_set_with_phishing). The performance of the different baselines can be downloaded here (see Evaluation of 1489 phishcatcher URLs).
Adversarial attack evaluation with gradient-recovering:
We use the BPDA tool (download here); more detailed results are listed here (see Adversarial attack).
The experiment of perceptual hashing vs. the Siamese model:
The detailed results can be downloaded here (see Phishpedia experiments).
The experiment supporting EMD with more screenshots:
Using the first temporal half of Phish30K (14748 phishing pages, download here) as the target list, EMD is applied to the remaining 14748 phishing pages and the 30K benign dataset.
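A temporal split of this kind can be sketched as follows. This is only an illustration, assuming each sample carries a timestamp; the `(timestamp, url)` tuple layout and the function name `temporal_split` are hypothetical and do not come from the released dataset or code.

```python
def temporal_split(samples):
    """Split samples into an earlier and a later half by timestamp.

    `samples` is a list of (timestamp, url) tuples (a hypothetical layout).
    The earlier half would play the role of the target list, the later
    half the role of the phishing test set.
    """
    ordered = sorted(samples, key=lambda s: s[0])  # oldest first
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


# Toy data with made-up timestamps and placeholder URLs.
toy = [
    (3, "http://c.example"),
    (1, "http://a.example"),
    (4, "http://d.example"),
    (2, "http://b.example"),
]
target_list, test_set = temporal_split(toy)
print([url for _, url in target_list])
print([url for _, url in test_set])
```

With an even number of samples, as in the 14748 / 14748 split described above, both halves have the same size.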
ROC comparison:
ROC curves are plotted under different threshold settings for each approach. All experimental results can be found here.
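For readers who want to reproduce a curve from raw scores, the following is a minimal sketch of how ROC points can be obtained by sweeping a decision threshold. It assumes each approach outputs one score per webpage and that a page is flagged as phishing when its score is at or above the threshold. The function name `roc_points` and the toy scores are illustrative assumptions, not part of the released evaluation code.

```python
def roc_points(scores, labels, thresholds):
    """Return a (threshold, fpr, tpr) tuple for each threshold.

    scores: one score per sample (higher means more phishing-like).
    labels: 1 for phishing, 0 for benign.
    A sample is flagged as phishing when score >= threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        points.append((t, fpr, tpr))
    return points


# Toy scores and labels, made up for illustration only.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
for t, fpr, tpr in roc_points(scores, labels, [0.2, 0.5, 0.9]):
    print(f"threshold={t:.1f} fpr={fpr:.2f} tpr={tpr:.2f}")
```

Plotting the resulting (fpr, tpr) pairs for each approach on the same axes gives a comparison of the kind described above.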