Complementary Experiment

Evaluating benign webpages with Google/Facebook/LinkedIn logos:

We additionally collected 131562 webpages from CertStream (listed here, see 131562 Certstream urls). 46 of them are webpages with Google/Facebook/LinkedIn logos, of which 4 are real phishing (download here).

Recall of the phishing discovery experiment:

  • We sampled 1489 URLs. Manual evaluation found no phishing among them (download here: dataset_no_phishing).

  • Then, we used PhishCatcher to report 1489 URLs (download here: new_data_set_with_phishing). The performance of the different baselines can be downloaded here (see Evaluation of 1489 phishcatcher URLs).

Adversarial attack evaluation with gradient-recovering:

We use the BPDA tool (download here). More detailed results are listed here (see Adversarial attack).

The experiment of perceptual hashing vs. Siamese model:

The detailed results can be downloaded here (see Phishpedia experiments).

The experiment supporting EMD with more screenshots:

With the temporally first half of Phish30K (14748 phishing pages, download here) as the target list, EMD is applied to the remaining 14748 phishing pages and the 30K benign dataset.
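As a rough illustration of the temporal split described above, the following sketch orders samples by collection time and cuts the list in the middle, so the earlier half can serve as the target list and the later half as test data. The function name, the (timestamp, url) tuple layout, and the toy records are assumptions made for this example; they are not taken from the released dataset or code.

```python
def temporal_split(samples):
    """Split (timestamp, item) pairs into an earlier and a later half.

    Hypothetical helper: the tuple layout is an assumption for illustration.
    """
    # Sort by timestamp so the first half is strictly the older data.
    ordered = sorted(samples, key=lambda pair: pair[0])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


# Toy records (made up); ISO-8601 dates sort correctly as plain strings.
toy_samples = [
    ("2019-11-02", "phish-c"),
    ("2019-09-15", "phish-a"),
    ("2019-12-20", "phish-d"),
    ("2019-10-01", "phish-b"),
]
target_half, test_half = temporal_split(toy_samples)
```

Splitting by time rather than at random keeps every target-list screenshot older than the pages it is matched against, which mimics how a reference set would be built before new phishing pages appear.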

ROC comparison:

ROC curves are plotted by varying the threshold setting of each approach. All experiment results can be found here.
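For readers who want to reproduce a curve from the raw scores, here is a minimal sketch of how ROC points can be obtained by sweeping a threshold. It assumes each approach outputs one score per page, where a higher score means "more likely phishing", and that label 1 marks phishing and 0 marks benign. The function name and the toy scores are hypothetical and do not come from the experiment files.

```python
def roc_points(scores, labels, thresholds):
    """Return one (false_positive_rate, true_positive_rate) pair per threshold.

    A page is flagged as phishing when its score is >= the threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fpr = fp / negatives if negatives else 0.0
        tpr = tp / positives if positives else 0.0
        points.append((fpr, tpr))
    return points


# Toy data (made up): five pages, three phishing and two benign.
scores = [0.95, 0.80, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0]
points = roc_points(scores, labels, thresholds=[0.9, 0.5, 0.1])
```

Running the same sweep for each approach and plotting the resulting pairs on shared axes gives directly comparable curves.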