Baseline: URLNet

Step 1:

Download the code for all baselines: https://drive.google.com/drive/folders/1YpKR_Nye4E11FCbPbePAAJG4UcqkIsfZ?usp=sharing
Navigate to the URLnet folder.
Prepare the data as a csv/txt file in which each line holds one URL and its true label (+1 for malicious, -1 for benign), in the format: <URL label><tab><URL string>

For example:

+1 http://www.exampledomain.com/urlpath/...

-1 http://www.exampledomain.com/urlpath/...
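A minimal sketch of how such a file could be produced from labeled URLs held in memory. The helper names format_line and write_dataset are hypothetical and are not part of the URLNet code; the sketch only assumes the <URL label><tab><URL string> layout described above.

```python
from pathlib import Path


def format_line(url: str, malicious: bool) -> str:
    """Return one data line: label (+1 or -1), a tab, then the URL."""
    label = "+1" if malicious else "-1"
    return f"{label}\t{url.strip()}"


def write_dataset(samples, path) -> int:
    """Write (url, malicious) pairs to `path`, one per line.

    Returns the number of lines written.
    """
    lines = [format_line(url, malicious) for url, malicious in samples]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

For instance, write_dataset([("http://www.exampledomain.com/urlpath/", True)], "test_data.txt") would write a single malicious sample; the file name here is only a placeholder for whatever is passed as <data_file_path> in Step 2.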

Step 2:

Please run the following as a single command, replacing <data_file_path> with the path to the data file from Step 1:

python test.py \
  --model.emb_mode 5 \
  --data.data_dir <data_file_path> \
  --log.checkpoint_dir output_5/checkpoints/model-2430 \
  --log.output_dir output.txt \
  --data.word_dict_dir output_5/words_dict.p \
  --data.char_dict_dir output_5/chars_dict.p \
  --data.subword_dict_dir output_5/subwords_dict.p

Step 3:

The output is a txt file with one line per URL and three columns: label is the true label from the input file, predict is the predicted class, and score is the predicted probability that the URL is phishing.
For example:

label predict score

1 1 0.999985

1 1 0.99998844

1 1 0.67638105
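As a rough illustration of how this output could be consumed, the sketch below parses the three columns and computes the share of URLs whose predicted class matches the label. It assumes the columns are whitespace-separated and that a "label predict score" header line may be present; parse_results and accuracy are hypothetical helper names, not functions from the URLNet code.

```python
def parse_results(text: str):
    """Parse output text into a list of (label, predict, score) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, malformed lines, and the header row.
        if len(parts) != 3 or parts[0] == "label":
            continue
        rows.append((int(parts[0]), int(parts[1]), float(parts[2])))
    return rows


def accuracy(rows) -> float:
    """Fraction of rows where the predicted class equals the label."""
    if not rows:
        return 0.0
    correct = sum(1 for label, predict, _ in rows if label == predict)
    return correct / len(rows)
```

The text passed to parse_results would be the contents of output.txt, read for example with open("output.txt").read().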

** The code is forked from: https://github.com/Antimalweb/URLNet
