Step1:
Download the code for all baselines: https://drive.google.com/drive/folders/1YpKR_Nye4E11FCbPbePAAJG4UcqkIsfZ?usp=sharing
Navigate to the URLnet folder.
Prepare the data as a CSV/TXT file with one URL per line. Each line holds the URL's true label (+1 for malicious, -1 for benign), a tab, and the URL itself: <URL label><tab><URL string>
For example:
+1 http://www.exampledomain.com/urlpath/...
-1 http://www.exampledomain.com/urlpath/...
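The input format above can be produced with a few lines of Python. This is a minimal sketch, not part of the URLNet code: the function names format_line and write_urlnet_input and the file name my_urls.txt are hypothetical, and it assumes the labels are available as booleans (True for malicious).

```python
def format_line(url, is_malicious):
    """Return one input line: <URL label><tab><URL string>."""
    label = "+1" if is_malicious else "-1"
    # Strip surrounding whitespace so the URL stays on a single line.
    return f"{label}\t{url.strip()}"


def write_urlnet_input(samples, path):
    """Write (url, is_malicious) pairs to path, one URL per line.

    Hypothetical helper for illustration; not part of URLNet.
    """
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for url, is_malicious in samples:
            f.write(format_line(url, is_malicious) + "\n")


if __name__ == "__main__":
    samples = [
        ("http://www.exampledomain.com/urlpath/a", True),
        ("http://www.exampledomain.com/urlpath/b", False),
    ]
    write_urlnet_input(samples, "my_urls.txt")  # hypothetical file name
```

The resulting file can then be passed as <data_file_path> in Step2.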
Step2:
Run the following as a single command (the trailing backslashes continue the line in a POSIX shell):
python test.py \
  --model.emb_mode 5 \
  --data.data_dir <data_file_path> \
  --log.checkpoint_dir output_5/checkpoints/model-2430 \
  --log.output_dir output.txt \
  --data.word_dict_dir output_5/words_dict.p \
  --data.char_dict_dir output_5/chars_dict.p \
  --data.subword_dict_dir output_5/subwords_dict.p
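When the test has to be run on several data files, the command above can be assembled programmatically. The sketch below only rebuilds the same argument list shown in this step; the function name build_test_command and its parameter names are hypothetical, and the default paths are copied from the command above.

```python
import shlex


def build_test_command(data_file, model_dir="output_5",
                       checkpoint="model-2430", output_file="output.txt"):
    """Return the Step2 command as an argument list (hypothetical helper)."""
    return [
        "python", "test.py",
        "--model.emb_mode", "5",
        "--data.data_dir", data_file,
        "--log.checkpoint_dir", f"{model_dir}/checkpoints/{checkpoint}",
        "--log.output_dir", output_file,
        "--data.word_dict_dir", f"{model_dir}/words_dict.p",
        "--data.char_dict_dir", f"{model_dir}/chars_dict.p",
        "--data.subword_dict_dir", f"{model_dir}/subwords_dict.p",
    ]


if __name__ == "__main__":
    cmd = build_test_command("my_urls.txt")  # hypothetical data file
    # Print the command for inspection. To execute it from the URLnet
    # folder, pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```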
Step3:
The output is a text file with one line per URL and three columns: label is the true label, predict is the predicted class, and score is the predicted probability that the URL is phishing.
For example:
label predict score
1 1 0.999985
1 1 0.99998844
1 1 0.67638105
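The output file can be read back for a quick sanity check. The sketch below is an illustration only: parse_results and accuracy are hypothetical helpers, and it assumes the columns are separated by whitespace and that the first line is the header shown above.

```python
def parse_results(lines):
    """Parse output lines into (label, predict, score) tuples.

    Assumes whitespace-separated columns; skips the header and any
    line that does not have exactly three fields.
    """
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or parts[0] == "label":
            continue
        label, predict, score = parts
        rows.append((int(label), int(predict), float(score)))
    return rows


def accuracy(rows):
    """Fraction of rows where the predicted class equals the label."""
    if not rows:
        return 0.0
    return sum(1 for label, predict, _ in rows if label == predict) / len(rows)


if __name__ == "__main__":
    with open("output.txt", encoding="utf-8") as f:
        rows = parse_results(f)
    print(f"{len(rows)} URLs, accuracy = {accuracy(rows):.4f}")
```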
** The code is forked from: https://github.com/Antimalweb/URLNet