A novel approach for phishing detection based on the automatic identification of persuasion principles used in malicious messages
Investigation project report
Bustio-Martínez, L., Herrera-Semenets, V., González-Ordiano, J. A., Pérez-Guadarrama, Y., Zúñiga-Morales, L. N., Montoya-Godínez, D., Álvarez-Carmona, M. Á., & van den Berg, J.
Universidad Iberoamericana Ciudad de México; Advanced Technologies Application Center (CENATAV), Cuba; Center for Research in Mathematics (CIMAT), Monterrey Campus, Nuevo León, Mexico; Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, The Netherlands
This research explores the use of message subjectivity for detecting phishing attacks by identifying principles of persuasion (PoP) used in malicious messages. It assesses the impact of various data representations and classifiers on automatically identifying these principles and investigates how they can be leveraged for phishing detection. The study emphasizes the need for user-friendly and comprehensible models, and it finds that tree-based models, particularly Random Forest, are preferred due to their effectiveness and clarity.
This work was supported by the “Instituto de Investigación Aplicada y Tecnología” (InIAT) and the “Universidad Iberoamericana, Ciudad de México” (IBERO). Additionally, the authors thank CONAHCYT for the computer resources provided through the INAOE Supercomputing Laboratory’s Deep Learning Platform for Language Technologies. The project's web page is hosted by InIAT.
Bustio-Martínez, L., Herrera-Semenets, V., González-Ordiano, J. A., Pérez-Guadarrama, Y., Zúñiga-Morales, L. N., Montoya-Godínez, D., Álvarez-Carmona, M. Á., & van den Berg, J. (2025). Enhanced phishing detection using multimodal data. Manuscript submitted to Knowledge-Based Systems. In review, 2nd round.
Rodríguez Díaz, A., Herrera Semenets, V., Hernández Sierra, G., Reyes Díaz, F. J., & Bustio Martínez, L. (2025). Detección de phishing en comunicaciones de voz utilizando aprendizaje automático. In SIGESTIC 2025: V Encuentro sobre Sistemas de Gestión para las Tecnologías de la Información y la Comunicación. Varadero, Cuba.
Herrera-Semenets, V., Bustio-Martínez, L., Pérez-Guadarramas, Y., González-Ordiano, J. Á., & van den Berg, J. (2024). Unmasking Phishing Attempts: A Study on Detection in Spanish Emails. In: Hernández-García, R., Barrientos, R.J., Velastin, S.A. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2024. Lecture Notes in Computer Science, vol 15369. Springer, Cham. https://doi.org/10.1007/978-3-031-76604-6_1
Bustio-Martínez, L., Herrera-Semenets, V., García-Mendoza, J. L., Álvarez-Carmona, M. A., González-Ordiano, J. A., Zúñiga-Morales, L., Quiróz-Ibarra, J. E., Santander-Molina, P. A., & van den Berg, J. (2024). Uncovering phishing attacks using principles of persuasion analysis. Journal of Network and Computer Applications, 103964. ISSN 1084-8045. https://doi.org/10.1016/j.jnca.2024.103964
Bustio-Martínez, L., Herrera-Semenets, V., González Ordiano, J. A., & Álvarez Carmona, M. Á. (2024). Detección de ataques phishing mediante técnicas de Inteligencia Artificial. Komputer Sapiens, 3 (September-December). Retrieved from http://komputersapiens.smia.mx/publicaciones.php#KSXVI-III
Bustio Martínez, L., Herrera-Semenets, V., Álvarez-Carmona, M. A., & González-Ordiano, J. A. (2024). La Inteligencia Artificial en la Ciberseguridad. ReinvenTec. Revista de Ciencia y Tecnología del ITTLA, 1(2). Published March 2024.
Bustio-Martínez, L. et al. (2023). Towards Automatic Principles of Persuasion Detection Using Machine Learning Approach. In: Hernández Heredia, Y., Milián Núñez, V., Ruiz Shulcloper, J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2023. Lecture Notes in Computer Science, vol 14335. Springer, Cham. https://doi.org/10.1007/978-3-031-49552-6_14
Bustio-Martínez, L., Álvarez-Carmona, M. A., Herrera-Semenets, V., Feregrino-Uribe, C., & Cumplido, R. (2022). A Lightweight Data Representation for Phishing URLs Detection in IoT Environments. Information Sciences, 603, 42-59. https://doi.org/10.1016/j.ins.2022.04.059
This research proposes a novel approach for phishing detection based on identifying principles of persuasion (PoP) in malicious messages.
It explores the impact of different data representations and machine learning classifiers on automatically identifying PoP.
The study finds that there is no one-size-fits-all solution for data representation and classifier selection, and a tailored combination is needed for each principle.
Machine learning models are built to detect PoP automatically, achieving AUC-ROC scores that range from 0.7306 to 0.8191 depending on the principle.
The research emphasizes the importance of user-friendly and interpretable models for end-users.
Tree-based models, particularly Random Forest, are highlighted as the preferred option due to their effectiveness and clarity, achieving an AUC-ROC of 0.859842.
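To make the AUC-ROC figures and the "tailored combination per principle" finding above more concrete, the following is a minimal sketch, not the authors' pipeline. It computes AUC-ROC as the fraction of (positive, negative) message pairs that a model ranks in the right order, then picks the best-scoring combination for each principle. The principle names ("authority", "scarcity"), the representation and classifier labels, and all scores are invented placeholders for illustration; none of them come from the study.

```python
from typing import Dict, Sequence, Tuple

# Key: (principle, representation, classifier); all names here are hypothetical.
Key = Tuple[str, str, str]


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


def best_combination(
    results: Dict[Key, Tuple[Sequence[int], Sequence[float]]]
) -> Dict[str, Tuple[Tuple[str, str], float]]:
    """For each principle, keep the (representation, classifier) pair with the highest AUC-ROC."""
    best: Dict[str, Tuple[Tuple[str, str], float]] = {}
    for (principle, representation, classifier), (labels, scores) in results.items():
        auc = auc_roc(labels, scores)
        if principle not in best or auc > best[principle][1]:
            best[principle] = ((representation, classifier), auc)
    return best


# Toy data (made up): 1 = message uses the principle, 0 = it does not.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_results = {
    ("authority", "tfidf", "random_forest"): (toy_labels, [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]),
    ("authority", "embeddings", "svm"): (toy_labels, [0.6, 0.3, 0.7, 0.5, 0.4, 0.2]),
    ("scarcity", "tfidf", "random_forest"): (toy_labels, [0.5, 0.4, 0.6, 0.7, 0.3, 0.2]),
    ("scarcity", "embeddings", "svm"): (toy_labels, [0.9, 0.7, 0.8, 0.3, 0.2, 0.1]),
}

best = best_combination(toy_results)
for principle, (combo, auc) in best.items():
    print(f"{principle}: {combo[0]} + {combo[1]} (AUC-ROC = {auc:.4f})")
```

In this toy setup a different combination wins for each principle, which mirrors the shape of the reported finding without reproducing any of its numbers. In practice the scores would come from held-out predictions of trained classifiers rather than hand-written lists.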