Our group demonstrated the uses of generative AI in cybersecurity using TensorFlow together with the Keras, Transformers, Pandas, and NumPy libraries. The generative model used was GPT-2, which aided in synthetic data generation. In the phishing email generation demonstration, GPT-2 was fine-tuned with TensorFlow on a dataset containing legitimate and phishing emails, then used to craft realistic synthetic emails. This involved developing prompts that simulate authentic phishing contexts, generating multiple email examples, and subsequently evaluating their authenticity and potential effectiveness against standard cybersecurity detection systems.
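The dataset preparation can be sketched as follows. This is a minimal illustration using Pandas and NumPy only; the miniature four-email corpus, the column names, and the 80/20 split ratio are assumptions for demonstration, not the actual dataset used.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature corpus; the real dataset mixed many
# legitimate and phishing emails.
df = pd.DataFrame({
    "text": [
        "Quarterly report attached for review.",
        "URGENT: verify your account now or it will be suspended!",
        "Lunch meeting moved to 1pm.",
        "You have won a prize, click here to claim it.",
    ],
    "label": [0, 1, 0, 1],  # 0 = legitimate, 1 = phishing
})

# Shuffle, then split roughly 80/20 into training and validation sets.
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
split = int(0.8 * len(df))
train_df, val_df = df.iloc[:split], df.iloc[split:]
```

The training split is what the model learns from each epoch; the held-out validation split supplies the unseen emails used to measure generalization.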
Fig. 1. Training accuracy (blue line): correctness of the model in identifying legitimate or phishing emails on the same dataset at each iteration. Validation accuracy (orange line): correctness of the model in identifying legitimate or phishing emails it has never seen before (the testing set).
The comparison of training and validation accuracy across four epochs shows that validation accuracy remains consistently high, suggesting effective generalization and minimal overfitting, and showcasing AI's potential for detecting phishing and scam emails. The graph displays how the model correctly identifies phishing and scam emails with an accuracy of approximately 98%, which increases as the model passes through more epochs. Due to the limited compute resources available on Google Colab, we could only showcase a limited number of iterations.
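The classifier training described above can be sketched with a small Keras model. This is a hedged illustration, not the exact architecture we trained: the layer sizes, the random bag-of-words features, and the synthetic labels are stand-in assumptions chosen so the example is self-contained, but the four-epoch fit with a validation split mirrors the curves in Fig. 1.

```python
import numpy as np
import tensorflow as tf

# Stand-in data: 1000 "emails" as 500-dimensional bag-of-words
# vectors with synthetic binary labels (0 = legitimate, 1 = phishing).
rng = np.random.default_rng(0)
X = rng.random((1000, 500)).astype("float32")
y = (X[:, 0] > 0.5).astype("float32")  # synthetic stand-in labels

# Simple binary classifier; the real architecture may differ.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(500,)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Four epochs with a 20% validation split, matching Fig. 1's x-axis;
# history.history holds the accuracy and val_accuracy curves plotted there.
history = model.fit(X, y, epochs=4, validation_split=0.2, verbose=0)
```

Plotting `history.history["accuracy"]` against `history.history["val_accuracy"]` per epoch produces the two curves shown in Fig. 1.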
In the phishing email generation component (Fig. 2), the GPT-2 model successfully produced highly realistic emails containing detailed contextual and personalized information. These synthetic emails underscored GPT-2's potential for crafting targeted phishing campaigns capable of circumventing traditional email filters, highlighting the growing threat posed by AI-driven attacks.
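The generation step can be sketched with the Hugging Face Transformers `pipeline` API. This is a minimal sketch using the base pretrained `gpt2` checkpoint rather than our fine-tuned model; the prompt below is a hypothetical example of the phishing-style contexts described above.

```python
from transformers import pipeline, set_seed

# Fix the seed so sampling is repeatable across runs.
set_seed(42)

# Base pretrained GPT-2; the demonstration used a model fine-tuned
# on the legitimate/phishing email dataset.
generator = pipeline("text-generation", model="gpt2")

# Hypothetical prompt simulating an authentic phishing context.
prompt = "Dear valued customer, we have detected unusual activity on your account."
out = generator(prompt, max_new_tokens=40, num_return_sequences=1)
generated = out[0]["generated_text"]
```

Each generated continuation would then be scored by the classifier from Fig. 1 to gauge whether it evades detection.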