Deep Learning for IDS/IPS

Proposal

Cybersecurity challenges have grown significantly over the past few years. More sophisticated attacks emerge as attackers adopt novel technologies, and existing defensive strategies are not sufficient to cope with these evolving attacks. This makes it urgent to adopt more advanced approaches in defensive systems. To this end, this project aims to explore several state-of-the-art techniques and exploit their advantages.


Existing defensive strategies are still widely used. These solutions include Network Intrusion Detection and Prevention Systems (IDS/IPS), which can be categorized as signature-based and behavior-based. A signature-based IDS detects malware by matching traffic against known signatures. A behavior-based IDS first learns what normal behavior looks like and then reports any abnormal activity it observes. With the growing complexity of networks and applications, these methods become inefficient.
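To make the distinction between the two categories concrete, here is a minimal Python sketch. The signature strings, the requests-per-minute feature, and the three-standard-deviation threshold are all illustrative assumptions and are not taken from any real IDS.

```python
from statistics import mean, stdev

# Hypothetical byte patterns standing in for a signature database.
KNOWN_SIGNATURES = [b"\x90\x90\x90\x90", b"cmd.exe /c", b"' OR 1=1 --"]


def signature_based_alert(payload):
    """Flag a payload only if it contains a known signature."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


class BehaviorBasedDetector:
    """Learn a baseline of normal traffic, then flag large deviations."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.mu = 0.0
        self.sigma = 1.0

    def fit(self, normal_rates):
        # Baseline statistics from traffic assumed to be benign.
        self.mu = mean(normal_rates)
        self.sigma = stdev(normal_rates) or 1.0  # avoid division by zero

    def is_anomalous(self, rate):
        return abs(rate - self.mu) / self.sigma > self.threshold


detector = BehaviorBasedDetector()
detector.fit([48, 52, 50, 47, 53, 49, 51])  # requests per minute (assumed normal)
```

The signature check misses anything that is not already in the list, while the baseline detector can flag previously unseen behavior but is only as good as the traffic it was fitted on.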

Machine learning has developed rapidly and been widely applied over the past few years, including in cybersecurity. With computing power more accessible than ever, machine learning methods can be used to analyze and detect attacks. A major advantage of these methods is that they support near real-time analysis, which helps to overcome the drawbacks of traditional methods.


Delplace et al. [1] compared several machine learning methods on the CTU-13 dataset [2], which contains 13 captures of different botnet samples. Their results show that Random Forest performed best among the algorithms tested, with detection accuracy above 95% in 8 of the 13 scenarios. However, the authors considered it hard to increase accuracy in the other 5 scenarios. I will apply more complex algorithms to see whether accuracy in these 5 scenarios can be improved. First, I will tune the hyperparameters of Random Forest to see if there is any improvement. Then, I will try neural networks such as Recurrent Neural Networks (RNN) [3], Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BLSTM) [4]. I expect neural networks to perform better in botnet detection because they are better suited to relatively large datasets.
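The first step, tuning the Random Forest, could be sketched as follows. This assumes scikit-learn as the library (the proposal only specifies Python); the synthetic, imbalanced data from make_classification stands in for CTU-13 flow features, and the parameter grid and F1 scoring are illustrative choices rather than settings from Delplace et al. [1].

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for CTU-13 flow features (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=0
)

# Illustrative grid; the values are assumptions, not tuned recommendations.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "class_weight": [None, "balanced"],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",  # the positive class is rare, so accuracy alone can mislead
    cv=3,
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_score = search.score(X_test, y_test)  # F1 on the held-out third
print(best_params, round(test_score, 3))
```

Cross-validation runs only on the training portion, so the held-out third stays untouched until the final score is computed.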


The expected outcome of this project is a Python program implementing the different algorithms. I will use CTU-13 as the dataset because it contains several types of cyber attacks. I will use 2/3 of the dataset to train the models and the remaining 1/3 to test them. I will summarize the results in a table comparing the performance of the different algorithms.
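The evaluation plan above can be sketched in plain Python as follows. The helper names (split_two_thirds, accuracy, format_results) are hypothetical, the rows are placeholders rather than real CTU-13 flows, and the "always normal" baseline is included only to show how a row of the results table would be produced.

```python
import random


def split_two_thirds(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) in a 2/3 : 1/3 split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def format_results(results):
    """Render {algorithm: accuracy} as a plain-text comparison table."""
    lines = [f"{'Algorithm':<16}{'Accuracy':>10}"]
    for name, acc in results.items():
        lines.append(f"{name:<16}{acc:>10.3f}")
    return "\n".join(lines)


# Placeholder (feature, label) rows; not real CTU-13 data.
flows = [(i, i % 2) for i in range(30)]
train, test = split_two_thirds(flows)

# Hypothetical baseline that always predicts "normal" (label 0).
y_true = [label for _, label in test]
always_normal = accuracy(y_true, [0] * len(y_true))
print(format_results({"Always normal": always_normal}))
```

Fixing the seed keeps the split reproducible, so every algorithm in the table is scored on the same test third.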

From Oct 9 to Oct 22, I will read more papers on cybersecurity, machine learning, and deep learning, and design the structure of the models. From Oct 23 to Nov 5, I will do the programming. From Nov 6 to Nov 19, I will train the models and make adjustments. From Nov 20 to Dec 9, I will run the tests and finish the report.


Resources & References:

[1] Delplace A, Hermoso S, Anandita K. Cyber Attack Detection thanks to Machine Learning Algorithms. arXiv preprint arXiv:2001.06309. 2020 Jan 17.

[2] The CTU-13 Dataset: A Labeled Dataset with Botnet, Normal and Background Traffic. Stratosphere IPS.

[3] Torres P, Catania C, Garcia S, Garino CG. An analysis of recurrent neural networks for botnet detection behavior. In 2016 IEEE Biennial Congress of Argentina (ARGENCON) 2016 Jun 15 (pp. 1-6). IEEE.

[4] McDermott CD, Majdani F, Petrovski AV. Botnet detection in the internet of things using deep learning approaches. In 2018 International Joint Conference on Neural Networks (IJCNN) 2018 Jul 8 (pp. 1-8). IEEE.