Report

Q:

Hey Peter,

Really interesting work! Looks like you did a thorough dive into cybersecurity and neural networks. Seems like it has a lot of potential. Well done!

A:

Thank you for your comment! Deep learning does have potential in network attack detection. It can learn from previous datasets and make predictions for unknown objects. I believe it will have a significant impact on cybersecurity.

Q:

Love the report. I must commend your effort, as the need for robust intrusion detection in cybersecurity cannot be overemphasized.

A:

Thank you for your interest in my project! Intrusion detection and prevention are essential for today's networks. New attacks and malware come out every year. I believe deep learning will significantly impact cybersecurity due to its strong prediction capability for unknown objects.

Q:

Hi Peter, as others have already stated, this is an impressive dive into both cybersecurity and machine learning. I’m honestly blown away by the level of effort and skill on display here. Hopefully you’re able to explore this further in the future and push innovation with your methodology.

A:

Thank you for your comment! Cybersecurity is a really important field, especially since so much important information is transferred over the internet. I'd be glad to work on this further in the future.

Q:

Is there any related work on these datasets with similar approaches for comparison? Also, the direction of a packet or flow can have an impact on identifying the attacker and victim as well.

A:

There are some related works. On CTU-13, Beny Nugraha et al. built models using CNN, LSTM, CNN-LSTM, and MLP[1]. Their accuracy on unknown botnet scenarios was 97.6%, 97.3%, 98.7%, and 88.5%, respectively. Yang Qin et al. designed a CNN-RNN model and trained it on CTU-13[2]. Their accuracy was around 99.8% (varying slightly with batch_size). Kapil Sinha et al. designed a model based on LSTM and evaluated it on CTU-13[3]. Their accuracy reached around 96.2%.

On UNSW-NB15, Xiaokang Zhou et al. proposed a new model named VLSTM (Variational LSTM), which is a compression network with a variational reparameterization[4]. The testing precision rate for VLSTM reached 86%. By comparison, the testing precision rate on that dataset with the ordinary LSTM was 80.8%. Su Yang trained and tested a Bi-LSTM network on UNSW-NB15[5]. The average precision rate across all nine scenarios reached 93%. Sarah Aljbali and Kaushik Roy also built a Bi-LSTM model and trained and tested it on UNSW-NB15[6]. The accuracy reached 99.7%, compared with 99.66% for the ordinary LSTM network.


Most of these works achieved higher accuracy or precision than my work. I found that many of them adjusted the structure of the deep learning models rather than using an ordinary baseline. In addition, in their data processing, many of them deleted or added features in the dataset (mostly for CTU-13; these works did not adjust UNSW-NB15 much, since it is already cleaned).


I used bidirectional netflows mostly to stay in the same format as CTU-13. According to the CTU-13 website, bidirectional netflows have the following advantages over unidirectional ones[7]:

  1. they solve the issue of differentiating between the client and the server

  2. they include more information

  3. they include much more detailed labels

However, the disadvantage is also obvious: bidirectional netflows make it harder to determine the direction of a flow[8], and direction matters. According to Michael Patterson, direction indicates where the information was gathered, which has a significant impact on perceived accuracy[9]. Direction can also help determine which host is the client and which is the server[10], and it can be used to infer which host is potentially infected with malware. Most datasets indicate direction by specifying a particular source and destination. CTU-13 also has a column for direction, but that column mostly indicates whether a netflow is unidirectional or bidirectional; it does not give the specific direction when a netflow is bidirectional. I removed that column from the training and testing datasets. Here is an example of the direction feature.
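To make the removal of the direction column concrete, here is a minimal sketch of that preprocessing step. It is not the exact code from the project. It assumes the column is named `Dir` and uses arrow-style values such as `->` and `<->`, as in the CTU-13 .binetflow files. The sample rows are made up, the column list is simplified, and the helper names (`load_flows`, `direction_kind`, `drop_column`) are hypothetical.

```python
import csv
import io

# Made-up rows in a simplified CTU-13-style layout (assumption, not real data).
SAMPLE = """StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,TotPkts,TotBytes,SrcBytes,Label
2011/08/10 09:46:53,1.02,tcp,10.0.0.5,1025,  ->,10.0.0.9,80,S_RA,4,276,156,flow=Background
2011/08/10 09:46:54,0.50,udp,10.0.0.7,2077,  <->,10.0.0.1,53,CON,2,214,81,flow=Normal
2011/08/10 09:46:55,3.10,tcp,10.0.0.8,4431,  <?>,10.0.0.9,443,FA_FA,10,900,400,flow=Botnet
"""


def load_flows(text):
    """Parse CSV text into a list of dicts, one per netflow record."""
    return list(csv.DictReader(io.StringIO(text)))


def direction_kind(symbol):
    """Map a direction symbol to a coarse category (assumed value set)."""
    s = symbol.strip()
    if s == "<->":
        return "bidirectional"
    if s in ("->", "<-"):
        return "unidirectional"
    return "unknown"


def drop_column(rows, name):
    """Return copies of the rows without the given column."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]


flows = load_flows(SAMPLE)
kinds = [direction_kind(row["Dir"]) for row in flows]

# The column only says whether a flow is one-way or two-way, so it is
# dropped before building the training and testing features.
features = drop_column(flows, "Dir")
```

In this sketch, `kinds` shows what the column can tell us (one-way, two-way, or unknown), while `features` holds the records that would go on to training and testing.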

References:

[1] Nugraha B, Nambiar A, Bauschert T. Performance evaluation of botnet detection using deep learning techniques. In 2020 11th International Conference on Network of the Future (NoF) 2020 Oct 12 (pp. 141-149). IEEE.

[2] Qin Y, Wei J, Yang W. Deep learning based anomaly detection scheme in software-defined networking. In 2019 20th Asia-Pacific Network Operations and Management Symposium (APNOMS) 2019 Sep 18 (pp. 1-4). IEEE.

[3] Sinha K, Viswanathan A, Bunn J. Tracking temporal evolution of network activity for botnet detection. arXiv preprint arXiv:1908.03443. 2019.

[4] Zhou X, Hu Y, Liang W, Ma J, Jin Q. Variational LSTM enhanced anomaly detection for industrial big data. IEEE Transactions on Industrial Informatics. 2020 Sep 11;17(5):3469-77.

[5] Yang SU. Research on network behavior anomaly analysis based on bidirectional LSTM. In 2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC) 2019 Mar 15 (pp. 798-802). IEEE.

[6] Aljbali S, Roy K. Anomaly detection using bidirectional LSTM. In Proceedings of SAI Intelligent Systems Conference 2020 Sep 3 (pp. 612-619). Springer, Cham.

[7] Sebastian Garcia, Martin Grill, Jan Stiborek and Alejandro Zunino. The CTU-13 Dataset. A Labeled Dataset with Botnet, Normal and Background traffic. Stratosphere Lab. 2014. https://www.stratosphereips.org/datasets-ctu13

[8] Everything you didn’t want to know about Bidirectional and Unidirectional NetFlow. Plixer. 2010. https://www.plixer.com/blog/everything-you-didnt-want-to-know-about-bidirectional-and-unidirectional-netflow/

[9] Michael Patterson. IPFIX Flow Direction and Packet Counters. TMCnet. 2015. https://blog.tmcnet.com/advanced-netflow-traffic-analysis/2015/07/ipfix-flow-direction-and-packet-counters.html

[10] NetFlow Direction and it’s Value to Troubleshooting. Plixer. 2013. https://www.plixer.com/blog/netflow-direction-and-its-value-to-troubleshooting/