After completing this learning module, students will be able to:
Describe what malware classification and protection are
Explain why deep learning algorithms are useful for malware classification and protection
Apply deep learning algorithms to analyze, classify, and detect malware
Deep Learning-Based Malware Detection (DLMD)
Malware: Malware (a contraction of "malicious software") refers to malicious code, scripts, active content, or other intrusive software designed to damage or compromise targeted computer systems and programs, or mobile and web applications. It takes many forms, including computer viruses, worms, ransomware, rootkits, trojans, dialers, adware, spyware, keyloggers, and malicious Browser Helper Objects (BHOs). Any software purposefully designed with malicious intent can be categorized as malware, and malware can be classified according to its purpose and its method of propagation. Some malware, such as a virus, can copy itself and infect a computer without the user's permission or knowledge, and it can execute on its own: if an infected file or program is installed on or shared with another computer, the virus automatically copies itself onto that computer and executes its code. Such infected files or programs come from external sources, most often the internet, for example through files downloaded from malicious websites or through clicks on malicious links.
Figure 1: Malware classification
Figure 1 illustrates how malware is classified into several categories. Malware detection and prevention are important for stopping unauthorized attacks and access. The purpose of malware detection is to protect a system from various kinds of malicious attacks by enforcing a detection-and-prevention policy. Various algorithms already exist for detecting malware; however, as malware technology advances, adopting artificial intelligence has become crucial for efficient and robust malware prevention. In this tutorial, we will learn Deep Learning-Based Malware Detection (DLMD), which covers Logistic Regression and a Support Vector Machine (classical machine learning models) as well as a Neural Network, each evaluated with 5-fold cross-validation.
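The 5-fold cross-validation loop can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the tutorial's own implementation: it assumes every sample is already a numeric feature vector, generates a hypothetical toy dataset with `make_toy_data`, and plugs in a from-scratch logistic regression (`train_logistic_regression`, `predict_logistic`, both hypothetical names). An SVM or a neural network could be evaluated with the same `cross_validate` function by passing its own train and predict functions.

```python
import math
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_logistic(model, X):
    """Label 1 (malware) when the predicted probability is at least 0.5."""
    w, b = model
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def cross_validate(X, y, train_fn, predict_fn, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold; return k accuracies."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = train_fn([X[j] for j in train_idx], [y[j] for j in train_idx])
        preds = predict_fn(model, [X[j] for j in test_idx])
        correct = sum(1 for p, j in zip(preds, test_idx) if p == y[j])
        scores.append(correct / len(test_idx))
    return scores


def make_toy_data(n_per_class=50, seed=42):
    """Hypothetical 2-feature data: benign around (-2, -2), malware around (2, 2)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        X.append([rng.gauss(-2, 1), rng.gauss(-2, 1)])
        y.append(0)
        X.append([rng.gauss(2, 1), rng.gauss(2, 1)])
        y.append(1)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    scores = cross_validate(X, y, train_logistic_regression, predict_logistic, k=5)
    print("per-fold accuracy:", scores)
    print("mean accuracy:", sum(scores) / len(scores))
```

Reporting the mean over the five held-out folds, rather than a single train/test split, gives a less optimistic and less split-dependent estimate of how the classifier will behave on unseen samples.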
Deep Learning: Deep learning is a subfield of machine learning, itself a branch of AI, and it has made major advances in solving problems that resisted the best attempts of the artificial intelligence community for many years. Deep learning is at the forefront of malware detection: deep models are used as classifiers that are trained through feature learning rather than through task-specific, hand-designed algorithms. For practice purposes, we will build a neural network classifier and measure its accuracy.
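Feature learning can be illustrated with a network small enough to write by hand. The sketch below is a hedged example rather than the tutorial's model: a hypothetical `TinyMLP` class with one tanh hidden layer and a sigmoid output, trained by backpropagation and stochastic gradient descent on the XOR toy problem. XOR is not linearly separable, so logistic regression cannot solve it; the hidden layer has to learn intermediate features. A practical malware classifier would typically use more layers, real feature vectors, and a framework such as PyTorch or TensorFlow.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMLP:
    """One hidden layer of tanh units feeding a single sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return the hidden activations (learned features) and P(class 1)."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.W2, h)) + self.b2)
        return h, p

    def loss(self, X, y):
        """Mean binary cross-entropy over the dataset."""
        eps = 1e-12  # guards against log(0)
        total = 0.0
        for x, t in zip(X, y):
            _, p = self.forward(x)
            total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        return total / len(X)

    def train_step(self, x, t, lr):
        """One stochastic gradient descent update via backpropagation."""
        h, p = self.forward(x)
        d_out = p - t  # dLoss/d(output pre-activation) for sigmoid + cross-entropy
        for j, hj in enumerate(h):
            # Error reaching hidden unit j; uses W2[j] before it is updated.
            d_hidden = d_out * self.W2[j] * (1 - hj * hj)  # tanh'(z) = 1 - tanh(z)^2
            self.W2[j] -= lr * d_out * hj
            for i, xi in enumerate(x):
                self.W1[j][i] -= lr * d_hidden * xi
            self.b1[j] -= lr * d_hidden
        self.b2 -= lr * d_out

    def fit(self, X, y, lr=0.1, epochs=2000):
        for _ in range(epochs):
            for x, t in zip(X, y):
                self.train_step(x, t, lr)


if __name__ == "__main__":
    # Toy XOR data (hypothetical stand-in for real feature vectors).
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    y = [0, 1, 1, 0]
    net = TinyMLP(n_in=2, n_hidden=4, seed=0)
    print("loss before training:", net.loss(X, y))
    net.fit(X, y)
    print("loss after training:", net.loss(X, y))
    print("predictions:", [round(net.forward(x)[1], 2) for x in X])
```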
Figure 2: Comparison between traditional computer vision methods and deep learning
In traditional computer vision, features had to be extracted by hand and then fed into a classifier, which generally gives lower accuracy. Deep learning, on the other hand, tends to achieve higher accuracy because the neural network learns the relevant features directly from the data. As Figure 2 shows, the image is fed to the model as raw pixels, and the model learns which features are important.
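For malware, the same contrast can be made concrete by comparing a hand-crafted feature extractor with the raw input a deep model would receive. The sketch below assumes byte-frequency and Shannon-entropy features, which are common hand-engineered choices in malware analysis but are not prescribed by this tutorial; the function names `handcrafted_features` and `raw_byte_input` and the fixed input length of 1024 bytes are hypothetical.

```python
import math
from collections import Counter


def handcrafted_features(data):
    """Traditional pipeline: a fixed, human-designed feature vector.

    Returns 256 normalized byte frequencies followed by the Shannon
    entropy of the byte distribution, in bits (0.0 to 8.0).
    """
    if not data:
        raise ValueError("data must be non-empty")
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    freqs = [counts[b] / len(data) for b in range(256)]
    entropy = -sum(p * math.log2(p) for p in freqs if p > 0)
    return freqs + [entropy]


def raw_byte_input(data, length=1024):
    """Deep learning pipeline: raw bytes scaled to [0, 1], truncated or
    zero-padded to a fixed length. No features are designed by hand;
    the network is left to learn which byte patterns matter."""
    fixed = data[:length].ljust(length, b"\x00")
    return [b / 255.0 for b in fixed]


if __name__ == "__main__":
    sample = b"MZ\x90\x00" * 8  # hypothetical stand-in for file contents
    features = handcrafted_features(sample)
    print("hand-crafted vector length:", len(features), "entropy:", features[-1])
    print("raw input length:", len(raw_byte_input(sample)))
```

The hand-crafted vector only captures what its designer chose to measure (here, which bytes occur and how random they look), whereas the raw input preserves byte order and leaves the choice of features to training.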
Figure 3: Architecture of a deep learning-based neural network
Deep learning uses neural networks with multiple hidden layers in which features are learned and extracted automatically, which helps achieve high accuracy and performance. Figure 3 depicts a deep learning architecture for sentiment polarity classification [1]. Deep learning models can be divided into three core types: (i) deep neural networks (DNN), (ii) convolutional neural networks (CNN), and (iii) recurrent neural networks (RNN). A deep neural network (DNN) is a neural network with more than two layers, meaning that one or more hidden layers sit between the input and output layers, as depicted in Figure 4. Each layer computes weighted sums of its inputs followed by a nonlinear activation, so stacking layers lets a DNN model complex, nonlinear relationships in the data. A convolutional neural network (CNN) is a special type of feed-forward neural network that was originally employed in computer vision and has since been applied to areas such as recommender systems and natural language processing. Recurrent neural networks (RNN), in contrast, are a class of neural networks whose connections between neurons form directed cycles, which create feedback loops within the network. The main function of an RNN is to process sequential information using the internal memory captured by these directed cycles.
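The feedback loop that gives an RNN its memory can be shown with a single recurrent cell. This is a minimal sketch with random, untrained weights: the hypothetical `SimpleRNN` class applies h_t = tanh(W_x x_t + W_h h_{t-1} + b) to a sequence of one-hot token IDs, which could stand for, say, the API calls a program makes (an illustrative assumption, not part of the tutorial). Because h_{t-1} re-enters the cell at every step, the final state depends on the whole sequence and its order, not only on the last token.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class SimpleRNN:
    """Elman-style cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Random, untrained weights, for illustration only.
        self.W_x = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.W_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.n_in, self.n_hidden = n_in, n_hidden

    def run(self, token_ids):
        """Process a sequence of token IDs; return the hidden state after each step."""
        h = [0.0] * self.n_hidden  # the internal memory starts empty
        states = []
        for t in token_ids:
            x = [1.0 if i == t else 0.0 for i in range(self.n_in)]  # one-hot input
            # The W_h term is the feedback loop: the previous state re-enters the cell.
            pre = [a + c + bias for a, c, bias in
                   zip(matvec(self.W_x, x), matvec(self.W_h, h), self.b)]
            h = [math.tanh(z) for z in pre]
            states.append(h)
        return states


if __name__ == "__main__":
    # Hypothetical vocabulary of 4 token IDs (e.g. 4 distinct API calls).
    rnn = SimpleRNN(n_in=4, n_hidden=3, seed=0)
    state_1 = rnn.run([1, 0])[-1]
    state_2 = rnn.run([2, 0])[-1]
    # Both sequences end with token 0, yet the final states differ
    # because each one carries a memory of the earlier token.
    print(state_1)
    print(state_2)
```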
The pictures show malware samples that were converted into grayscale images so that they can be fed directly into a convolutional neural network (CNN).
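One common way to produce such images is to read the binary as a sequence of bytes and treat each byte (0 to 255) as the intensity of one grayscale pixel, wrapping the bytes into rows of a fixed width. The sketch below assumes that mapping and a hypothetical width of 16 pixels; practical pipelines often pick the width based on file size and resize the result to the CNN's input shape. The hypothetical `conv2d_valid` function shows the core operation of a CNN's convolution layer, computed as a cross-correlation as in most deep learning libraries, using a hand-picked kernel in place of learned weights.

```python
def bytes_to_grayscale(data, width=16):
    """Wrap the bytes of a binary into rows of `width` pixels.

    Each byte becomes one pixel intensity (0 = black, 255 = white);
    the last row is zero-padded to full width.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    n_rows = -(-len(data) // width)  # ceiling division
    padded = data.ljust(n_rows * width, b"\x00")
    return [list(padded[r * width:(r + 1) * width]) for r in range(n_rows)]


def conv2d_valid(image, kernel):
    """Slide a square kernel over the image with stride 1 and no padding.

    Like the convolution layers in most deep learning libraries, this
    computes a cross-correlation (the kernel is not flipped).
    """
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    return [
        [sum(kernel[i][j] * image[r + i][c + j] for i in range(k) for j in range(k))
         for c in range(cols - k + 1)]
        for r in range(rows - k + 1)
    ]


if __name__ == "__main__":
    # Stand-in for the bytes of an executable, e.g. open(path, "rb").read().
    sample = bytes(range(64))
    image = bytes_to_grayscale(sample, width=16)
    # Hand-picked vertical-edge kernel; a trained CNN learns its own kernels.
    edge_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
    feature_map = conv2d_valid(image, edge_kernel)
    print(len(image), "x", len(image[0]), "image ->",
          len(feature_map), "x", len(feature_map[0]), "feature map")
```

A real CNN stacks many such kernels, learns their weights during training, and follows them with nonlinearities and pooling; the loop above only shows what a single kernel computes over the malware image.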
References:
[1] N. C. Dang, M. N. Moreno-García, and F. De la Prieta, “Sentiment analysis based on deep learning: A comparative study,” Electron., vol. 9, no. 3, 2020, doi: 10.3390/electronics9030483.
[2] https://developer.nvidia.com/blog/malware-detection-neural-networks/