Learning Objectives:
After completing this module, students will be able to:
Describe Polymorphic Malware
Describe the threat that AI poses to cybersecurity through polymorphic malware
Identify the ways in which Generative AI could attack a neural network through polymorphic malware
Identify the solutions to Polymorphic Malware attacks
Use Google Colab to simulate the Polymorphic Malware attacks and solutions to them
Apply one of the algorithms defined in this module for independent study
What is polymorphic malware?
Polymorphic malware is malware that repeatedly changes its identifiable features, such as its code signature or decryption routine, while its underlying function stays the same (Figure 1). A typical "polymorphic attack follows this process:
The cybercriminal hides the malicious code via encryption, allowing it to bypass most traditional security tools.
The virus is installed on an endpoint and the infected file is downloaded and decrypted.
Once downloaded, a mutation engine creates a new decryption routine that is attached to the virus, making it appear to be a different file, and therefore unrecognizable to security tools -- even if an earlier version of the computer virus had been detected and placed on a blocklist."
(Crowdstrike 2022)
Figure 1: Polymorphic malware process (Xcitium)
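The effect of the mutation step on a blocklist can be illustrated with a harmless toy example. The sketch below is not taken from the cited sources. It assumes that a "signature" is simply the SHA-256 hash of a file's bytes and that "encryption" is a one-byte XOR, and the names xor_bytes, signature, is_blocked, and PAYLOAD are hypothetical. The payload is an ordinary string, not malware.

```python
import hashlib

def xor_bytes(data: bytes, key: int) -> bytes:
    """Toy stand-in for the encryption step: XOR every byte with a one-byte key."""
    return bytes(b ^ key for b in data)

def signature(data: bytes) -> str:
    """Assumption: a static signature is just the SHA-256 digest of the bytes."""
    return hashlib.sha256(data).hexdigest()

# Harmless placeholder text standing in for a payload (hypothetical content).
PAYLOAD = b"print('hello from a harmless demo payload')"

# An earlier variant was detected, so its signature is on the blocklist.
known_variant = xor_bytes(PAYLOAD, key=0x21)
blocklist = {signature(known_variant)}

# A re-encoded variant: same payload, different key, different bytes on disk.
new_variant = xor_bytes(PAYLOAD, key=0x5A)

def is_blocked(data: bytes) -> bool:
    """Signature-based check: exact match against the blocklist."""
    return signature(data) in blocklist

print(is_blocked(known_variant))                 # True
print(is_blocked(new_variant))                   # False
print(xor_bytes(new_variant, 0x5A) == PAYLOAD)   # True: content is unchanged
```

Both variants decode to the same content, but only the one whose exact bytes were seen before matches the blocklist.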
How does Generative AI make it easier to implement polymorphic malware?
Generative AI can be used to automatically generate new variants of a malware sample with different code signatures, encryption methods, or file structures while retaining the malware's malicious behavior. Some models, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), can be trained to generate many such variants, each with a different structure but the same payload. This could prevent detection by antivirus software that relies on static signatures. Revisit the Anomaly Detection Module for more information.
How are GANs used for polymorphic malware?
Since GANs consist of two networks, a generator and a discriminator, they map naturally onto the contest between polymorphic malware and its detectors. The generator could create new malware variants, and the discriminator, acting as a proxy for antivirus software or an IDS, could try to detect them. Over time, the generator becomes better at creating variants that the discriminator cannot detect. So, the malware samples generated by GANs can continuously adapt to evade defenses, enhancing the polymorphic malware.
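The generator-versus-discriminator loop can be sketched on synthetic numbers only. The example below is a minimal illustration, not a realistic GAN: it assumes one-dimensional "feature values" drawn from a standard normal distribution as the real data, a generator with a single learnable offset, and a logistic-regression discriminator. All names (sample_real, generate, train, discriminator_accuracy) and all constants are assumptions, and no actual code or malware is involved.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sample_real(n):
    """Real data: synthetic 1-D feature values (assumed N(0, 1))."""
    return rng.normal(0.0, 1.0, size=n)

def generate(offset, n):
    """Generator: unit-variance noise shifted by one learnable offset."""
    return offset + rng.normal(0.0, 1.0, size=n)

def train(steps=3000, batch=128, lr=0.05):
    offset = -3.0      # generator parameter, deliberately far from the real data
    w, c = 0.0, 0.0    # discriminator: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        # Discriminator step: minimize -log D(real) - log(1 - D(fake)).
        real, fake = sample_real(batch), generate(offset, batch)
        d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
        grad_w = np.mean((d_real - 1.0) * real) + np.mean(d_fake * fake)
        grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
        w -= lr * grad_w
        c -= lr * grad_c
        # Generator step: minimize -log D(fake) (non-saturating loss).
        fake = generate(offset, batch)
        d_fake = sigmoid(w * fake + c)
        grad_offset = np.mean(-(1.0 - d_fake) * w)
        offset -= lr * grad_offset
    return offset, w, c

def discriminator_accuracy(offset, w, c, n=2000):
    """Fraction of real and generated samples the discriminator labels correctly."""
    real, fake = sample_real(n), generate(offset, n)
    correct = np.sum(w * real + c > 0) + np.sum(w * fake + c <= 0)
    return correct / (2 * n)

offset, w, c = train()
accuracy = discriminator_accuracy(offset, w, c)
print(f"learned generator offset: {offset:.2f}")
print(f"discriminator accuracy:   {accuracy:.2f}")
```

If training behaves as intended, the offset drifts toward the real data's mean and the discriminator's accuracy falls toward chance, which is the toy analogue of a detector that can no longer tell generated samples apart.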
How are VAEs used for polymorphic malware?
VAEs could be used to encode the malware into a latent space and then decode it into different forms. The latent space is a compressed representation, and decoding different points sampled near the original's encoding yields unique variations. The aim is for the malware's behavior to stay consistent while its appearance changes, making it harder for signature-based detection systems to work.
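The encode, sample, and decode steps can be sketched with a linear stand-in. The example below is not a trained VAE: it assumes synthetic 8-dimensional feature vectors, uses PCA (via SVD) in place of a learned encoder and decoder, and fixes the latent variance by hand. Only the sampling step z = mu + sigma * eps and the KL term follow the standard VAE formulas. All names and constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 8-dimensional feature vectors that lie near a 2-D subspace.
basis = rng.normal(size=(2, 8))
data = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 8))

# Linear stand-in for a learned encoder/decoder: the top two PCA directions.
mean = data.mean(axis=0)
_, _, vt = np.linalg.svd(data - mean, full_matrices=False)
components = vt[:2]              # shape (2, 8), orthonormal rows

LOG_VAR = np.log(0.1 ** 2)       # assumed fixed latent log-variance (sigma = 0.1)

def encode(x):
    """Map an input to the mean and log-variance of its latent distribution."""
    mu = (x - mean) @ components.T
    return mu, np.full_like(mu, LOG_VAR)

def sample_latent(mu, log_var):
    """Reparameterized sample: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z):
    """Map a latent point back to feature space."""
    return z @ components + mean

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, sigma^2) to N(0, I), the VAE regularizer."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

x = data[0]
mu, log_var = encode(x)
reconstruction = decode(mu)
variants = [decode(sample_latent(mu, log_var)) for _ in range(5)]

for v in variants:
    print("distance from reconstruction:", np.round(np.linalg.norm(v - reconstruction), 3))
```

Each decoded variant is a distinct vector that stays close to the reconstruction of the original, which mirrors the idea of changing appearance while staying near the same underlying representation.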
What are the solutions to detect polymorphic malware simulated by Generative AI?
So far, antivirus and antimalware companies have commonly relied on signature-based detection systems. However, the weakness of this traditional method is that it "cannot recognize the threat once it is re-encrypted" (Crowdstrike 2022). This is a crucial drawback because it allows a polymorphic virus to duplicate and spread without triggering the detection system.
In contrast, signature-less malware protection could "determine the likelihood that a file is malicious by analyzing the broader picture and extracting so-called 'features' from the files analyzed" (Crowdstrike 2022). This means that approaches such as an anomaly-based IDS could be used instead to detect polymorphic malware. Instead of looking for predefined signatures, the protection system would detect deviations from normal behavior.
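Feature-based detection can be sketched with a single feature. The example below is an illustration under stated assumptions, not the method used by any vendor: it assumes byte entropy as the only feature, a small hand-made baseline of ordinary text files, and a z-score threshold of 3. The names byte_entropy, is_anomalous, and baseline_files are hypothetical. Encrypted or packed content tends to have high byte entropy, so random bytes are used here as a stand-in for it.

```python
import math
import os
import statistics
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Baseline of "normal" files (assumed, hand-made examples of ordinary content).
baseline_files = [
    b"def add(a, b):\n    return a + b\n" * 20,
    b"The quick brown fox jumps over the lazy dog. " * 20,
    b"<html><body><p>Hello, world!</p></body></html>" * 20,
    b"id,name,score\n1,alice,90\n2,bob,85\n" * 20,
]
baseline = [byte_entropy(f) for f in baseline_files]
mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)

def is_anomalous(data: bytes, threshold: float = 3.0) -> bool:
    """Flag a file whose entropy is more than `threshold` standard deviations from normal."""
    z_score = abs(byte_entropy(data) - mean) / stdev
    return z_score > threshold

ordinary = b"The report is due on Friday. Please review the draft and reply. " * 20
encrypted_like = os.urandom(4096)   # random bytes as a stand-in for encrypted content

print("ordinary file flagged:      ", is_anomalous(ordinary))
print("encrypted-like file flagged:", is_anomalous(encrypted_like))
```

No blocklist is consulted here. The check depends only on how far a file's feature value lies from the learned baseline, so re-encrypting the content with a new key does not help it blend in.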
Key Terms:
Anomaly-based IDS: This IDS detects attacks by learning what "normal" behavior looks like on the network and flagging anything that deviates from this normal behavior as suspicious. It is the same as anomaly detection.
Generative Adversarial Networks (GANs): A type of AI model that consists of two parts: a generator and a discriminator. The generator tries to create fake data that looks like real data, and the discriminator tries to figure out whether the data is real or fake.
Generative AI: A type of artificial intelligence that can generate new data that mimics the data it was trained on, often used in applications like content creation, data synthesis, and simulation.
IDS (Intrusion Detection System): A security system that monitors network traffic for suspicious activity and issues alerts when such activity is detected.
Polymorphic Malware: Malware that changes its code structure without altering its functionality, making it harder to detect with traditional security methods.
Signature-based IDS: This IDS detects attacks by comparing network traffic to a database of known attack patterns or "signatures."
Variational Autoencoders (VAEs): A type of model that compresses data (like network traffic) into a simpler form (called a latent space) and then tries to recreate the original data from this compressed version.
References:
Baker, K. "What Is a Polymorphic Virus? Detection and Best Practices." July 22, 2022. crowdstrike.com. https://www.crowdstrike.com/cybersecurity-101/malware/polymorphic-virus/
Xcitium. "What Is a Polymorphic Virus and How to Prevent It?" https://www.xcitium.com/polymorphic-virus/