Learning Objectives:
After completing this module, students will be able to:
Describe Anomaly Detection
Describe the impact of Generative AI in Anomaly Detection
Identify the machine-learning algorithms that demonstrate Anomaly Detection
Use Google Colab to implement code segments to demonstrate the benefits of Anomaly Detection
Apply knowledge of a particular algorithm for further study
What is anomaly detection?
Anomaly detection is type of security assurance procedure. It "examines specific data points and detect rare occurrences that seem suspicious because they are different from the established pattern of behaviors" (AWS). In other words, it is a detection service to alert a software when there are unexpected events. This, in turn, protects the security of the system and privacy.
How does artificial intelligence benefit anomaly detection?
Generative AI could be particularly helpful with anomaly detection in network traffic. It could identify potential threats in anomalous patterns with the use of machine learning algorithms to analyze the traffic data in real-time.
What are the algorithms to implement anomaly detection in network traffic?
Recall in the Getting Started module, the basic example of a neural network was demonstrated in Google Colab. For this system to implement anomaly detection for network traffic, there are some algorithms that may work.
The Generative Adversial Network (GAN) and Variational Autoencoder (VAE) are two algorithms for generative AI to simultaneously learn, generate, and compare dataset samples. In the terms of network traffic, it would be comparing anomalous traffic data with a normal one.
What are GANs?
Generative Adversial Networks (GANs) are a type of algorithm where it generates novel data based on existing data. It also contains a discriminator that is "responsible to learn an efficient similarity metric so that it can differentiate between the real(s) and the fake(s)" (W&B).
With a neural network, GANs could be used to generate synthetic network traffic and abnormal examples. The discriminator in this situation would be working as an anomaly detector by identifying the "real" versus the "fake" generated data.
What are VAEs?
Variational Autoencoders (VAEs) are a type of algorithm where it "learns to encode the given input data and then reconstructs it from the encoding" (W&B). In other words, a sample dataset is given to the VAE. It would "compress" the data into a simpler version, store it in a latent space, add variation data to the compressed input, and then regenerate the original input.
In a neural network, the VAE is trained to compress and reconstruct normal traffic data. If, however, it encounters an anomalous network traffic, it would have difficulty in compressing and reconstructing it. The difference to normal data is compared and the error would be detected.
Can a VAE-GAN hybrid algorithm be used?
There are some ways in which a VAE-GAN hybrid algorithm may also be used for detecting anomalous data. The VAE would learn to encode and regenerate data; the GAN would have a generator that creates data and a discriminator that tries to distinguish between real and fake data (Figure 1).
With a network traffic example, the VAE learns to compress and reconstruct typical traffic patterns, while the GAN ensures that these reconstructions are look as "real" as possible. When the VAE-GAN hybrid encounters anomalous data, it would not reconstruct it well. Consequently, the GAN discriminator would identify the error as an anomaly.
Figure 1: Animated picture of the model flow with VAE-GAN (W&B)
Key Terms:
AI (Artificial Intelligence): The simulation of human intelligence processes by machines, particularly computer systems, including learning, reasoning, and self-correction.
Anomaly Detection: A technique used in cybersecurity to identify unusual patterns or behaviors within a dataset that may indicate a security threat.
Generative AI: A type of artificial intelligence that can generate new data that mimics the data it was trained on, often used in applications like content creation, data synthesis, and simulation.
Generative Adversial Networks (GANs): a type of AI model that consists of two parts: a generator and a discriminator. The generator tries to create fake data that looks like real data), and the discriminator tries to figure out if the data is real or fake.
Variational Autoencoders (VAEs): a model that compress data (like network traffic) into a simpler form (called a latent space) and then tries to recreate the original data from this compressed version.
VAE-GAN hybrid algorithm: a combination of a VAE and a GAN. It uses the VAE to compress and recreate data, while the GAN's discriminator checks if the recreated data looks realistic.
References:
"What is anomaly detection? - anomaly detection in ML explained." AWS. (n.d.). https://aws.amazon.com/what-is/anomaly-detection/
Mishra, S. "An introduction to VAE-gans." W&B. October 30, 2021. https://wandb.ai/shambhavicodes/vae-gan/reports/An-Introduction-to-VAE-GANs--VmlldzoxMTcxMjM5