Anomaly detection for cyber security via machine learning

Introduction

Layman's explanation

A zero-day vulnerability is a weakness in a computer system that an attacker can exploit and that the affected parties are not yet aware of. To understand the impact, consider what would happen if your organisation fell victim to a zero-day exploit.

Detecting abnormal (anomalous) network behaviour can help an organisation defend against attacks such as zero-day attacks. If you want to learn about the available approaches, this document can help.

Technical explanation

Anomaly detection, also called outlier detection, is the identification of unexpected events, observations, or items that differ significantly from the norm.

Anomalous data may be easy to identify because it breaks certain rules. If a sensor should never read 300 degrees Fahrenheit and the data shows the sensor reading 300 degrees Fahrenheit, there's your anomaly: a clear threshold has been broken.

Properties of anomalies

- Anomalies occur only rarely in the data
- The features of anomalous instances differ significantly from those of normal instances

Examples of anomalies

- Network anomalies - intrusion detection forecasting
- Application performance anomalies
- Web application security anomalies - XSS attacks, DDoS attacks, unexpected login attempts, etc.

The time series graph below shows an unexpected drop in network usage (anomalous behaviour).

Challenges in anomaly detection

- Defining normal behaviour
- Handling the imbalanced distribution of normal and abnormal data
- Sparse occurrence of abnormal events
- Appropriate feature extraction
- Handling noise (note that an anomaly is different from noise)
- A future anomaly may look nothing like any of the anomalous examples in the training set

How to test whether a dataset has anomalous points

- For supervised training, the dataset should contain data points that are labelled as anomalous.
- For unsupervised learning, distances or cluster densities are used to estimate what is normal and what is an outlier. For example, a cluster of outliers will have a very low density compared to clusters of normal points.
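The unsupervised idea can be sketched with a nearest-neighbour distance score: points in sparse regions sit far from their neighbours. This is a minimal illustration rather than a production detector. The function names, the choice of k = 2, and the cutoff of three times the median score are all assumptions made for the example.

```python
import math
from statistics import median

def knn_distance_scores(points, k=2):
    """Mean distance from each point to its k nearest neighbours.
    A large score means the point lies in a low-density region."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def find_outliers(points, k=2, factor=3.0):
    """Indices whose score exceeds factor * median score (assumed cutoff)."""
    scores = knn_distance_scores(points, k)
    cutoff = factor * median(scores)
    return [i for i, s in enumerate(scores) if s > cutoff]

# Illustrative data: six points in a tight cluster and one isolated point.
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (0.4, 0.5),
          (0.2, 0.1), (0.6, 0.4), (8.0, 8.0)]

print(find_outliers(points))  # [6], the isolated point
```

The median is used for the cutoff because, unlike the mean, it is barely affected by the outliers themselves.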

Feature selection criteria

Choose features that take unusually high or unusually low values when an anomalous event occurs.

Impact of dimension reduction on anomaly detection

Dimension reduction smooths the dataset and hence removes the outliers, which is not good for anomaly detection. However, reconstructing the original dimensions from the reduced representation reveals the outliers, because anomalous points are reconstructed poorly. This is a popular technique for anomaly detection, and the autoencoder model uses it.
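The reconstruction idea can be illustrated without a neural network. The sketch below uses a linear projection onto the first principal axis (a PCA-style stand-in, not an autoencoder) to compress 2-D points to one number and reconstruct them. The data, the function names, and the threshold of three times the largest training error are assumptions for the example.

```python
import math

def fit_principal_axis(points):
    """Return the mean and the unit vector of maximum variance for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form orientation of the major axis of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(angle), math.sin(angle))

def reconstruction_error(point, mean, axis):
    """Compress the point to one coordinate, reconstruct it, measure the gap."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    code = dx * axis[0] + dy * axis[1]            # reduced (1-D) representation
    rx = mean[0] + code * axis[0]                 # reconstructed x
    ry = mean[1] + code * axis[1]                 # reconstructed y
    return math.hypot(point[0] - rx, point[1] - ry)

# Illustrative "normal" data in which the second feature is roughly 2x the first.
normal = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
mean, axis = fit_principal_axis(normal)

# Assumed threshold: three times the worst error seen on normal data.
threshold = 3 * max(reconstruction_error(p, mean, axis) for p in normal)

for candidate in [(3.0, 6.0), (3.0, 2.0)]:
    err = reconstruction_error(candidate, mean, axis)
    print(candidate, "anomalous" if err > threshold else "normal")
```

The point (3.0, 2.0) has unremarkable values in each feature taken separately, but it breaks the relationship between them, so it cannot be reconstructed from the reduced representation.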

Anomaly detection techniques

Autoencoder

The autoencoder is a popular model for anomaly detection. Refer to this paper for details.

Multi-variate Gaussian distribution

Gaussian fitting starts with a strong assumption about the distribution of your data: that it follows the normal (Gaussian) distribution. Points that fall far from the mean, where the fitted density is low, are treated as anomalies.

The diagram below shows the z-scores for different areas of the normal distribution.
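A minimal sketch of Gaussian-based detection with two features follows. It fits a mean and a full covariance matrix to data assumed to be normal, then scores new points by squared Mahalanobis distance, the multivariate counterpart of a squared z-score. The data and function names are illustrative assumptions. The cutoff 9.21 is the 99% quantile of the chi-square distribution with 2 degrees of freedom, which applies only if the Gaussian assumption holds.

```python
def fit_gaussian_2d(points):
    """Estimate the mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

# Illustrative training data, assumed to contain only normal behaviour.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8),
         (5.0, 10.1), (2.5, 5.2), (3.5, 6.8)]
mu, sigma = fit_gaussian_2d(train)

CUTOFF = 9.21  # chi-square 99% quantile, 2 degrees of freedom

for candidate in [(3.0, 6.0), (3.0, 2.0)]:
    d2 = mahalanobis_sq(candidate, mu, sigma)
    print(candidate, "anomalous" if d2 > CUTOFF else "normal")
```

Using the full covariance rather than per-feature z-scores lets the detector catch points that are ordinary in each feature but unusual in combination.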

Real time anomaly detection

In communication networks, detecting highly correlated traffic is of interest for identifying anomalous behaviour such as DDoS attacks and zero-day attacks. Timely detection of anomalous events requires real-time processing. Refer to this paper for other examples of real-time processing.

An autoencoder can be used for this purpose. Refer here for an example.

Welford's method is another option: it is a single-pass method for computing the running variance or the running standard deviation, so each new data point can be compared against the statistics of the points seen so far. Refer here for these models.

In the picture below, red points indicate anomalies relative to previously seen data points.
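Welford's method can be sketched as a small streaming detector. The update rule is the standard Welford recurrence. The class and function names, the warm-up length, the z-score threshold of 3, and the decision to keep flagged points out of the running statistics are assumptions made for this example.

```python
import math

class RunningStats:
    """Single-pass running mean and variance (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        """Sample standard deviation of the values seen so far."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

def detect_stream(values, z_threshold=3.0, warmup=5):
    """Flag indices whose value deviates strongly from previously seen data."""
    stats = RunningStats()
    flagged = []
    for i, x in enumerate(values):
        if stats.n >= warmup:
            s = stats.std()
            if s > 0 and abs(x - stats.mean) / s > z_threshold:
                flagged.append(i)
                continue  # assumed choice: anomalies do not update the baseline
        stats.update(x)
    return flagged

# Illustrative stream with one spike at index 7.
stream = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9, 10.3, 25.0, 10.0, 9.7]
print(detect_stream(stream))  # [7]
```

Because only three numbers are stored per metric, the detector uses constant memory regardless of how long the stream runs.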

Performance evaluation criteria

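Due to class imbalance, accuracy is not a good evaluation metric: a model that labels every point as normal still scores very high accuracy when anomalies are rare. A confusion matrix, and the precision and recall derived from it, is more useful here. Refer here for details.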
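The effect of class imbalance on accuracy can be seen in a small worked example. The label counts and the two hypothetical detectors below are invented for illustration; 1 marks an anomaly and 0 marks a normal point.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = anomaly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1 computed from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}

# 1000 samples, of which only 10 are anomalies.
y_true = [1] * 10 + [0] * 990

# Hypothetical detector A: predicts "normal" for everything.
all_normal = [0] * 1000
# Hypothetical detector B: catches 8 of 10 anomalies, with 20 false alarms.
detector_b = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970

print(evaluate(y_true, all_normal))   # accuracy 0.99, recall 0.0
print(evaluate(y_true, detector_b))   # accuracy 0.978, recall 0.8
```

Detector A has the higher accuracy yet finds no anomalies at all, which is why recall and precision are the more informative numbers here.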

Applications

- Anomaly-based network intrusion detection systems
- Credit card fraud detection
- Malware detection
- Anomaly detection in CI/CD pipelines. Refer here for the article and here for the paper.
- Monitoring software logs to identify anomalous behaviour. Refer here for a Splunk example.

Refer to the Colab example for a zero-day attack demo.

More time-series anomaly detection examples using autoencoder models are here (LSTM-based anomaly detection for medical data) and here.

Noise filtering

Without first removing the noise, anomaly detection techniques are likely to produce a large number of false positives. This paper discusses an approach for removing noise before applying an anomaly detection technique.
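As a rough illustration of why filtering first helps, the sketch below applies a rolling median before a simple threshold check. This is not the approach from the paper referenced above; the median filter is a swapped-in, commonly used smoother. The baseline of 100, the tolerance of 20, and the window of 3 are assumed values. Note that a median filter also suppresses genuine single-sample anomalies, so it only suits cases where real anomalies last longer than the window.

```python
from statistics import median

def median_filter(values, window=3):
    """Centered rolling median; the window is truncated at the edges."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

def flag_deviations(values, baseline=100.0, tolerance=20.0):
    """Indices whose value differs from the assumed baseline by more than tolerance."""
    return [i for i, v in enumerate(values) if abs(v - baseline) > tolerance]

# Illustrative signal: one isolated noise spike (index 3) and a
# sustained drop (indices 7 to 11) that represents the real anomaly.
values = [100, 101, 99, 130, 100, 98, 102, 40, 41, 39, 42, 40, 101, 99, 100]

print(flag_deviations(values))                  # [3, 7, 8, 9, 10, 11]
print(flag_deviations(median_filter(values)))   # [7, 8, 9, 10, 11]
```

After filtering, the isolated spike no longer triggers a false positive, while the sustained drop is still detected.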

References

https://rhebo.com/en/company/news/post/incident-of-the-month-zero-day-exploit-detection/

https://avinetworks.com/glossary/anomaly-detection/

https://medium.com/datadriveninvestor/how-machine-learning-can-enable-anomaly-detection-eed9286c5306

https://arxiv.org/pdf/1906.04574.pdf

https://www.sciencedirect.com/topics/computer-science/outlier-detection

https://coursera.org/share/f4397bd495fef695fbc9e52dbc0c4a38

https://coursera.org/share/150df4339306d511d7bf1ca06e3b051f

https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-confusion-matrix-in-machine-learning

https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/handling-class-imbalance-in-machine-learning

https://coursera.org/share/d204c5137717231532f9bf5b3d90b52d

https://www.elementai.com/news/2019/modern-recipes-for-anomaly-detection

https://stats.stackexchange.com/questions/152644/what-algorithm-should-i-use-to-detect-anomalies-on-time-series

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173

https://images.app.goo.gl/QeehHFe7V4CYCBpx7

https://images.app.goo.gl/CsxTHpm6ETVbAQER6

https://towardsdatascience.com/detecting-real-time-and-unsupervised-anomalies-in-streaming-data-a-starting-point-760a4bacbdf8

https://arxiv.org/pdf/1905.07107.pdf

https://docs.microsoft.com/en-us/azure/cognitive-services/anomaly-detector/concepts/anomaly-detection-best-practices

https://colab.research.google.com/drive/1_J2MrBSvsJfOcVmYAN2-WSp36BtsFZCa?authuser=1#scrollTo=saamYyUsHdw0

https://www.atlantis-press.com/journals/jrnal/125935236/view

https://keras.io/examples/timeseries/timeseries_anomaly_detection/

https://ieeexplore.ieee.org/document/9039599

https://images.app.goo.gl/DbVU265QktyWJZ7x6

https://itfeature.com/statistics/the-z-score-introduction-formula-real-life-example

https://youtu.be/XzEXB12N1xs

https://www.metricly.com/3-types-anomaly-detection-monitoring-tools/

https://arxiv.org/abs/1909.12682

https://docs.splunk.com/Documentation/SplunkCloud/8.1.2101/Search/Detectinganomalies