A zero-day vulnerability is a weakness in a computer system that an attacker can exploit and that the affected parties do not yet know about. To understand the impact, consider what would happen if your organisation were the victim of a zero-day exploit.
Detecting abnormal (anomalous) network behaviour can help an organisation defend against attacks such as zero-day attacks. This document outlines the main approaches.
Anomaly detection, also called outlier detection, is the identification of unexpected events, observations, or items that differ significantly from the norm.
Anomalous data may be easy to identify because it breaks certain rules. If a sensor should never read 300 degrees Fahrenheit and the data shows the sensor reading 300 degrees Fahrenheit, there's your anomaly: a clear threshold has been broken.
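The rule-based case can be sketched in a few lines of Python. This is a minimal illustration, not a production check: the constant `MAX_VALID_FAHRENHEIT`, the function name and the sample readings are all assumptions made up for the example.

```python
# Hypothetical limit, taken from the 300 degrees Fahrenheit example in the text.
MAX_VALID_FAHRENHEIT = 300.0


def find_threshold_anomalies(readings, limit=MAX_VALID_FAHRENHEIT):
    """Return (index, value) pairs for readings that reach or exceed the limit."""
    return [(i, value) for i, value in enumerate(readings) if value >= limit]


# Made-up sensor readings; only the third one breaks the rule.
readings = [72.5, 73.1, 300.0, 71.9, 74.2]
anomalies = find_threshold_anomalies(readings)
print(anomalies)
```

A fixed rule like this only works when the valid range is known in advance, which is rarely true for network traffic.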
Anomalies in data occur only very rarely
The features of data anomalies are significantly different from those of normal instances
Network anomalies - intrusion detection forecasting
Application performance anomalies
Web application security anomalies - XSS attacks, DDoS attacks, unexpected login attempts, etc.
The time series graph below shows an unexpected drop in network usage (anomalous behaviour).
Defining normal behaviour
Handling imbalanced distribution of normal and abnormal data
Sparse occurrence of abnormal events
Appropriate feature extraction
Handling noise - note that an anomaly is different from noise
Future anomalies may look nothing like any of the anomalous examples in the training set.
For supervised training, there should be data points which are labelled as anomalous
For unsupervised learning, distances or cluster densities are used to estimate what is normal and what is an outlier. For example, a cluster of outliers will have a very low density compared to clusters of normal points.
Choose features that take unusually high or unusually low values when an anomalous event occurs.
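As a rough sketch of the distance-based idea, each point can be scored by its mean distance to its k nearest neighbours: points in dense regions get small scores, isolated points get large ones. The function name, the choice of k and the sample points are assumptions for illustration only.

```python
import math


def knn_outlier_scores(points, k=3):
    """Score each point by its mean distance to its k nearest neighbours.

    A large score means the point sits in a low-density region.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Five made-up points in a tight cluster plus one isolated point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (0.8, 0.9), (8.0, 8.0)]
scores = knn_outlier_scores(points, k=3)

# The point with the highest score is the strongest outlier candidate.
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(outlier_index, round(scores[outlier_index], 2))
```

This brute-force version compares every pair of points, so it only suits small datasets. Larger ones would need a spatial index or a library implementation.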
Dimensionality reduction smooths the dataset and therefore removes the outliers, which on its own is not helpful for anomaly detection. However, reconstructing the data back to its original dimension reveals the outliers, because anomalous points are reconstructed poorly and show a large reconstruction error. This is a popular technique for anomaly detection.
An autoencoder is a popular model that uses this technique. Refer to this paper for details.
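The reconstruction-error idea can be shown without a neural network. The sketch below swaps the autoencoder for a linear projection onto the first principal axis (PCA) of 2-D data: "encode" to one dimension, "decode" back to two, and measure how far each point moved. It illustrates the principle only and is not the model from the referenced paper; the function name and data are invented.

```python
import math


def reconstruction_errors(points):
    """Project 2-D points onto their first principal axis and back.

    Returns the distance between each point and its reconstruction.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    errors = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        code = dx * ux + dy * uy        # "encode": 2-D -> 1-D
        rx, ry = code * ux, code * uy   # "decode": 1-D -> 2-D
        errors.append(math.hypot(dx - rx, dy - ry))
    return errors


# Made-up points close to the line y = x, plus one point far from it (the last).
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1), (6, 5.8), (3, 6.0)]
errors = reconstruction_errors(points)
worst = max(range(len(points)), key=errors.__getitem__)
print(worst, [round(e, 2) for e in errors])
```

An autoencoder follows the same encode, decode and compare pattern, but learns a non-linear mapping, so it can model normal behaviour that does not lie along a straight line.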
Gaussian fitting starts with a strong assumption on the distribution of your data, that it follows the normal, or Gaussian, distribution.
The diagram below shows the z-scores for different areas of the normal distribution.
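Under the Gaussian assumption, a simple detector flags values whose z-score (distance from the mean in units of standard deviation) exceeds a cut-off. The sketch below assumes a cut-off of 3, a common but arbitrary choice, and uses made-up measurements.

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Return (index, value) pairs whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, so nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / std > threshold]


# Twenty made-up measurements near 100 and one extreme value at the end.
values = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99,
          100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 250]
flagged = zscore_anomalies(values)
print(flagged)
```

The outlier itself inflates the mean and standard deviation it is compared against. With the population standard deviation, no z-score in a sample of n values can exceed sqrt(n - 1), so a cut-off of 3 can never fire on fewer than 11 points.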
In communication networks, detecting highly correlated traffic is of interest because it can reveal anomalous behaviour such as a DDoS attack or a zero-day attack. Timely detection of such events requires real-time processing. Refer to this paper for other examples of real-time processing.
An autoencoder can be used for this purpose. Refer here for an example.
Welford's method is another usable single-pass method for computing the running variance or the running standard deviation. Refer here for details of these models.
In the picture below, red points indicate anomalies relative to previously seen data points.
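A minimal sketch of Welford's update applied to a stream follows. Each new value is compared against the mean and standard deviation of the values seen before it, then folded into the running statistics. The class and function names, the warm-up length and the threshold of 3 are assumptions for illustration.

```python
import math


class RunningStats:
    """Single-pass mean and variance using Welford's update."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; defined as 0.0 until two values have been seen.
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0

    @property
    def std(self):
        return math.sqrt(self.variance)


def detect_stream_anomalies(stream, threshold=3.0, warmup=5):
    """Flag values that lie more than `threshold` running standard
    deviations from the running mean of the values seen so far."""
    stats = RunningStats()
    flagged = []
    for i, x in enumerate(stream):
        if stats.count >= warmup and stats.std > 0:
            if abs(x - stats.mean) / stats.std > threshold:
                flagged.append((i, x))
        stats.update(x)  # assumption: flagged values still update the statistics
    return flagged


# Made-up stream with one spike.
stream = [10, 11, 9, 10, 12, 10, 11, 50, 10, 9]
flagged = detect_stream_anomalies(stream)
print(flagged)
```

Because flagged values are still folded into the statistics, a large spike widens the running standard deviation and can mask later anomalies. Skipping the update for flagged points, or using a sliding window, are possible alternatives.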
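Because of class imbalance, accuracy is not a good evaluation metric: a model that labels everything as normal still scores highly. A confusion matrix, and the precision and recall derived from it, is more useful. Refer here for details.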
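The effect is easy to see on a small made-up example with 5 anomalies in 100 samples. The sketch below compares a detector that always predicts "normal" with one that catches 4 of the 5 anomalies at the cost of 3 false alarms. The labels, function names and numbers are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as 'anomaly'."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# 5 anomalies followed by 95 normal samples.
y_true = [1] * 5 + [0] * 95
# Detector A: always predicts "normal".
y_naive = [0] * 100
# Detector B: catches 4 of 5 anomalies and raises 3 false alarms.
y_model = [1, 1, 1, 1, 0] + [1] * 3 + [0] * 92

naive_metrics = evaluate(y_true, y_naive)
model_metrics = evaluate(y_true, y_model)
print(naive_metrics)
print(model_metrics)
```

Both detectors reach an accuracy of about 95%, but the first has zero recall, so only precision and recall separate a useless detector from a useful one.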
Anomaly based network intrusion detection system
Credit card fraud detection
Malware detection
Anomaly detection in CI/CD pipelines. Refer here for the article and here for the paper.
Software log monitoring for identifying anomalous behaviour. Refer here for a Splunk example.
Refer to the Colab example for a zero-day attack demo.
More time-series anomaly detection examples using autoencoder models are here (LSTM-based anomaly detection for medical data) and here.
Unless noise is removed first, anomaly detection techniques are likely to produce a large number of false positives. This paper describes an approach for removing noise before applying an anomaly detection technique.
https://rhebo.com/en/company/news/post/incident-of-the-month-zero-day-exploit-detection/
https://avinetworks.com/glossary/anomaly-detection/
https://medium.com/datadriveninvestor/how-machine-learning-can-enable-anomaly-detection-eed9286c5306
https://arxiv.org/pdf/1906.04574.pdf
https://www.sciencedirect.com/topics/computer-science/outlier-detection
https://coursera.org/share/f4397bd495fef695fbc9e52dbc0c4a38
https://coursera.org/share/150df4339306d511d7bf1ca06e3b051f
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/role-of-confusion-matrix-in-machine-learning
https://sites.google.com/site/jbsakabffoi12449ujkn/home/machine-intelligence/handling-class-imbalance-in-machine-learning
https://coursera.org/share/d204c5137717231532f9bf5b3d90b52d
https://www.elementai.com/news/2019/modern-recipes-for-anomaly-detection
https://stats.stackexchange.com/questions/152644/what-algorithm-should-i-use-to-detect-anomalies-on-time-series
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173
https://images.app.goo.gl/QeehHFe7V4CYCBpx7
https://images.app.goo.gl/CsxTHpm6ETVbAQER6
https://towardsdatascience.com/detecting-real-time-and-unsupervised-anomalies-in-streaming-data-a-starting-point-760a4bacbdf8
https://arxiv.org/pdf/1905.07107.pdf
https://docs.microsoft.com/en-us/azure/cognitive-services/anomaly-detector/concepts/anomaly-detection-best-practices
https://colab.research.google.com/drive/1_J2MrBSvsJfOcVmYAN2-WSp36BtsFZCa?authuser=1#scrollTo=saamYyUsHdw0
https://www.atlantis-press.com/journals/jrnal/125935236/view
https://keras.io/examples/timeseries/timeseries_anomaly_detection/
https://ieeexplore.ieee.org/document/9039599
https://images.app.goo.gl/DbVU265QktyWJZ7x6
https://itfeature.com/statistics/the-z-score-introduction-formula-real-life-example
https://youtu.be/XzEXB12N1xs
https://www.metricly.com/3-types-anomaly-detection-monitoring-tools/
https://arxiv.org/abs/1909.12682
https://docs.splunk.com/Documentation/SplunkCloud/8.1.2101/Search/Detectinganomalies