Martingale Model

The change detection methodology proposed in the Martingale Model follows from Doob's Maximal Inequality for martingales. The martingale is built from the p-values of the data seen so far and a parameter epsilon, and each new value is checked against the last power martingale value to decide whether the data indicates a change. The method is fast and efficient for running tests on data streams.
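For reference, the form of Doob's Maximal Inequality usually applied to this kind of test can be written as below. The symbols are assumptions for illustration and are not taken from the original: M_k is a nonnegative martingale with expected value 1, and \lambda is a detection threshold.

```latex
P\left( \max_{k \le n} M_k \ge \lambda \right) \le \frac{E[M_n]}{\lambda} = \frac{1}{\lambda}
```

Read this way, raising an alarm when the martingale reaches \lambda keeps the probability of a false alarm at or below 1/\lambda while no change has occurred.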

Finding the Strangeness

Initially the strangeness is calculated through one of several methods: SVM, KMeans clustering, or KNN. Each of these three methods measures, in its own way, how different the current point is from the norm of the data. Below is an example using KMeans clustering (red graph).

Each data point in the strangeness algorithm is checked against previous points; with clustering, this is done by its distance from the cluster center. I will focus on KMeans here, as SVM and KNN both took longer to calculate results, especially as the observed data set grew past 1000 points. We can see from the graphs above that the real data set and the strangeness measure look similar, but the scale of the values has changed, because the strangeness measure is the Euclidean distance to the KMeans cluster center.
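As a rough illustration of the clustering-based strangeness, here is a minimal sketch in plain Python. It is not the code used for the graphs: the function names, the tiny built-in k-means (seeded with the first k points), and the choice to fit on a whole batch instead of a growing stream are all assumptions made to keep the example self-contained.

```python
import math

def nearest_center_distance(point, centers):
    """Euclidean distance from `point` to the closest cluster center."""
    return min(math.dist(point, c) for c in centers)

def kmeans(points, k, iterations=20):
    """Tiny Lloyd's-algorithm k-means; the first k points seed the centers."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assign every point to its closest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers

def strangeness_scores(points, k=1):
    """Strangeness of each point = distance to its nearest k-means center."""
    centers = kmeans(points, k)
    return [nearest_center_distance(p, centers) for p in points]

# Hypothetical 1-D data: the point at 10.0 sits far from the rest,
# so it receives the largest strangeness score.
example_scores = strangeness_scores([(0.0,), (1.0,), (2.0,), (10.0,)])
```

In a real stream the centers would presumably be refit (or updated) as new points arrive, so that each point is scored against the data seen before it.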

Calculating P-Values

From here we calculate the P-values with the formula shown below. Put simply, it is the number of strangeness values greater than the current point's, plus the number equal to it, divided by the number of data points processed so far.
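The simplified description above can be sketched in a few lines of Python. This follows the wording of the text only (greater-than count plus equal count, over the points seen so far); the function name is hypothetical, and the sketch recomputes the counts at every step for clarity instead of speed.

```python
def p_values(strangeness):
    """Return one p-value per point, using only the points seen so far."""
    result = []
    for n, current in enumerate(strangeness, start=1):
        seen = strangeness[:n]  # includes the current point itself
        greater = sum(1 for s in seen if s > current)
        equal = sum(1 for s in seen if s == current)
        result.append((greater + equal) / n)
    return result

# A point stranger than everything before it gets a small p-value:
# the last value here is 1/4 because only the point itself counts.
example_p = p_values([1.0, 2.0, 0.5, 3.0])
```

Because the current point always counts as equal to itself, the p-value is never zero, and the first point always has a p-value of 1.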

For most data sets I tested, the P-values would start off very high and quickly (at around 200 data points) shrink to a stable confidence level. Shown below, for Data set 2, are the P-values and the real data set, which has anomalies at the very end.

Creating the Martingale

The Power Martingale is constructed from the formula below, with epsilon's value between 0.8 and 1. This range was stated by the author, as the martingale would fail to detect change outside it.
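A minimal sketch of this step, assuming the formula is the standard power martingale, in which each p-value p contributes a factor of epsilon * p^(epsilon - 1) to a running product. The function names, the default epsilon of 0.9 (picked from inside the range given above), and the threshold of 20 are illustrative assumptions and not values from the original.

```python
def power_martingale(p_values, epsilon=0.9):
    """Running product of epsilon * p**(epsilon - 1) over the p-values."""
    values = []
    m = 1.0
    for p in p_values:
        # p must be > 0; small p-values (strange points) give factors above 1.
        m *= epsilon * p ** (epsilon - 1.0)
        values.append(m)
    return values

def first_change_index(martingale, threshold=20.0):
    """Index of the first martingale value above the threshold, or None."""
    for i, m in enumerate(martingale):
        if m > threshold:
            return i
    return None

# A run of very small p-values makes the martingale grow step by step,
# while p-values of 1.0 shrink it by a factor of epsilon each time.
growing = power_martingale([0.01] * 10)
shrinking = power_martingale([1.0, 1.0])
```

With these assumed settings, a threshold of 20 corresponds to a false alarm bound of 1/20 under Doob's Maximal Inequality; a lower threshold detects changes sooner at the cost of more false alarms.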
