A. Case Study in PSML
To compare with MADRID, we used the same training and testing data, along with MADRID’s median search length of 4,320 (3 days) as the window length for Telemanom. Experimental results revealed that Telemanom could not complete model training within an acceptable timeframe: with a training dataset of 0.5 million points, Telemanom took 49.6 hours to complete the first epoch. The original authors’ setup calls for 35 epochs in total, leading us to anticipate that Telemanom would require roughly 72.6 days to complete the modeling process. Because training could not be completed in a reasonable timeframe, we were unable to estimate the model’s testing duration. Moreover, with a testing dataset of 1 million points, even if Telemanom were eventually to complete its modeling, there is no guarantee that it could finish testing within an acceptable timeframe and without memory errors. It is therefore evident that employing Telemanom on the PSML dataset is not feasible.
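The projection above is a simple linear extrapolation. A minimal sketch, assuming every epoch costs about as much as the measured first epoch (variable names are our own, not from the Telemanom codebase):

```python
# Linear extrapolation of Telemanom's training cost on PSML,
# assuming each of the 35 epochs takes about as long as the
# measured first epoch (49.6 hours)
hours_per_epoch = 49.6
epochs = 35
total_days = hours_per_epoch * epochs / 24
print(f"projected training time: {total_days:.1f} days")
```

Under this uniform-cost assumption the projection comes out to roughly 72 days, consistent with the figure reported above.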
In stark contrast, MADRID required only 10.6 hours to conduct 14 searches of varying lengths over the same 1 million data points, reporting four meaningful anomalies of different lengths. This attests to MADRID’s efficiency and accuracy.
B. Case Study in EnerNOC-496
Figure X: Top) We searched a year-long dataset with MADRID[80,360,1] (equivalent to 400 to 1,800 minutes), finding five distinct anomalies (center). Some of the discovered anomalies are very subtle (bottom).
We consider EnerNOC-496, a year-long energy usage dataset for a commercial site [H]. The dataset is 105,409 data points long (sampled once every five minutes for the full year of 2012), of which we used the first 3,000 data points (10.4 days) as training data. As the figure shows, MADRID[80,360,1] correctly finds the five distinct anomalies. Two of the anomalies are quite obvious: a dropout on 1/25 due to a sensor fault, and the business’s single annual closure, which occurs on Christmas Day. However, the remaining anomalies are very subtle, including a transition to opening an hour later on 4/23 and a transient voltage spike on 11/13.
This example illustrates the utility of multi-length anomaly search: of the 280 lengths that MADRID searched, only four discovered the transient voltage spike.
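The multi-length search can be caricatured as follows. This is a brute-force sketch, not MADRID itself: it omits MADRID’s pruning and uses a plain per-length discord score (the largest z-normalized nearest-neighbor distance from any test subsequence to the training prefix); the function names are our own.

```python
import numpy as np

def znorm(x):
    # Normalize away offset (mean) and amplitude (std); only shape remains
    s = x.std()
    return (x - x.mean()) / s if s > 0 else np.zeros_like(x)

def discord_score(ts, split, m):
    """Largest z-normalized nearest-neighbor distance from any test
    subsequence (starting at or after `split`) to the training prefix."""
    train = [znorm(ts[i:i + m]) for i in range(split - m + 1)]
    score = 0.0
    for j in range(split, len(ts) - m + 1):
        q = znorm(ts[j:j + m])
        nn = min(np.linalg.norm(q - t) for t in train)
        score = max(score, nn)
    return score

def multi_length_search(ts, split, m_min, m_max, step=1):
    # One discord score per candidate length; dividing by sqrt(m)
    # makes scores at different lengths comparable
    # e.g. multi_length_search(series, 3000, 80, 360) would mirror
    # MADRID[80,360,1] on EnerNOC-496 (at brute-force cost)
    return {m: discord_score(ts, split, m) / np.sqrt(m)
            for m in range(m_min, m_max + 1, step)}
```

An anomaly that is obvious at one length but invisible at another (like the transient voltage spike) shows up as a score that stands out only for a narrow band of `m` values.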
Could we duplicate any of this success with state-of-the-art deep learning TSAD algorithms? We might expect this to be a challenging problem. We have only the first ten days of January as training data. That is enough to cover some of the dataset’s diversity, for example weekdays and weekend days. However, there is clearly some concept drift in this dataset; note the “bump” in maximum demand that occurs in the summer months. Most TSAD algorithms (including MADRID) normalize away the amplitude and offset of subsequences, but there are almost certainly other subtle drifts in shape that might confuse algorithms trained on a fixed training set.
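The effect of amplitude/offset normalization can be seen in a toy example (the data values here are illustrative, not from EnerNOC-496): a summer subsequence that is a scaled, shifted copy of a winter one becomes identical after z-normalization, so this kind of drift alone cannot confuse such algorithms.

```python
import numpy as np

def znorm(x):
    # Subtract the mean (offset) and divide by the standard
    # deviation (amplitude), leaving only the shape
    return (x - x.mean()) / x.std()

winter = np.array([10.0, 12.0, 15.0, 12.0, 10.0])
summer = 3.0 * winter + 40.0  # same shape, higher demand and baseline
print(np.allclose(znorm(winter), znorm(summer)))  # True
```

Drifts in the shape itself, by contrast, survive z-normalization, which is exactly what might mislead a model trained only on January.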
We again compare to Telemanom. We tested three values for m: 132 (a length that discovered the transient voltage spike) and two lengths that “bracket” that value, 100 and 164. However, we report only the best of these three lengths below. Moreover, Telemanom is stochastic, so we ran each length three times and again report only the best run. In all nine runs, Telemanom detected only the two obvious anomalies: the sensor failure on 1/25 and the closure on Christmas Day. In its best run, it used a length of 132 and reported four anomalies, of which two were true positives and two were false positives (the remaining runs reported the same two true positives but more false positives).
[H] Kayode S. Adewole, Vicenç Torra: DFTMicroagg: a dual-level anonymization algorithm for smart grid data. Int. J. Inf. Sec. 21(6): 1299-1321 (2022)
C. Derivation of Normalization Factor
Readers can find the derivation in the following resources:
100 Time Series Data Mining Questions (with Answers!) https://www.cs.ucr.edu/~eamonn/100_Time_Series_Data_Mining_Questions__with_Answers.pdf
Michele Linardi, Yan Zhu, Themis Palpanas, Eamonn J. Keogh: Matrix Profile X: VALMOD - Scalable Discovery of Variable-Length Motifs in Data Series. SIGMOD Conference 2018: 1053-1066
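As a pointer to what those resources derive, here is a sketch of the key identity, under the assumption that the factor in question is the √(1/m) length normalization used by VALMOD:

```latex
% For z-normalized subsequences A, B of length m with Pearson
% correlation \rho(A, B), the Euclidean distance satisfies
d^2(A, B) = 2m\bigl(1 - \rho(A, B)\bigr),
% so the length-normalized distance
\frac{d(A, B)}{\sqrt{m}} = \sqrt{2\bigl(1 - \rho(A, B)\bigr)}
% depends only on the correlation, making distances computed at
% different subsequence lengths directly comparable.
```

The cited resources give the full derivation and discuss why this makes variable-length scores commensurable.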