C22MP
C22MP: The Combination of catch22 and the Matrix Profile is a Fast Efficient and Interpretable Anomaly Detector
The Matrix Profile is a data structure that annotates a time series by recording each subsequence’s Euclidean distance to its nearest neighbor. In recent years, the community has shown that by using the Matrix Profile it is possible to discover many interesting properties of the original time series, including repeated behaviors, anomalies, evolving patterns, regimes, etc. However, the Matrix Profile is limited to representing the relationship between the subsequences’ shapes. It is known that for some domains, useful information is conserved not in the subsequences’ shapes, but in the subsequences’ features. In recent years, an optimal set of features for time series called catch22 has revolutionized feature-based mining of time series. Combining these two ideas seems to offer many possibilities for novel data mining applications. However, there are two difficulties in attempting this. First, a direct application of the Matrix Profile with the catch22 features would be prohibitively slow. Second, and less obviously, as we will demonstrate, in almost all domains, using all twenty-two of the catch22 features produces poor results; we must somehow select the subset appropriate for the domain. In this work, we introduce novel algorithms to solve both problems, and demonstrate that for most domains, the proposed C22MP is a state-of-the-art anomaly detector.
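To make the idea in the abstract concrete, here is a minimal brute-force sketch of a feature-based Matrix Profile in plain Python. It is not the C22MP algorithm from the paper and makes several assumptions: `toy_features` computes three simple stand-in features rather than the catch22 set, each feature column is z-normalized before distances are taken, the exclusion zone is a full window length, and the `selected` parameter is a hypothetical way of choosing a feature subset. It runs in quadratic time, which is exactly the cost the paper sets out to avoid.

```python
import math
import statistics


def toy_features(window):
    """Three simple stand-in features (assumption: NOT the catch22 set)."""
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)
    # Number of sign changes in the first difference: a crude "wiggliness" count.
    diffs = [b - a for a, b in zip(window, window[1:])]
    turns = sum(1 for a, b in zip(diffs, diffs[1:]) if a * b < 0)
    return [mean, std, float(turns)]


def feature_matrix_profile(series, m, selected=(0, 1, 2)):
    """Brute-force O(n^2) profile: for each length-m subsequence, the Euclidean
    distance in feature space to its nearest non-overlapping neighbor."""
    n = len(series) - m + 1
    feats = [toy_features(series[i:i + m]) for i in range(n)]
    # Keep only the selected features; z-normalize each column so that no
    # single feature dominates the distance (an assumption of this sketch).
    cols = []
    for k in selected:
        col = [f[k] for f in feats]
        mu = statistics.fmean(col)
        sd = statistics.pstdev(col) or 1.0  # guard against constant columns
        cols.append([(v - mu) / sd for v in col])
    vecs = list(zip(*cols))
    profile = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if abs(i - j) < m:  # exclusion zone: skip overlapping (trivial) matches
                continue
            best = min(best, math.dist(vecs[i], vecs[j]))
        profile.append(best)
    return profile


# Usage: a sine wave with an injected high-frequency burst at indices 150..169.
series = [math.sin(2 * math.pi * i / 20) for i in range(300)]
for i in range(150, 170):
    series[i] += 0.8 * (-1) ** i
profile = feature_matrix_profile(series, m=20)
peak = max(range(len(profile)), key=profile.__getitem__)
print("highest profile value at subsequence start index", peak)
```

In this toy example the highest profile value is expected at a window that overlaps the injected burst, because every clean window has a near-identical neighbor exactly one period away. Passing a shorter tuple as `selected` (for example `(2,)`) shows how the choice of feature subset changes the profile.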
(Center) A trace from the right paw of a healthy mouse, with its companion C22MP. (Top and bottom) C22MP can accurately detect all types of anomalies.
Dear Reviewer: In Table 4, we give the reference for NORMA as [6]; it should have been [7]. We will fix this.
This website includes all the code, data, experiments, and figures generated for the paper "C22MP: The Combination of catch22 and the Matrix Profile is a Fast Efficient and Interpretable Anomaly Detector".
Datasets
Toy data (Figure 4, 5, 6, 7)
50words (Figure 1, 8)
Melbourne (Figure 10, 11)
Gait (Figure 10, 11, A3)
WhiteFly (Figure 2, 3)
SWAT (Figure 12, A1)
20 Papers dataset (Figure 13)
Freezer (Figure 14, 15)
Mouse Motion (Figure 16)
Hexagon ML/UCR (Table 4, Figure 17, 18)
CCT dataset (Table 5, Figure A5)
Code
C22MP Demo
ORR and Brute Force algorithm (Table 1, 2, 3 and Figure 4, 5, 6, 7)
Experiment on 50words dataset - feature-based vs shape-based clustering (Figure 1, 8)
Experiment on WhiteFly dataset - Telemanom vs DAMP vs C22MP (Figure 2, 3)
SWAT dataset result (Figure 12)
20 Papers dataset results (Figure 13)
Experiment on Hexagon ML/UCR dataset (Table 4)
Experiment on Freezer dataset - Telemanom vs DAMP vs C22MP (Figure 14, 15)
Case Study on Medical data (Table 5 and Figure A5)
Case Study on Mouse Motion Capture (Figure 16)
Case Study with Human in the Loop (Figure 17, 18)
Timing Experiment (Figure 19)
Feature Search algorithms (Table A1, A2)
Human in the Loop Feature Search Tool (Figure 10, 11, A2)