Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets
Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, and Eamonn Keogh
Introduction
This page was built in support for the paper Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets. The resources contained in this page include demonstration videos, source code for the proposed STAMP algorithm, source code necessary for reproducing the figures and experiments, additional results, and supporting materials.
Demo Video
Anytime STAMP
The following video shows how STAMP's anytime property can help us quickly identify the motifs we shown in Figure 8.
STAMPI
The following video shows STAMPI in action. The time series used in the video is the same time series used for plotting Figure 14 and Figure 15 in paper.
Time Series Set Similarity
The following video demonstrates the utility of STAMP in terms of discovering conserved patterns between two time series. The example we shown in the video is mentioned in the paper's Section IV.D.
Source Code
STAMP
MATLAB general similarity join: code
MATLAB self-similarity join: code
MATLAB general similarity join (parallelized): code
MATLAB self-similarity join (parallelized): code
Figure
Figure 1: code
Figure 2 and 3: code
Figure 5: code
Figure 6 and 7: code
Figure 8: code
Figure 9 and 10: code
Figure 11: code
Figure 12: code
Figure 13: code
Figure 14 and 15: code
Figure 16: code
Figure 18 and 19: code
Table
Table 5 and 6: code
Additional Result
Additional Scalability Test Result
The table below shows STAMP's runtime difference for different "types" of time series length (i.e., power of two, non-power of two but composition number, and prime number.) The runtime on time series with power of two length is at least 14% faster than other types of length given that all the length are close to 131,272.
To demonstrate the relative runtime difference graphically, we have converted above table into the following plot.
The code for this set of experiments can be download here.
Additional Time Series Difference (TSD) Result
To further demonstrate the functionality of TSD operator defined in Section IV.D, we applied the method on a ECG telemetry data in addition to the Harry Potter example we included in the paper. We took 2.67 minutes long ECG data of an 89 year old female patient from the mitdb/207 dataset. We downsampled the data by a factor of 15 and treated the first half of the time series as TA and the second half as TB. The patient information is available here. We ran BA join on this data. From the PBA curve, the high values indicate instances present in TB but not in TA, i.e. the TSD operation. From the annotation of this data, we found two visually dissimilar occurrences of Ventricular escape beat (E) and left bundle branch block beat (L).