Matrix Profile III: The Matrix Profile Allows Visualization of Salient Subsequences in Massive Time Series

Chin-Chia Michael Yeh, Helga Van Herle, and Eamonn Keogh

Introduction

This page was built in support of the paper Matrix Profile III: The Matrix Profile Allows Visualization of Salient Subsequences in Massive Time Series. The resources on this page include demonstration videos, source code for the proposed salient subsequence selection algorithm, the source code and data necessary for reproducing the experiments, and additional results.

In this paper, all matrix profiles are computed with STAMP; however, a faster method for computing the matrix profile exists (see STOMP).
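To make concrete what a matrix profile stores, the following is a minimal brute-force sketch in Python. It is neither STAMP nor STOMP, both of which compute the matrix profile far more efficiently; it simply compares every pair of z-normalized subsequences. The function names and the exclusion zone of m/2 are assumptions made for illustration and are not taken from the released code.

```python
import math


def znorm(seq):
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    if std == 0:
        return [0.0] * len(seq)
    return [(x - mean) / std for x in seq]


def brute_force_matrix_profile(ts, m):
    """Return (profile, index) for time series ts and subsequence length m.

    profile[i] is the z-normalized Euclidean distance from subsequence i
    to its nearest non-trivial neighbor; index[i] is that neighbor's
    0-based position. This is an O(n^2 * m) illustration only.
    """
    n_sub = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n_sub)]
    excl = max(1, m // 2)  # assumed exclusion zone to skip trivial matches
    profile = [math.inf] * n_sub
    index = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) < excl:
                continue  # too close to i: a trivial match
            d = math.dist(subs[i], subs[j])
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index
```

For a series that contains a repeated pattern, the profile value at each occurrence is close to zero and the index at that position points to the other occurrence.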

Demo Video

In addition to the subsequence selection algorithm presented in the paper, we also developed a tool that helps us navigate MDS plots. The following video showcases the utility of this tool. The code and data used in this video can be downloaded from [here].

Source Code, Data, and Additional Results

Section I

  • The code and data we used to generate Fig. 1 and Fig. 2 can be downloaded from [here].
  • In the paper, we made the following statement: "It is clearly not simply the random sampling of the ECG. Suppose we perform MDS on the famous Iris dataset [5]; then any large random sample will produce essentially the same MDS plot as the entire dataset." To support this statement, we made MDS plots (shown below) using different randomly selected subsets of the Iris dataset. All of the MDS plots for the subsets look similar to the MDS plot for the whole dataset. The code and data for generating the following MDS plots can be downloaded from [here].
https://sites.google.com/site/salientsubs/home/iris.png?attredirects=0
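The random-subset comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses classical (Torgerson) MDS with power iteration, which may differ from the MDS variant used to produce the plots, and the names classical_mds and mds_on_random_subset are hypothetical. The Iris data is not bundled here; any list of numeric feature tuples can be passed in.

```python
import math
import random


def classical_mds(points, k=2, iters=500, seed=0):
    """Embed points in k dimensions with classical MDS.

    Assumes Euclidean input, so the centered matrix B is positive
    semidefinite and plain power iteration finds its top eigenvectors.
    """
    n = len(points)
    # Squared Euclidean distance between every pair of points.
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
          for p in points]
    row = [sum(r) / n for r in d2]
    grand = sum(row) / n
    # Double centering: B = -1/2 * J * D2 * J.
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        v = [rng.uniform(-1, 1) for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left along any direction
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = norm  # converges to the dominant eigenvalue
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][dim] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


def mds_on_random_subset(points, fraction, seed=0):
    """Run classical MDS on a random sample of the given fraction."""
    rng = random.Random(seed)
    size = min(len(points), max(2, round(fraction * len(points))))
    return classical_mds(rng.sample(points, size))
```

Plotting the output of classical_mds on the full dataset next to the output of mds_on_random_subset for several seeds gives the kind of side-by-side comparison shown in the figure. Note that MDS embeddings are only defined up to rotation and reflection, so the plots should be compared by shape rather than by exact coordinates.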

Section II

  • The code we used to generate Fig. 3 can be downloaded from [here].
  • The code we used to generate Fig. 4 can be downloaded from [here].

Section III

  • The code we used to generate Fig. 5 and Fig. 6 can be downloaded from [here].
  • The time series shown in Fig. 7 can be downloaded from [here].

Section IV

  • The code we used to generate Fig. 8 can be downloaded from [here].
  • The code and data we used to generate Fig. 9 can be downloaded from [here].

Section V.A

  • An Excel spreadsheet containing the experimental results for all datasets can be downloaded from [here]. A guide for reading this Excel spreadsheet can be downloaded from [here].
  • The code we used to generate the datasets for both the whole sequence setting and the subsequence setting from the UCR archive can be downloaded from [here].
  • The datasets used for evaluating the subsequence selection algorithm in the whole sequence setting can be downloaded from [here]. Each dataset is stored in binary MATLAB file format (i.e., MAT-file). There are 6 variables in each .mat file:
    1. data: the data (stored in a matrix of dimension number of sequences x sequence length)
    2. dataName: name of the dataset
    3. lab: label for each sequence (-666 is the label for random walk, and other numbers are labels from UCR archive)
    4. matrixProfile: precomputed matrix profile (distance to the nearest sequence)
    5. profileIndex: precomputed matrix profile index (index of the nearest sequence)
    6. subLen: sequence length (m in the paper)
  • The datasets used for evaluating the subsequence selection algorithm in the subsequence setting can be downloaded from [here]. Each dataset is stored in binary MATLAB file format (i.e., MAT-file). There are 7 variables in each .mat file:
    1. data: the data (stored in a vector with dimension of n x 1)
    2. dataName: name of the dataset
    3. lab: label for each positive subsequence (class numbering is from UCR archive)
    4. labIdx: index for each positive subsequence
    5. matrixProfile: precomputed matrix profile (distance to the nearest subsequence)
    6. profileIndex: precomputed matrix profile index (position of the nearest subsequence)
    7. subLen: subsequence length (m in the paper)
  • The code that reproduces the experimental results listed in the Excel spreadsheet above can be downloaded from [here].
  • The code and data we used to generate Fig. 10 and Fig. 12 can be downloaded from [here].
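For readers working outside MATLAB, the MAT-file variables listed above can be read and sanity-checked with a short Python sketch. It assumes that SciPy is installed and that the files are in a MAT-file version that scipy.io.loadmat supports (it does not read the HDF5-based v7.3 format). The helper names missing_fields and load_dataset are hypothetical, and the file path is supplied by the caller.

```python
# Variable names as listed on this page for each setting.
EXPECTED_WHOLE = ("data", "dataName", "lab",
                  "matrixProfile", "profileIndex", "subLen")
EXPECTED_SUBSEQ = ("data", "dataName", "lab", "labIdx",
                   "matrixProfile", "profileIndex", "subLen")


def missing_fields(contents, setting):
    """Return the expected variable names absent from a loaded file.

    contents is any mapping from variable name to value; setting is
    either "whole" or "subsequence".
    """
    expected = EXPECTED_SUBSEQ if setting == "subsequence" else EXPECTED_WHOLE
    return [name for name in expected if name not in contents]


def load_dataset(path, setting):
    """Load one dataset file and keep only its MATLAB variables."""
    # Imported lazily so missing_fields can be used without SciPy.
    from scipy.io import loadmat

    contents = loadmat(path)
    missing = missing_fields(contents, setting)
    if missing:
        raise KeyError(f"{path} lacks expected variables: {missing}")
    # loadmat adds bookkeeping keys such as __header__; drop them.
    return {k: v for k, v in contents.items() if not k.startswith("__")}
```

Since the files were written from MATLAB, index variables such as profileIndex and labIdx are presumably 1-based; this is an assumption worth checking before using them to index Python arrays.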

Section V.B

  • The code and data we used to generate Fig. 13 can be downloaded from [here].

Section V.C

  • The code and data we used to generate Fig. 14 and Fig. 15 can be downloaded from [here].

Section V.D

  • The code and data we used to generate Fig. 16 can be downloaded from [here].

Section V.E

  • The code and data we used to generate Fig. 17 can be downloaded from [here].

Section V.F

  • The code and data we used to generate Fig. 18 can be downloaded from [here].