Finding and Visualizing Time Series Motifs of All Lengths Using the Matrix Profile

Scalable KInetoscopic Matrix Profile (SKIMP) is a family of algorithms that compute the Pan Matrix Profile (PMP), a new data structure that contains the nearest-neighbor information for all subsequences of all lengths. This data structure enables the first truly parameter-free motif discovery algorithm in the literature.
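To make the data structure concrete, here is a minimal brute-force sketch in Python (NumPy). It is not SKIMP itself: it simply computes one matrix profile per subsequence length and stacks the results as rows. The function names, the half-window exclusion zone, and the division by 2*sqrt(m) (which maps z-normalized Euclidean distances into [0, 1] so that rows of different lengths are comparable) are assumptions made for illustration.

```python
import numpy as np


def znorm_subsequences(ts: np.ndarray, m: int) -> np.ndarray:
    """Return all length-m subsequences of ts as rows, each z-normalized."""
    windows = np.lib.stride_tricks.sliding_window_view(ts, m)
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)
    return (windows - mu) / sigma  # assumes no constant subsequence (sigma > 0)


def matrix_profile(ts: np.ndarray, m: int) -> np.ndarray:
    """Brute-force matrix profile: distance from each subsequence to its
    nearest non-trivial neighbor under z-normalized Euclidean distance."""
    z = znorm_subsequences(ts, m)
    # Pairwise Euclidean distances between all z-normalized subsequences.
    sq = np.sum(z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * z @ z.T
    dist = np.sqrt(np.maximum(d2, 0.0))
    # Assumed exclusion zone (half a window) suppresses trivial self-matches.
    zone = max(1, m // 2)
    idx = np.arange(len(z))
    dist[np.abs(idx[:, None] - idx[None, :]) < zone] = np.inf
    return dist.min(axis=1)


def pan_matrix_profile(ts: np.ndarray, min_len: int, max_len: int) -> np.ndarray:
    """Stack the matrix profiles for every length in [min_len, max_len].

    Row k holds the profile for length m = min_len + k, divided by 2*sqrt(m)
    (the largest possible z-normalized distance) so values lie in [0, 1].
    Entries beyond the n - m + 1 subsequences available at length m are NaN.
    """
    n = len(ts)
    lengths = range(min_len, max_len + 1)
    pmp = np.full((len(lengths), n - min_len + 1), np.nan)
    for k, m in enumerate(lengths):
        mp = matrix_profile(ts, m)
        pmp[k, : len(mp)] = mp / (2.0 * np.sqrt(m))
    return pmp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ts = np.cumsum(rng.standard_normal(300))  # random walk
    pmp = pan_matrix_profile(ts, 8, 32)
    print(pmp.shape)  # (25, 293)
    # Length whose best motif pair is tightest (smallest normalized distance).
    best_len = 8 + int(np.nanargmin(np.nanmin(pmp, axis=1)))
    print(best_len)
```

Each row of this sketch costs time quadratic in the series length, so computing every length by brute force quickly becomes expensive; that cost is what a scalable, anytime approach has to work around.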

In exploratory data mining, we may have no idea at which subsequence lengths patterns are conserved in a dataset, which necessitates variable-length motif discovery.

This basic problem arises in nearly all domains, because the user's choice of subsequence length limits which regularities can be found in the dataset.

Figure: A toy data set of a random walk with two embedded subsequences (red and green).

In many cases, the appropriate subsequence length for motif discovery is not readily apparent, and the problem is exacerbated when a time series contains multiple motifs of different lengths.

Figure: A frame-by-frame animation of the calculation of a Pan Matrix Profile. Each frame corresponds to the completion of another matrix profile.

In this work, we solve the motif-length sensitivity problem by introducing the Pan Matrix Profile, a data structure that contains all Matrix Profile information of a time series, and SKIMP, a family of parameter-free, anytime algorithms used to quickly approximate the Pan Matrix Profile.
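The anytime behavior can be sketched as a loop that completes one matrix profile per step and hands back the partially filled PMP after each step, so the caller can stop whenever its budget runs out. The text above does not specify the order in which lengths are visited; this sketch assumes a breadth-first bisection of the length range (midpoint first, then the midpoint of each half, and so on), so that early snapshots already span the range coarsely. All names (`bisection_order`, `matrix_profile`, `skimp_anytime`), the brute-force profile, the exclusion zone, the [0, 1] scaling, and the time budget are hypothetical choices for illustration.

```python
import time
from collections import deque

import numpy as np


def bisection_order(lo: int, hi: int) -> list:
    """Visit every integer in [lo, hi] exactly once, breadth-first by halving.
    (Assumed ordering: early entries are spread coarsely across the range.)"""
    order, queue = [], deque([(lo, hi)])
    while queue:
        a, b = queue.popleft()
        if a > b:
            continue
        mid = (a + b) // 2
        order.append(mid)
        queue.append((a, mid - 1))
        queue.append((mid + 1, b))
    return order


def matrix_profile(ts: np.ndarray, m: int) -> np.ndarray:
    """Brute-force z-normalized Euclidean matrix profile for length m."""
    windows = np.lib.stride_tricks.sliding_window_view(ts, m)
    rho = np.corrcoef(windows)  # Pearson correlation of every subsequence pair
    # For z-normalized subsequences, distance = sqrt(2m(1 - rho)).
    dist = np.sqrt(np.maximum(2.0 * m * (1.0 - rho), 0.0))
    zone = max(1, m // 2)  # assumed exclusion zone against trivial matches
    idx = np.arange(len(windows))
    dist[np.abs(idx[:, None] - idx[None, :]) < zone] = np.inf
    return dist.min(axis=1)


def skimp_anytime(ts: np.ndarray, min_len: int, max_len: int):
    """Yield (length, partial PMP) after each matrix profile completes.
    Rows not yet computed stay NaN, so the caller may stop at any time."""
    pmp = np.full((max_len - min_len + 1, len(ts) - min_len + 1), np.nan)
    for m in bisection_order(min_len, max_len):
        mp = matrix_profile(ts, m)
        pmp[m - min_len, : len(mp)] = mp / (2.0 * np.sqrt(m))  # scale to [0, 1]
        yield m, pmp.copy()


if __name__ == "__main__":
    ts = np.cumsum(np.random.default_rng(0).standard_normal(500))
    deadline = time.perf_counter() + 0.2  # hypothetical time budget (seconds)
    for m, snapshot in skimp_anytime(ts, 8, 64):
        if time.perf_counter() > deadline:
            break
    done = int(np.sum(~np.all(np.isnan(snapshot), axis=1)))
    print(f"computed {done} of {snapshot.shape[0]} lengths")
```

Because the loop only depends on a function that returns one matrix profile, a faster exact matrix profile algorithm could replace the brute-force `matrix_profile` here without changing the anytime structure.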