Contrast Profile
A Novel Time Series Primitive that Allows Classification in Real World Settings
ABSTRACT
Time series data remains a perennially important data type in data mining. In the last decade there has been an increasing realization that time series data can best be understood by reasoning about time series subsequences on the basis of their similarity to other subsequences, the two most familiar such concepts being motifs and discords. Time series motifs are pairs of particularly close subsequences, whereas time series discords are subsequences that are far from their nearest neighbors. However, we argue that it can sometimes be useful to reason simultaneously about a subsequence’s closeness to certain data and its distance from other data. In this work we introduce a novel primitive called the Contrast Profile that allows us to compute such a measure efficiently and in a principled way. As we will show, the Contrast Profile has many downstream uses, including anomaly detection, data exploration, and preprocessing unstructured data for classification. We demonstrate the utility of the Contrast Profile by showing how it allows end-to-end classification in datasets with tens of billions of datapoints.