MERLIN: Parameter-Free Discovery of Arbitrary Length Anomalies in Massive Time Series Archives

Abstract

Time series anomaly detection remains a perennially important research topic, and it has become increasingly important in the burgeoning age of IoT. While there are hundreds of anomaly detection methods in the literature, one definition, time series discords, has emerged as a competitive and popular choice for practitioners. Time series discords are subsequences of a time series that are maximally far away from their nearest neighbors. Perhaps the most attractive feature of discords is their simplicity: unlike many parameter-laden methods, discords require the user to set only a single parameter, the subsequence length. In this work we argue that the utility of discords is reduced by sensitivity to this single user choice. The obvious solution to this problem, computing discords of all lengths and then selecting the best anomalies (under some measure), seems to be computationally untenable. However, in this work we introduce MERLIN, an algorithm that can efficiently and exactly find discords of all lengths in massive time series archives. We demonstrate the utility of our ideas on a large and diverse set of experiments and show that MERLIN can discover subtle anomalies that defy existing algorithms or even careful human inspection. Moreover, we show how to exploit computational redundancies to make MERLIN two orders of magnitude faster than comparable algorithms.
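To make the discord definition concrete, the following is a minimal brute-force sketch in Python. It is not the MERLIN algorithm. It only illustrates the definition, and the naive "all lengths" loop that the abstract describes as computationally untenable. The function names, the use of z-normalized Euclidean distance, and the rule that a neighbor must start at least one subsequence length away (to exclude trivial matches) are assumptions made for illustration and are not specified in the abstract.

```python
import math


def znorm(x):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in x]


def brute_force_discord(ts, m):
    """Return (start_index, nn_distance) of the top discord of length m.

    The discord is the subsequence whose distance to its nearest
    non-overlapping neighbor is largest. Cost is O(n^2 * m).
    """
    n_sub = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n_sub)]
    best_idx, best_dist = -1, -1.0
    for i in range(n_sub):
        nn = math.inf
        for j in range(n_sub):
            # Assumed exclusion rule: skip overlapping (trivial) matches.
            if abs(i - j) < m:
                continue
            nn = min(nn, math.dist(subs[i], subs[j]))
        # Skip subsequences that have no admissible neighbor at all.
        if nn != math.inf and nn > best_dist:
            best_idx, best_dist = i, nn
    return best_idx, best_dist


def discords_over_lengths(ts, min_len, max_len):
    """Naive 'all lengths' search: one brute-force pass per length."""
    return {m: brute_force_discord(ts, m) for m in range(min_len, max_len + 1)}
```

On a periodic series with a short injected bump, `brute_force_discord` should return a start index whose window covers the bump. Because every length triggers a full quadratic scan, `discords_over_lengths` becomes impractical on long series, which is the cost the abstract refers to.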