Supplementary Material

Source Code

Start here: this one m-file and the "how-to" are probably all you need.

DAMP (Version 2.0) : DAMP 2.0.m

Instructions for DAMP 2.0: DAMP_2.0 How To Use.pptx

Sample dataset: BourkeStreetMall.txt

The files below are more advanced and experimental; they will be folded into the main DAMP code in the future.

DAMP - topK : DAMP_topK.m

DAMP - fullMP : DAMP_fullMP.m

X-Lag-Amnesic DAMP: DAMP_X_Lag_Amnesic.m

Golden DAMP: DAMP_Golden.m

Multidimensional DAMP: DAMP_Multidim.m

Sample dataset :


T = load("test.txt");
K = 5;
tic; DAMP_topK(T,K); toc;
tic; DAMP_fullMP(T); toc;
tic; DAMP_X_Lag_Amnesic(T); toc;

% FOR GOLDEN DAMP
% T is the time series to be tested and golden_batch is a long vector that contains all legal patterns
tic; DAMP_Golden(T,golden_batch); toc;

% FOR MULTIDIM DAMP
% T is an N by D matrix, where N is the length of each time series and D is the number of dimensions
tic; DAMP_Multidim(T,D); toc;

How to read an aMP generated by DAMP?

The time series and its corresponding Left-aMP are shown on the left. The red curve in the figure is the time series and the blue curve is the Left-aMP.

When you search for the top-k left-discords, the k highest peaks correctly show the location and strength (the height of the peak) of the top-k left-discords. However, the remaining peaks in the aMP should not be assumed to indicate slightly smaller anomalies. They may indicate slightly smaller anomalies, but they may also simply indicate regions that were pruned after encountering a matching subsequence that was just below the current Best-So-Far.

It is easy to see from the blue curve in the figure that there are many nearly constant regions with a slight downward trend. These regions reflect the approximate discord scores of the pruned subsequences, and the downward trend in the aMP is designed to prevent a pruned subsequence from receiving the same discord score as the real top discords. These nearly constant regions therefore have no practical significance, and there is no need to interpret them.

NOTE: The DAMP algorithm has a parameter called CurrentIndex (spIndex in the paper), which specifies the initial position of the split point between the training and test data. Since no discord score is computed for the training data, every value before CurrentIndex in a Left-aMP generated by DAMP is hard-coded to 0.
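
As a rough illustration of the reading rules above, the sketch below picks the k highest peaks from a Left-aMP vector and skips everything before CurrentIndex. It is not part of the DAMP distribution. The toy aMP values, the variable names (scores, locs, heights) and the use of m as an exclusion zone around each selected peak are all assumptions made for the example, so adapt them to the actual output you get from DAMP.

```matlab
% Toy Left-aMP (assumed values); in practice this is the vector produced by DAMP.
aMP = [0 0 0 0 2.0 2.0 1.9 1.9 5.0 1.8 1.8 1.7 3.5 1.7 1.6 1.6];
CurrentIndex = 5;   % split point; entries before it are hard-coded to 0
k = 2;              % number of left-discords that were searched for
m = 2;              % assumed exclusion zone (e.g. the subsequence length)

scores = aMP;
scores(1:CurrentIndex-1) = -inf;        % the training prefix carries no score

locs = zeros(1, k);
heights = zeros(1, k);
for i = 1:k
    [heights(i), locs(i)] = max(scores);    % next highest remaining peak
    lo = max(1, locs(i) - m);
    hi = min(numel(scores), locs(i) + m);
    scores(lo:hi) = -inf;                   % suppress neighbours of this peak
end

disp([locs; heights]);  % row 1: locations, row 2: discord scores
```

Only the k selected locations should be read as discords. The remaining entries, including the nearly constant plateaus, are left uninterpreted, in line with the caveat above.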

In the paper we claimed that “At least one-hundred papers have reported using discords to solve problems in diverse domains”. Here we offer some evidence for this. Note that some papers that use discords actually attribute their success to the Matrix Profile or to HOTSAX, which are simply two algorithms for computing discords.

  • “we make use of matrix profiling to find the discords and identify..” Shahid et al AIOPS 2020

  • “correctly mapping new energy traces to one of the dominant usage classes, using a limited input set of examples associated to the energy time series discords” Nichiforov et al CASE 2020

  • “observations of the magnetosphere collected by the Cassini spacecraft in orbit around Saturn.. this case, the best-performing method was discords..” Kiri L. Wagstaff et al. NASA JPL. 2020.

  • Based on the concept of Matrix Profile ..without relying on time series synchronization.. the Railway Technologies Laboratory of Virginia Tech has been developing an automated onboard data analysis for the maintenance track system. Ahmadian et al. JRC2019

  • (for) intrusion detection in industrial network traffic, distances as calculated with Matrix Profiles rises significantly during the attacks. a result, time series-based anomaly detection methods are capable of detecting deviations and anomalies. Schotten (2019).

  • (for an industrial IoT anomaly detection problem) Matrix Profiles perform well with almost no parameterisation needed. Anton et al. ICDM 2018.

  • (paper title) Time Series Discord Detection in Medical Data using a Parallel Relational Database. BIBM 2015 Woodbridge et al.

  • RAMP builds upon an existing time series data analysis technique called Matrix Profile to detect anomalous distances...collected from scientific workflows in an online manner. Herath et al. IEEE Big Data 2019

  • Based on obtained results for the considered data set, matrix profiles turned out to be most suitable for the task of anomaly detection Lohfink et al. VISSEC2019

  • (examining) manufacturing batches considering raw amperage (we found that the) Matrix Profile highlights anomalies Hillion & O'Connell of TIBCO Data Science. re:Invent 2019.

  • (our) approach is based on matrix profile, which is a method for time series analysis that is robust, domain-agnostic, and computationally efficient. Van Hoecke et al 2020

  • (we conducted) five cases, which demonstrate how the matrix profile based pattern recognition approach can be used in the power boiler environment. In general (this) resulted in promising outcomes. Liisa Nokelainen 2020.

  • Three anomalies were found near bridges and were indicated as low priority track issues. Two anomalies were... The Matrix Profiler found all these anomalies Steenwinckel et al 2020

  • Our solution will rely on Matrix Profile to allocate these discords in order to learn the anomalies. Alshaer et al.

  • We leverage the Matrix Profile (MP) to create a micro-service-based machinery monitoring solution Naskos et al 2021

  • SLMAD uses statistical-learning and employs a robust box-plot algorithm and Matrix Profile (MP) to detect anomalies. Team from Huawei/UCD.