As we noted in the main text, to make certain that our experiments are reproducible, we have built a website [6] that contains all the data/code used in this work. Here we take advantage of the two pages available to highlight some of our reproducibility steps.
Some experiments (Sections 6.4 and 6.6) make use of random numbers. We have provided the seeds and code so that all users can reproduce all such data bit-for-bit.
In some places we omitted a discussion of the parameter m; this was done to enhance the flow of the paper. The reader will recall that m is the only parameter that affects the output of the algorithm (lookahead affects only the speed). We repair this omission here (a sketch of how such robustness claims can be checked follows the list):
· Figure 2 (Bearing) m was 300. However, the results would be near identical for m in the range 100 to 500 [6].
· Figure 3 (ECG Wandering Baseline) m was 150. However, the results would be near identical for m in the range 100 to 300.
· Figure 4 (Mackey-Glass) m was 40. However, the results would be near identical for m in the range 20 to 200 [6].
· Figure 5 (ECG) m was 150. However, the results would be near identical for m in the range 100 to 300 [6].
· Figure 9 (Energy Grid Dataset) m was 5760 (equivalent to four days of wall clock time). However, we can easily find the “Joshua” anomaly with m in the range 100 to 10,000 [6].
· Figure 10 (Machining) m was 16. However, the results would be near identical for m in the range 8 to 64 [6].
· Figure 11 (Random Walks) m was 1,024. Here we had to carefully tune the length of the embedded anomaly so that we did not get a perfect result each time.
· Figure 12 (Long ECG) m was 94 [6].
· Figure 13 (Long ECG) m was 94 [6].
· Figure 14 (Billion Length Random Walk) m was 128 [6].
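The following sketch shows how such robustness claims can be checked. Here run_damp(T, m) is only a placeholder for however one invokes DAMP on a time series T with subsequence length m (it is not DAMP's actual API), and it is assumed to return the approximate matrix profile (aMP).
% Sweep m over the quoted range and confirm that the top-1 discord is stable.
% run_damp is a placeholder name, not DAMP's actual interface.
for m = 100:100:500                      % e.g., the range quoted for Figure 2
    aMP = run_damp(T, m);                % placeholder call returning the aMP
    [score, loc] = max(aMP);             % location and strength of the top-1 left-discord
    fprintf('m = %4d : discord at %d (score %.3f)\n', m, loc, score);
end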
Notes on Section 6.3: In Section 6.3 we suggested how long it would take deep learning to solve the task we considered in that section. We found that for Telemanom [3] the training time is linear in the time series length, with R2 = 0.9933. Unfortunately, it runs out of memory on this task, but we trained it up to n = 80,000, which took 3656.5 seconds. This suggests that one million datapoints would take 12.7 hours to train, but recall that we trained on 200 such examples, so the total training time would be about 105.8 days.
Testing with Telemanom is also linear. We found that when processing the bearing dataset, which is of length 244,189, testing took 700.4 seconds, suggesting a throughput of about 348.6 Hz. At that rate it would take about 11.6 years to process the 128 billion datapoints (of course, this could be done in parallel). The timing experiments for Telemanom can be found at [6].
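For concreteness, the back-of-the-envelope arithmetic behind these estimates is simply:
% Extrapolating Telemanom's measured training and testing times (see above).
train_rate = 3656.5 / 80000;                 % seconds per datapoint for training
one_model  = train_rate * 1e6 / 3600;        % about 12.7 hours for one million datapoints
all_models = train_rate * 1e6 * 200 / 86400; % about 105.8 days for all 200 examples
test_rate  = 244189 / 700.4;                 % about 348.6 Hz testing throughput
full_test  = 128e9 / test_rate / 31536000;   % about 11.6 years for 128 billion datapoints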
The training time here is the biggest hurdle. There is a qualitative difference between a model you can train during a coffee break and one that requires three months.
Notes on Section 6.4: In Section 6.4, we noted that we “inserted a subtle anomaly, a low amplitude random section of length 950.” We chose the odd length of 950 because we found that if we made the anomaly the same length as m (1024), the accuracy on the training set was 100%. We wanted to stress test our algorithm and have an experiment that others could improve upon.
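As a minimal sketch of this experiment (the seed, amplitude, walk length and insertion point below are placeholders, not the exact values we used; the true generating code and seeds are at [6]):
rng(1,'twister');                    % placeholder seed; the actual seeds are at [6]
T = cumsum(randn(100000,1));         % an illustrative random walk
anomaly = 0.1 * randn(950,1);        % a low amplitude random section of length 950
loc = 50000;                         % placeholder insertion point
T(loc:loc+949) = T(loc) + anomaly;   % embed the subtle anomaly into the walk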
Notes on Table 4 (Table 9 of the journal paper): As we noted in the main text, the results of DAMP shown in Table 4 do not require any human effort. We use the following four lines of Matlab code to automatically learn the period for each data set and use it as the parameter m for DAMP.
% Learn the period automatically from the data's autocorrelation.
[autocor,lags] = xcorr(T,'coeff');            % normalized autocorrelation of T
% Take the highest autocorrelation peak at lags 10 to 1000.
[~,m] = findpeaks(autocor(length(T)+10:length(T)+1000), ...
    lags(length(T)+10:length(T)+1000),'SortStr','descend','NPeaks',1);
m(isempty(m)) = 1000;                         % no peak found: default to 1000
m = floor(m);                                 % ensure an integer subsequence length
The period is obtained by finding the highest peak of the autocorrelation at lags in the range 10 to 1000 (thus the value of the parameter m is limited to the range 10 to 1000). If ‘findpeaks’ returns an empty value, we fall back to a default of m = 1000.
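If the four lines above are wrapped in a helper, which we call learn_period here purely for illustration, and run_damp is again only a placeholder for however one invokes DAMP, the fully automatic pipeline for each dataset in Table 4 amounts to:
% Fully automatic: no human chooses m, it is learned from the data itself.
m   = learn_period(T);                 % the four-line autocorrelation snippet above
aMP = run_damp(T, m);                  % placeholder call, not DAMP's actual API
[~, anomaly_location] = max(aMP);      % report the top-1 left-discord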
Although off-the-shelf DAMP achieves an accuracy of 51.2%, significantly better than the best of the deep learning approaches, we noted that there are some simple “tricks” that can further improve its performance. We show just one example: we use the following line of code to “sharpen” each dataset.
% Map T to [0,1], rescale by its original standard deviation (at least 1),
% shift above 1, then raise to the 10th power to exaggerate the largest values.
T = (normalize(T,'range') * max(1, mean(std(T))) + 1).^10;
With this sharpening, the accuracy of DAMP increases to 63.2%. The high-level idea is to apply a transformation that preserves the ordering of the values but grows faster than the original time series, thereby “highlighting” the anomaly.
Important General Note on using DAMP
There is an important thing to remember when viewing an aMP, such as the blue line in Figure 10.bottom. Failure to understand this may lead a user to think the aMP is indicating an anomaly where there is none.
When you search for the top-k left-discords, the k highest peaks do correctly show the location and strength (the height of the peaks) of the top-k left-discords. However, the remaining peaks in the aMP should not be assumed to indicate slightly smaller anomalies. They may indicate slightly smaller anomalies, but they may also simply indicate regions that were pruned after encountering a matching subsequence that was just below the current Best-So-Far.
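As a sketch of how the top-k peaks can be extracted in practice (findpeaks is from the Signal Processing Toolbox; aMP and m are assumed to already exist, and k = 3 is an illustrative choice):
% Only these k highest peaks should be read as discords; lower peaks may
% merely mark regions pruned just below the current Best-So-Far.
k = 3;
[scores, locs] = findpeaks(aMP, 'MinPeakDistance', m, ...
    'SortStr', 'descend', 'NPeaks', k);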