Red noise estimation

This set of procedures, available in version 3 or later, has been developed to handle time series with a red noise component (or serial correlation). Red noise is modeled here as a first-order autoregressive (AR1) stationary Gaussian process with a positive correlation at lag 1, also called a Markov process. If red noise or serial correlation is present in the time series, it is necessary either to adjust the significance level of the shifts by calculating the effective degrees of freedom, or to use a so-called "prewhitening" procedure prior to application of a regime shift detection method. Either approach requires an estimate of the AR1 parameter, which is taken to be the sample lag-1 autocorrelation coefficient (r1). Obtaining this estimate is difficult for time series containing both red noise and regime shifts. Two methods of estimating r1 have been implemented here. The first method, MPK, is based on the formula for the bias in the ordinary least squares (OLS) estimate of r1 suggested by Marriott and Pope (1954) and Kendall (1954). The second method, called IP4 for short, is based on the assumption that the bias is approximately inversely proportional to the sample size, with four subsequent corrections applied. Both methods are described in Rodionov (2006).

The figure below schematically explains the options available in this section. If "None" is chosen, no r1 estimation is performed; all other options require an r1 estimate. Note that the OLS estimate is calculated using the entire time series, whereas the MPK and IP4 methods break the time series into subsamples, estimate a bias-corrected r1 for each subsample, and then use the median of all estimates. The suggested subsample size m is calculated as m = (l + 1)/3, where l is the cutoff length. It is recommended to experiment with different subsample sizes to see how they affect the r1 estimate.
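The subsample-and-median idea can be sketched in a few lines of Python. This is only an illustration, not the program's own code. It assumes the commonly cited first-order form of the Marriott-Pope/Kendall bias, E[r] ≈ ρ - (1 + 3ρ)/(m - 1), which inverts to ((m - 1)r + 1)/(m - 4). The use of overlapping (sliding) subsamples, the rounding of m down to an integer, and the guard requiring m ≥ 5 are assumptions made for this sketch, and the function names are hypothetical.

```python
import numpy as np

def ols_lag1(x):
    """OLS slope of x[t] regressed on x[t-1] (with an intercept)."""
    x = np.asarray(x, dtype=float)
    prev = x[:-1] - x[:-1].mean()
    curr = x[1:] - x[1:].mean()
    return float(np.dot(prev, curr) / np.dot(prev, prev))

def mpk_correct(r, m):
    """Invert the assumed bias E[r] ~ rho - (1 + 3*rho)/(m - 1)."""
    return ((m - 1) * r + 1.0) / (m - 4)

def subsample_r1(x, cutoff_length, m=None):
    """Median of bias-corrected r1 estimates over sliding subsamples."""
    x = np.asarray(x, dtype=float)
    if m is None:
        # Suggested size m = (l + 1)/3; integer rounding is an assumption.
        m = (cutoff_length + 1) // 3
    if m < 5:
        raise ValueError("this sketch needs m >= 5 for the correction")
    if len(x) < m:
        raise ValueError("series is shorter than the subsample size")
    estimates = [
        mpk_correct(ols_lag1(x[i:i + m]), m)
        for i in range(len(x) - m + 1)  # overlapping windows (assumption)
    ]
    return float(np.median(estimates))
```

Calling `subsample_r1(x, cutoff_length=l, m=...)` with several values of `m` is one way to see how sensitive the estimate is to the subsample size.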

If the Prewhitening box is unchecked, the r1 estimate is used to calculate the adjusted degrees of freedom (DF_adj) for the RSI: DF_adj = 2 l_eq - 2, where l_eq is the equivalent cutoff length, calculated using the formula in Von Storch and Zwiers (1999, p. 115) for the equivalent sample size. This formula is also used to calculate the p-value for the shifts adjusted for serial correlation.
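As a rough illustration of this adjustment, the sketch below assumes that the equivalent sample size formula takes the familiar form n_eq = n / (1 + 2 Σ (1 - k/n) ρ(k)), summed over k = 1, ..., n - 1, with the AR1 autocorrelation function ρ(k) = r1^k. Whether the program applies exactly this form is an assumption here, and the function names are hypothetical.

```python
def equivalent_length(l, r1):
    """Equivalent cutoff length under the assumed AR1 formula."""
    s = sum((1 - k / l) * r1 ** k for k in range(1, l))
    return l / (1 + 2 * s)

def adjusted_df(l, r1):
    """Adjusted degrees of freedom, DF_adj = 2 * l_eq - 2."""
    return 2 * equivalent_length(l, r1) - 2
```

With r1 = 0 the equivalent length equals l and the degrees of freedom reduce to 2l - 2; any positive r1 gives a smaller value.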

If the Prewhitening box is checked, the regime shifts are detected for the filtered time series. No adjustments for serial correlation are made when calculating the DF. As an output, the user can see either the filtered time series (the Filtered box checked), or get back to the original time series (the Filtered box unchecked). In the latter case, the final significance level is adjusted using the equivalent sample size formula.
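For readers unfamiliar with prewhitening, a minimal sketch of the standard AR1 filter, y_t = x_t - r1 x_{t-1}, is given below. It is assumed here that the filtered series is of this form; the function name is hypothetical and the program's exact implementation may differ.

```python
import numpy as np

def prewhiten(x, r1):
    """Apply the AR1 filter y[t] = x[t] - r1 * x[t-1].

    The output is one value shorter than the input because the
    first observation has no predecessor.
    """
    x = np.asarray(x, dtype=float)
    return x[1:] - r1 * x[:-1]
```

Under this filter a constant level μ maps to μ(1 - r1), so the larger the r1 used, the more any step between two levels shrinks in the filtered series.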

Fig. 1. Schematic for the red noise estimation options in the program.

The reason to use subsamples when estimating r1 is that if the time series contains both red noise and regime shifts, r1 tends to be overestimated if the entire series is used (Rodionov, 2006). Prewhitening with an overestimated r1 not only filters out red noise, but also reduces the magnitude of the regime shifts. That was probably the primary reason why Beaulieu and Killick (2018) came to the conclusion that the Pacific Decadal Oscillation (PDO) is best described by a lag-1 autoregressive model with no shifts. They used their own method, named EnvCpt, which fits eight models with different combinations of a trend, changepoints (regime shifts) and red noise and then identifies the most appropriate one according to the Akaike information criterion. In addition to the PDO analysis, they conducted Monte Carlo experiments to compare EnvCpt with STARS and another method based on Bayesian identification of multiple changepoints in a regression model (BMCpt). Judging from their Figs. D1d and D2b, when prewhitening is used, STARS outperforms both EnvCpt and BMCpt in the case when the analyzed time series contains a combination of red noise and changepoints. STARS was better at detecting both the correct number of changepoints and their locations.

If a deterministic linear trend is also present (strong enough to be seen upon a visual inspection of the time series graph), it is recommended to remove it first. The reason is the same: a trend can contaminate the estimate of serial correlation, leading to overestimation of r1 (Yue and Pilon 2003). The effect is smaller for smaller sample sizes, which points to another advantage of using subsamples in the MPK and IP4 procedures. While those procedures work well in most cases, prewhitening alone is sometimes not enough, and detrending of the time series before estimating r1 may be required. Detrending can remove a deterministic trend without distorting the existing AR process (Yue and Pilon 2003).
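A simple way to remove a linear trend before estimating r1 is to subtract a least-squares line. The sketch below uses `numpy.polyfit` and assumes equally spaced observations; the function name is hypothetical, and this is only one of several possible detrending approaches.

```python
import numpy as np

def detrend_linear(x):
    """Subtract the least-squares straight line from the series."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))  # assumes equally spaced time steps
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)
```

The residual series returned by `detrend_linear` can then be passed to whichever r1 estimation procedure is being used.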

Beaulieu and Killick (2018) show that STARS performs poorly when the time series contains a trend, a changepoint and serial correlation (see their Fig. D1b). The model used to generate the synthetic time series for that experiment consisted of two equations:

x_t = -0.112 - 0.001 t + 0.659 x_{t-1} + e_t,   t = 1, 2, …, 113,

x_t = -1.707 + 0.013 t + 0.153 x_{t-1} + e_t,   t = 114, 115, …, 166,

where e_t are independent and identically distributed random numbers from a normal distribution with zero mean and a variance of 0.1. That model was supposed to mimic changes in the global surface air temperature over the last 166 years, with a shift in the 1970s. Note that the changepoint in that case represented a simultaneous shift in all three parameters: a sharp downward (!) shift in the mean (from -0.112 to -1.707), a change from a slightly negative trend with a slope of -0.001 to a strongly positive one with a slope of 0.013, and a shift in r1 from 0.659 to 0.153, meaning a substantial weakening of serial correlation. Obviously, STARS is not designed to detect such unusual changepoints.
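To get a feel for what such a series looks like, the two equations can be simulated directly. The sketch below is a minimal illustration: the starting value x_0 = 0 and the random seed are assumptions (they are not specified above), and the function name is hypothetical.

```python
import numpy as np

def simulate_two_regime_series(seed=0, x0=0.0):
    """Generate 166 values from the two-equation model quoted above."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(0.1)  # noise variance 0.1 -> standard deviation sqrt(0.1)
    x = np.empty(166)
    prev = x0  # starting value is an assumption
    for t in range(1, 167):
        e = rng.normal(0.0, sd)
        if t <= 113:
            cur = -0.112 - 0.001 * t + 0.659 * prev + e
        else:
            cur = -1.707 + 0.013 * t + 0.153 * prev + e
        x[t - 1] = cur
        prev = cur
    return x
```

Plotting several realizations with different seeds shows how the three parameters change together at t = 114.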

Indeed, STARS works best when the time series contains no deterministic trend, such as the PDO series, from which the global warming trend is removed by design. However, assuming the AR1 autoregressive parameter is not changing over time, both MPK and IP4 procedures can accurately estimate it not only in the presence of regime shifts, but also in the presence of a weak-to-moderate trend. Subsequent prewhitening eliminates both the linear trend and red noise, without having a substantial impact on the magnitude of regime shifts (Rodionov, 2006).

In summary, if a trend and/or serial correlation are present in the time series, it is critically important to eliminate them using either the MPK or the IP4 prewhitening procedure, both of which are specifically designed to work with time series containing regime shifts. If a linear trend is strong, it is better to detrend the time series first to improve the accuracy of the lag-1 autocorrelation estimate.
