Running mean filtering

On the common practice of subtracting a running mean of the previous 120 days from a time series in order to remove low-frequency variability.

A method commonly used to remove interannual variability from a time series, particularly in studies of the Madden-Julian Oscillation (MJO), is to subtract the mean of the previous 120 days from each point [1][2][3][4][5][6][7][8][9][10]. This type of filtering appears to have been introduced by Wheeler and Hendon (2004) as a way to filter data in real time, where a centered running mean is not possible because data from future time steps are not yet available. However, the method must be used with caution: applied incorrectly, it not only fails to remove power at the intended frequencies, but can also introduce power at unintended frequencies!

NOAA High Resolution SST data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at https://www.esrl.noaa.gov/psd/

This is easily demonstrated with an example. Here is a daily time series of sea surface temperature over the equatorial eastern Pacific Ocean (2°S-2°N, 90°-110°W). Interannual variability is strong because of the El Niño-Southern Oscillation; note, for example, the strong spikes in sea surface temperature in this region during the 1982/1983 and 1997/1998 El Niño events.

Here, an attempt has been made to remove interannual variability by subtracting a running mean of the previous 120 days from each point. Visual inspection of the result (yellow line) shows that interannual variability is still clearly visible.

For comparison, here is the same sea surface temperature time series, but with a centered running mean (120 days = 60 days before and 60 days after) subtracted from each point. The centered running mean filter does a much better job of removing interannual variability.
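The two filters described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used to make the figures: the function names are hypothetical, the previous-mean window is assumed to exclude the current day, the centered window is assumed to include it, and a synthetic linear drift stands in for the sea surface temperature data.

```python
import numpy as np

def previous_mean_anomaly(x, window):
    """Subtract from each point the mean of the `window` samples before it."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))   # csum[k] = sum of x[:k]
    out = np.full(x.shape, np.nan)                 # first `window` points stay NaN
    # Mean of x[t-window : t] for t = window .. n-1 (current day excluded).
    prev_mean = (csum[window:-1] - csum[:-window - 1]) / window
    out[window:] = x[window:] - prev_mean
    return out

def centered_mean_anomaly(x, half):
    """Subtract from each point the mean over `half` samples either side of it."""
    x = np.asarray(x, dtype=float)
    width = 2 * half + 1                           # window includes the point itself
    csum = np.concatenate(([0.0], np.cumsum(x)))
    out = np.full(x.shape, np.nan)                 # `half` points at each end stay NaN
    # Mean of x[t-half : t+half+1] for t = half .. n-half-1.
    cen_mean = (csum[width:] - csum[:-width]) / width
    out[half:x.size - half] = x[half:x.size - half] - cen_mean
    return out

# Synthetic stand-in for the SST series (assumption): a slow linear drift, daily.
t = np.arange(1000.0)
drift = 0.01 * t

# Largest residual left by the centered filter, and mean residual left by the
# previous-mean filter, for the same drift.
print(np.nanmax(np.abs(centered_mean_anomaly(drift, 60))))
print(np.nanmean(previous_mean_anomaly(drift, 120)))
```

On a steady drift the centered mean sits on the signal and the residual is essentially zero, whereas the mean of the previous 120 days trails the signal by about half a window and leaves a constant offset. That lag is the same effect that leaves interannual variability in the yellow line.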

Here are the power spectra for each of the three time series shown above. The original sea surface temperature data have a clear spike in the El Niño band (2-5 years). While some of this power is indeed removed by the previous running mean filter (yellow line), this filter is nowhere near as effective as the centered running mean filter (dark red line). In fact, the time series with the previous running mean filter applied appears to have more power in some bands than the original data (see the spikes near 0.9 and 1.5 years)! What is going on here?
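A comparison of this kind can be sketched with a plain FFT periodogram. The example below is an assumption-laden stand-in rather than a reproduction of the figure: it uses a synthetic daily series (a 4-year sinusoid plus white noise) instead of the NOAA data, a hypothetical helper named periodogram, and a previous-mean window that excludes the current day. Its numbers are not expected to match the plotted spectra.

```python
import numpy as np

def periodogram(x, dt=1.0):
    """Return (frequencies, power) of the demeaned series, zero frequency dropped."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    freqs = np.fft.rfftfreq(x.size, d=dt)
    return freqs[1:], power[1:]

# Synthetic stand-in (assumption): 4-year sinusoid plus white noise, daily samples.
period, window, half = 1461, 120, 60
n_keep = 10 * period                 # analyse exactly 10 periods to avoid leakage
n = n_keep + window + half
rng = np.random.default_rng(0)
t = np.arange(n)
raw = np.sin(2 * np.pi * t / period) + 0.5 * rng.standard_normal(n)

# Running means by convolution; mode="valid" keeps only complete windows.
# prev_mean[j] is the mean of the `window` days before day j + window.
prev_mean = np.convolve(raw, np.ones(window) / window, mode="valid")[:-1]
# cen_mean[j] is the mean centered on day j + half.
width = 2 * half + 1
cen_mean = np.convolve(raw, np.ones(width) / width, mode="valid")

# Align all three series on the same days, t = window .. n - half - 1.
raw_k = raw[window:n - half]
prev_k = raw_k - prev_mean[:n_keep]
cen_k = raw_k - cen_mean[window - half:window - half + n_keep]

freqs, p_raw = periodogram(raw_k)
_, p_prev = periodogram(prev_k)
_, p_cen = periodogram(cen_k)

k = np.argmin(np.abs(freqs - 1.0 / period))   # bin nearest the 4-year period
print("previous / raw power at 4 years:", p_prev[k] / p_raw[k])
print("centered / raw power at 4 years:", p_cen[k] / p_raw[k])
```

Trimming the three series to the same days keeps their frequency bins identical, so the power at a given period can be compared bin for bin.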

Why is a running mean filter using the previous 120 days not as effective as one that uses a centered running mean?

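Instead of removing an oscillation, a previous running mean filter will phase shift certain signals (see the figure above comparing the raw sea surface temperature data to the previous running mean filtered time series). A mean over the previous 120 days is centered 60 days in the past, so it lags the signal it is meant to track; subtracting a lagged copy of an oscillation does not cancel it and, if the lag is large enough relative to the period, can even reinforce it.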
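The phase shift can be made explicit for an idealized case. The following is a sketch under stated assumptions: a continuous sinusoid x(t) = cos(2πt/P), an averaging window of length W, and the ratio r = W/P. These symbols are introduced here for illustration and do not come from the figures.

```latex
\begin{align}
\bar{x}_c(t) &= \frac{1}{W}\int_{t-W/2}^{t+W/2} \cos\!\left(\frac{2\pi s}{P}\right) ds
              = S(r)\,\cos\!\left(\frac{2\pi t}{P}\right),
              \qquad S(r) = \frac{\sin(\pi r)}{\pi r}, \\
\bar{x}_p(t) &= \frac{1}{W}\int_{t-W}^{t} \cos\!\left(\frac{2\pi s}{P}\right) ds
              = S(r)\,\cos\!\left(\frac{2\pi t}{P} - \pi r\right)
              = \bar{x}_c\!\left(t - \tfrac{W}{2}\right), \\
\frac{\operatorname{power}(x - \bar{x}_c)}{\operatorname{power}(x)} &= \bigl(1 - S(r)\bigr)^2, \\
\frac{\operatorname{power}(x - \bar{x}_p)}{\operatorname{power}(x)} &= 1 - 2\,S(r)\cos(\pi r) + S(r)^2 .
\end{align}
```

The previous mean is the centered mean delayed by half a window, which is a phase lag of πr. At r = 1/2, for instance, the cosine term vanishes and the last ratio is 1 + S(1/2)^2, which is greater than 1: in this idealization the filtered series carries more power at that period than the original. Discrete sampling and the exact window definition change the numbers somewhat.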

For a perfectly sinusoidal oscillation, this final figure shows how much power is removed by (1) a centered and (2) a previous running mean filter, as a function of the window size relative to the period of the oscillation.

For the centered running mean filter, the oscillation is removed most effectively when the window is small relative to the period of the oscillation. For example, if a time series contains an oscillation with a period of 10 days, subtracting a centered running mean with a window of 10 days will not remove it, because the mean of a sinusoid over a full period is zero and nothing is subtracted. On the other hand, subtracting a centered running mean with a window of 3 days would leave less than 20% of the power of the 10-day oscillation in the filtered time series. Of course, the nature of the data (e.g., sampling frequency) and the research question (e.g., the importance of retaining other frequencies in the data) are also important to consider when choosing a window size for a centered running mean filter.

The case of the previous running mean filter is interesting. While it removes power at periods more than about 5 times the window size (frequencies for which the averaging window is up to about 20% of the period), it actually adds power at frequencies for which the averaging window is between about 20% and 75% of the period! This is why the power spectrum of the sea surface temperature time series above shows spikes at some frequencies when a previous running mean filter is used. In the figure, the dot marks the ratio between a running mean window of 120 days and an oscillation with a period of 4 years (El Niño is near this ratio). Thus, when a previous running mean filter of 120 days is applied to data with a strong El Niño signal, only about 50% of the power in the El Niño band is removed, while power for oscillations with periods between about 160 and 600 days is inadvertently amplified.
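A curve of this kind can be estimated numerically by filtering a pure sinusoid and measuring how much power is left. The sketch below is one way to do it, not the method used for the figure: the function name residual_power_ratio is hypothetical, the previous-mean window is assumed to exclude the current sample, and the centered window is assumed to be an odd number of samples that includes it. The exact values depend on these choices and on how power is defined, so the crossover points it gives need not coincide with those read from the figure.

```python
import numpy as np

def residual_power_ratio(window, period, kind, n_periods=50):
    """Power left in a unit sinusoid after subtracting a running mean,
    relative to its original power (values above 1 mean amplification)."""
    n = int(round(period * n_periods)) + 2 * window
    t = np.arange(n)
    x = np.sin(2 * np.pi * t / period)
    # means[i] is the mean of x[i : i + window]
    means = np.convolve(x, np.ones(window) / window, mode="valid")
    if kind == "previous":
        # Mean of the `window` samples before t, for t = window .. n-1.
        resid = x[window:] - means[:-1]
    elif kind == "centered":
        if window % 2 == 0:
            raise ValueError("centered window must be an odd number of samples")
        half = window // 2
        # Window centered on t, for t = half .. n-half-1.
        resid = x[half:n - half] - means
    else:
        raise ValueError("kind must be 'previous' or 'centered'")
    return np.mean(resid ** 2) / np.mean(x ** 2)

# Sweep the window size for a fixed (illustrative) period of 400 samples.
period = 400
for window in (21, 81, 161, 241, 321, 401):
    c = residual_power_ratio(window, period, "centered")
    p = residual_power_ratio(window, period, "previous")
    print(f"window/period = {window / period:.2f}  centered: {c:.2f}  previous: {p:.2f}")
```

For example, residual_power_ratio(3, 10, "centered") corresponds to the 3-day window and 10-day oscillation discussed above, and residual_power_ratio(50, 100, "previous") to a previous-mean window of half the period.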
