Autocorrelation

The main idea...

Autocorrelation, also known as "lagged correlation", is the correlation of data values in a series of measurements with preceding and succeeding data values in the same series (Fig 1a). To detect autocorrelation, a "lagged" data series is created by displacing values relative to the original data series (Fig 1b). The size of the displacement is referred to as the "lag". The original and lagged data series are then subjected to correlation analysis and the results examined (Fig 2). Low correlation values across all lags suggest that no autocorrelation is present. If correlation is strong at one or more lags, analyses appropriate to, or able to account for, autocorrelated signals should be used. Autocorrelation is often seen in data obtained across time (e.g. time series) or space (e.g. transect sampling). Consider, for example, heterotroph abundance values measured during and between a series of algal blooms, or chemoautotroph activity measured in or between a patchily distributed collection of cold seeps.
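The displace-and-correlate procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names (pearson, lagged_correlation) and the sine-wave test series with a period of 12 are assumptions made for the example. It correlates only the overlapping parts of the original and displaced series, so its values may differ slightly from the estimators used in statistical packages.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(series, lag):
    """Correlate a series with a copy of itself displaced by `lag` steps."""
    if lag < 1 or lag >= len(series) - 1:
        raise ValueError("lag must leave at least two overlapping values")
    original = series[:-lag]   # values at positions t
    lagged = series[lag:]      # values at positions t + lag
    return pearson(original, lagged)

# Hypothetical series: four full cycles of a signal with period 12
series = [math.sin(2 * math.pi * t / 12) for t in range(48)]

# One correlation value per lag; plotting these gives a plot like Fig 2
correlogram = {lag: lagged_correlation(series, lag) for lag in range(1, 13)}

for lag, r in correlogram.items():
    print(f"lag {lag:2d}: r = {r:+.3f}")
```

For this idealised series the correlation is close to 1 at lag 12 (one full period) and close to -1 at lag 6 (half a period). Real data would show weaker and noisier values.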

Multivariate autocorrelation is more challenging to detect. Holgersson (2004) explored the diagnostic use of statistics such as Pillai's trace and Roy's largest root (as seen in MANOVA). Holgersson notes that the choice of test can strongly influence the diagnostic power and suggests that the largest F test is perhaps the most readily approachable method for assessing multivariate autocorrelation in applied studies. The largest F test relies on the covariance matrix of the data and is the maximal F statistic generated by all marginal F tests.

Figure 1: a) An autocorrelated data series. High and low values tend to occur together in "peaks" and "troughs", respectively. b) A set of lagged data series (lags x, y, and z) may be generated from the original data to determine the parameters of the autocorrelation. Note that more periods would be needed to confidently estimate the parameters of the autocorrelation function appropriate to model this series.

Figure 2: An autocorrelation plot. The height of each vertical line indicates the strength of the signal's correlation with itself at a given lag.

Key assumptions

- The correlation measure used to detect autocorrelation must be appropriate to the data. Visual inspection is important to identify complex autocorrelation patterns and select an appropriate measure.
- As with any correlation procedure, one should have sufficient objects with non-null variable values. If the samples available do not span the period of an autocorrelated signal, detection of that signal is very unlikely.

Warnings

- In ecosystems with patchy community distribution and/or environmental patchiness, autocorrelation may be an artifact of the sampling methodology used. Random sampling is generally less prone to consistently sampling within or outside of patches than systematic sampling and may provide a more reliable representation of a given area.
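The sampling artifact described above can be illustrated with a small simulation. This is a sketch under strong assumptions: the transect, the perfectly regular patches (5 units wide, repeating every 10 units) and all names (in_patch, patch_fraction) are hypothetical, and real patches are rarely this regular. It compares a systematic design whose spacing happens to match the patch spacing with a random design.

```python
import random

def in_patch(position, patch_period=10, patch_width=5):
    """True if a position on the transect falls inside a patch."""
    return (position % patch_period) < patch_width

def patch_fraction(positions):
    """Fraction of sampled positions that fall inside a patch."""
    return sum(in_patch(p) for p in positions) / len(positions)

transect_length = 1000   # hypothetical transect, arbitrary units
n_samples = 100

# Systematic design: one sample every 10 units, matching the patch period
step = transect_length // n_samples
systematic = list(range(0, transect_length, step))

# Random design: positions drawn uniformly along the transect
rng = random.Random(42)  # fixed seed so the example is repeatable
randomised = [rng.randrange(transect_length) for _ in range(n_samples)]

systematic_fraction = patch_fraction(systematic)
random_fraction = patch_fraction(randomised)

print("true patch cover:     0.50")
print(f"systematic estimate:  {systematic_fraction:.2f}")
print(f"random estimate:      {random_fraction:.2f}")
```

Because the systematic spacing equals the patch period, every systematic sample lands inside a patch and the estimated cover is 1.0 rather than 0.5. The random design has no fixed relationship to the patches, so its estimate scatters around the true value.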

Implementations

- R
  - For univariate series, the autocorrelation function acf() estimates autocorrelation and autocovariance functions. The partial autocorrelation function pacf() and the cross-correlation/covariance function ccf() are also available.
  - A multivariate autocorrelation test, multispati.rtest(), is available as part of the ade4 package. The use of duality diagrams is presupposed.
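For orientation, the usual sample estimator of autocorrelation at lag k is given below. The notation is introduced here for illustration and is not taken from the original text: x_1, ..., x_n is the series, \bar{x} its mean, c_k the sample autocovariance at lag k, and r_k the sample autocorrelation. To the best of our understanding this is the form that acf() reports by default, but the function's documentation should be checked for the exact definition.

```latex
c_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x}),
\qquad
r_k = \frac{c_k}{c_0}
```

Note that the mean and the denominator c_0 are computed from the whole series, so r_k can differ slightly from a plain Pearson correlation between the original and lagged series, particularly at large lags.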

References

- Holgersson HET (2004) Testing for multivariate autocorrelation. J Appl Stat. 31(4): 379-395.
