The surrogate data method (also called the surrogate method) is a hypothesis-testing technique for statistically verifying hidden features in time-series data. Data may contain characteristics that cannot be identified simply by observing their behaviour. For example, if a signal fluctuates irregularly, observation alone cannot tell us whether those fluctuations follow some underlying pattern. The surrogate method lets us examine statistically whether such hidden features actually exist.
First, we state the hypothesis that the feature of interest does not exist. This is called the null hypothesis: a baseline assumption that the feature is absent, used to assess whether the data contradict that assumption. We then generate many data sets that retain the properties expected under the null hypothesis, such as the value distribution or the power spectrum, while destroying other structures. These are the surrogate data. The term “surrogate” means “substitute” or “proxy”; here it refers to data that would be obtained in a world where the null hypothesis is true.
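As a minimal sketch, the simplest kind of surrogate is a random shuffle of the original series. Shuffling keeps the value distribution exactly but destroys any temporal order, so these surrogates correspond to the null hypothesis that the data are independent, identically distributed noise. The function name `shuffle_surrogates` and the toy series are illustrative assumptions.

```python
import random


def shuffle_surrogates(x, n_surrogates, seed=None):
    """Return n_surrogates randomly permuted copies of the series x.

    Every copy contains exactly the same values as x, so the value
    distribution is preserved, while any temporal ordering is destroyed.
    """
    rng = random.Random(seed)
    surrogates = []
    for _ in range(n_surrogates):
        s = list(x)      # copy so the original series is not modified
        rng.shuffle(s)   # in-place random permutation
        surrogates.append(s)
    return surrogates


x = [0.1, 0.5, 0.9, 0.4, 0.2, 0.8, 0.3, 0.7]
surr = shuffle_surrogates(x, n_surrogates=19, seed=42)
# Each surrogate is a reordering of x: same values, different order.
print(all(sorted(s) == sorted(x) for s in surr))  # True
```

Other null hypotheses call for other surrogate generators; only the properties the generator preserves define what the test can detect.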
Next, the same test statistic is computed for the original data and for each of a large number of surrogate data sets, and the results are compared. If the statistic for the original data lies sufficiently far outside the distribution of the statistics obtained from the surrogates, the null hypothesis is rejected (that is, it is judged to be inconsistent with the data), and the feature of interest is considered to be present. Conversely, if no clear difference is observed, the null hypothesis cannot be rejected, and we cannot conclude that the feature exists. This does not mean that the feature is absent, only that it could not be detected statistically. By choosing which statistical properties to retain and which structures to destroy, the surrogate method can examine many different types of feature. In this way, it deepens our understanding of the observed data and helps to elucidate the characteristics of the underlying phenomenon.
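One common way to decide whether the original statistic lies "sufficiently far outside" the surrogate distribution is a rank-based test, sketched below under the assumption that larger or smaller values are equally interesting (two-sided) unless stated otherwise. The function name `surrogate_p_value` and the toy numbers are hypothetical. With N surrogates, the smallest attainable two-sided p-value is 2 / (N + 1), which is why 39 surrogates are often used for a two-sided test at the 5 % level.

```python
def surrogate_p_value(stat_original, stat_surrogates, two_sided=True):
    """Rank-based p-value of stat_original against surrogate statistics.

    Counts how many surrogate statistics are at least as extreme as the
    original one. With N surrogates the smallest attainable p-value is
    1 / (N + 1) for a one-sided test and 2 / (N + 1) for a two-sided test.
    """
    n = len(stat_surrogates)
    n_ge = sum(1 for s in stat_surrogates if s >= stat_original)
    n_le = sum(1 for s in stat_surrogates if s <= stat_original)
    if two_sided:
        # Double the smaller tail probability, capped at 1.
        return min(1.0, 2 * (min(n_ge, n_le) + 1) / (n + 1))
    # One-sided: only large values of the statistic count as evidence.
    return (n_ge + 1) / (n + 1)


# The original statistic exceeds all 39 surrogate values.
surrogate_stats = [float(k) for k in range(39)]
p = surrogate_p_value(100.0, surrogate_stats)
print(p)  # 0.05, so the null hypothesis is rejected at the 5 % level
```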
Basic idea of the surrogate data method and an example of its application
If the data we want to examine show irregular fluctuations, a complex structure such as nonlinearity may lie behind them. We therefore investigate whether the data contain any nonlinearity. As the null hypothesis, we assume that the data were generated by a linear Gaussian process with a given power spectrum (equivalently, a given autocorrelation structure). To test this hypothesis, we generate a large number of artificial data sets that preserve only the power spectrum of the original data while destroying all other structures (nonlinearity, non-Gaussianity, and certain forms of nonstationarity). These are the surrogate data, and they may be regarded as random realisations of a linear Gaussian process satisfying the null hypothesis.
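A standard way to build such surrogates is phase randomisation (Fourier-transform surrogates): take the discrete Fourier transform, keep the amplitude of every component, replace the phases with random ones, and transform back. The sketch below assumes NumPy (`np.fft.rfft`, `np.fft.irfft`, `np.random.default_rng`); the function name `ft_surrogate` and the example series are hypothetical.

```python
import numpy as np


def ft_surrogate(x, rng):
    """Phase-randomised (Fourier-transform) surrogate of a real series x.

    The amplitude of every Fourier component is kept, so the power
    spectrum (and therefore the linear autocorrelation) is preserved,
    while the phases are replaced by independent uniform random values.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(spectrum))
    phases[0] = 0.0          # zero-frequency term (the mean) must stay real
    if n % 2 == 0:
        phases[-1] = 0.0     # Nyquist term must stay real for even n
    # Rotating each component by a random phase leaves |spectrum| unchanged.
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)


rng = np.random.default_rng(0)
# Example series: a noisy sine wave.
x = np.sin(np.linspace(0.0, 20.0 * np.pi, 512)) + 0.3 * rng.standard_normal(512)
surrogates = [ft_surrogate(x, rng) for _ in range(39)]

# The power spectrum of each surrogate matches that of x (up to rounding).
print(all(np.allclose(np.abs(np.fft.rfft(s)), np.abs(np.fft.rfft(x)))
          for s in surrogates))  # True
```

This basic variant does not preserve the value distribution of the original data (the surrogates tend towards Gaussian amplitudes); the amplitude-adjusted (AAFT) and iterative AAFT (IAAFT) variants are commonly used when the original distribution is non-Gaussian.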
If the observed data follow a linear Gaussian process, the value of the statistic computed from the observed data should fall within the distribution of the same statistic computed from the surrogate data. Conversely, if the observed data contain structure that linearity alone cannot explain, the statistic for the observed data should differ markedly from the surrogate distribution.
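Putting these steps together, the following sketch tests the linear Gaussian null hypothesis with a time-reversal asymmetry statistic. This is one commonly used choice, because a stationary linear Gaussian process is statistically time-reversible, so the statistic should be close to zero for such data. The exact normalisation of the statistic, the lag of 1, the 39 surrogates, the example series and all function names are assumptions chosen for illustration. Because |Q| is compared, the smallest attainable p-value here is 1 / 40 = 0.025.

```python
import numpy as np


def ft_surrogate(x, rng):
    """Phase-randomised surrogate: same power spectrum, random phases."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(spectrum))
    phases[0] = 0.0          # keep the mean (zero-frequency term) real
    if n % 2 == 0:
        phases[-1] = 0.0     # keep the Nyquist term real for even n
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)


def time_reversal_asymmetry(x, lag=1):
    """Normalised third moment of the lagged increments x[t+lag] - x[t].

    Close to zero for a time-reversible series such as a linear
    Gaussian process; clearly nonzero values indicate time asymmetry.
    """
    x = np.asarray(x, dtype=float)
    d = x[lag:] - x[:-lag]
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


def surrogate_test(x, n_surrogates=39, lag=1, seed=0):
    """Rank test of the linear Gaussian null hypothesis using |Q|."""
    rng = np.random.default_rng(seed)
    q0 = time_reversal_asymmetry(x, lag)
    q_surr = np.array([time_reversal_asymmetry(ft_surrogate(x, rng), lag)
                       for _ in range(n_surrogates)])
    # Count surrogates at least as asymmetric (in either direction) as x.
    n_extreme = np.sum(np.abs(q_surr) >= abs(q0))
    p_value = (n_extreme + 1) / (n_surrogates + 1)
    return q0, q_surr, p_value


rng = np.random.default_rng(1)
# Linear Gaussian AR(1) process: consistent with the null hypothesis.
e = rng.standard_normal(1000)
ar = np.zeros(1000)
for t in range(1, 1000):
    ar[t] = 0.8 * ar[t - 1] + e[t]
# Noisy sawtooth (slow rise, sudden drop): strongly time-asymmetric.
saw = np.tile(np.arange(10.0), 100) + 0.5 * rng.standard_normal(1000)

for name, series in [("AR(1)", ar), ("sawtooth", saw)]:
    q0, _, p = surrogate_test(series)
    print(f"{name}: Q = {q0:.3f}, p = {p:.3f}")
```

For the AR(1) series, which satisfies the null hypothesis, a small p-value should occur only occasionally by chance. For the sawtooth, one would expect the original |Q| to exceed that of every surrogate, since phase randomisation keeps the spectrum but removes the time asymmetry. A different statistic (for example, a nonlinear prediction error) would make the test sensitive to different kinds of structure.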