The surrogate data method (also called the surrogate method) is a statistical way to investigate the characteristics of data. Some characteristics cannot be identified just by looking at how the data behaves. For example, when a signal shows irregular fluctuations, we cannot tell by eye whether the fluctuations are random or not. The surrogate data method lets us test such underlying characteristics statistically. To apply the method, we state the characteristic we want to investigate as a hypothesis and generate many data sets that conform to the hypothesis. These data sets are called surrogate data. We then compare the target data with the surrogate data sets. If there is no significant difference between them, the hypothesis cannot be rejected, and we consider the target data to be consistent with the specified characteristic. If the difference is significant, we reject the hypothesis and conclude that the data does not have the characteristic. The hypothesis can be chosen freely, provided that surrogate data conforming to it can be generated. In this way we can investigate the characteristics of the phenomenon behind the data.
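The general procedure can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation, and everything in it is an assumption made for the example: the hypothesis is that the data is random in the sense of being independent and identically distributed, the surrogates are random shuffles of the data, the statistic is the lag-1 autocorrelation, and the function names (`surrogate_test`, `shuffle_surrogate`, `lag1_autocorr`) are hypothetical.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation, used here as the statistic to compare."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return np.dot(d[:-1], d[1:]) / np.dot(d, d)

def shuffle_surrogate(x, rng):
    """Random permutation: keeps the values, destroys any temporal order."""
    return rng.permutation(x)

def surrogate_test(x, make_surrogate, statistic, n_surrogates=99, seed=0):
    """Compare the statistic of x with that of surrogates generated
    under the hypothesis, and return a two-sided Monte Carlo p-value."""
    rng = np.random.default_rng(seed)
    t0 = statistic(x)
    ts = np.array([statistic(make_surrogate(x, rng))
                   for _ in range(n_surrogates)])
    centre = ts.mean()
    # Count surrogates at least as far from the centre as the original.
    n_extreme = np.sum(np.abs(ts - centre) >= np.abs(t0 - centre))
    p_value = (n_extreme + 1) / (n_surrogates + 1)
    return t0, ts, p_value

# Illustrative target data (an assumption): a strongly correlated AR(1) series.
rng = np.random.default_rng(1)
noise = rng.standard_normal(500)
ar1 = np.zeros(500)
for i in range(1, 500):
    ar1[i] = 0.9 * ar1[i - 1] + noise[i]

t0, ts, p_value = surrogate_test(ar1, shuffle_surrogate, lag1_autocorr)
print(f"statistic of target data: {t0:.3f}, p-value: {p_value:.2f}")
```

A small p-value means the target data differs significantly from the surrogates, so the hypothesis is rejected. Testing a different hypothesis only requires swapping `make_surrogate` and, where appropriate, `statistic`.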
Basic idea of the surrogate data method and an example application
Suppose the time series we want to examine shows irregular fluctuations. One possible cause of irregular fluctuations is nonlinearity, so we investigate whether the time series contains nonlinearity. The hypothesis we set is that the data is linear. There is a method that eliminates (destroys) nonlinearity in data. Using this method, we generate many data sets from the original data, none of which contains nonlinearity. These data sets are the surrogate data. We then calculate the same statistic, one that is sensitive to nonlinearity, for the original data and for each surrogate data set. If there is no nonlinearity in the original data, the surrogate data sets have the same characteristic as the original data, so the statistic of the original data should not differ much from those of the surrogate data. If, on the other hand, there is nonlinearity in the original data, the surrogate data sets have a different characteristic from the original data, so the statistic of the original data should differ greatly from those of the surrogate data.
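As a concrete illustration, the sketch below uses phase randomization of the Fourier transform, one commonly used way to destroy nonlinearity while keeping the power spectrum (the linear correlations) of the data. The text above does not name a specific method, so this choice is an assumption, as are the statistic (a time-reversal asymmetry measure), the test series (a logistic map), and all function names. Plain phase randomization corresponds to the hypothesis of a linear Gaussian process and does not preserve the amplitude distribution of the data.

```python
import numpy as np

def phase_randomized_surrogate(x, rng):
    """Surrogate with the same power spectrum as x but random Fourier phases."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    randomized[0] = spectrum[0]          # keep the mean unchanged
    if x.size % 2 == 0:
        randomized[-1] = spectrum[-1]    # Nyquist term must stay real
    return np.fft.irfft(randomized, n=x.size)

def time_reversal_asymmetry(x):
    """Normalized third moment of the increments; close to zero
    for linear Gaussian data, which is time reversible."""
    d = np.diff(np.asarray(x, dtype=float))
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def linearity_test(x, n_surrogates=99, seed=0):
    """Two-sided Monte Carlo test of the hypothesis that x is linear."""
    rng = np.random.default_rng(seed)
    t0 = time_reversal_asymmetry(x)
    ts = np.array([time_reversal_asymmetry(phase_randomized_surrogate(x, rng))
                   for _ in range(n_surrogates)])
    centre = ts.mean()
    n_extreme = np.sum(np.abs(ts - centre) >= np.abs(t0 - centre))
    return t0, ts, (n_extreme + 1) / (n_surrogates + 1)

def logistic_map(n, x0=0.3, r=4.0):
    """Illustrative nonlinear series: x[i] = r * x[i-1] * (1 - x[i-1])."""
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1.0 - x[i - 1])
    return x

series = logistic_map(1024)
t0, ts, p_value = linearity_test(series)
print(f"original: {t0:.3f}, "
      f"surrogates: [{ts.min():.3f}, {ts.max():.3f}], p-value: {p_value:.2f}")
```

If the statistic of the original series lies well outside the range covered by the surrogates, the hypothesis of linearity is rejected. For a series with no nonlinearity, the statistic would be expected to fall within that range.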