As an opening to his lecture, Edward Lorenz, Emeritus Professor
of Meteorology at MIT, holds a piece of paper above the stage and
gently lets it go, watching it float leisurely down to the ground.
Lorenz repeats the experiment, and the paper falls again, landing
in a different place on the ground. This serves as an illustration
of the theory of chaos.
Chaotic (highly sensitive to initial conditions) behaviour of many
systems was observed by many researchers for a number of decades,
but was first described as such by Lorenz (1963). In 1961 he discovered
the manifestation of chaotic behaviour when he was working with
computer models of weather prediction. It appeared that the model
he was using was extremely sensitive to a small change in one of
the parameters: a change from 0.506127 to 0.506 led to a gradual
deviation of the output sequence from the original one to a very
different one. This sensitive dependence on initial conditions is
central to chaos theory. Such a small difference in a measurement
might otherwise be considered experimental noise, background noise, or an
inaccuracy of the equipment.
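The effect of such a rounding can be reproduced with a much simpler
system than a weather model. The sketch below is only an illustration:
it uses the logistic map (an assumption made here, not the model Lorenz
worked with) and starts two runs from 0.506127 and 0.506. The function
name logistic_orbit and the choice r = 4 are hypothetical.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x), a simple chaotic system.

    The logistic map is a stand-in chosen for brevity; it is not the
    weather model described in the text.
    """
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


# Two runs that differ only by rounding the starting value.
full = logistic_orbit(0.506127, 50)
rounded = logistic_orbit(0.506, 50)

# Absolute difference between the two runs at every step.
gaps = [abs(a - b) for a, b in zip(full, rounded)]

for step in (0, 10, 20, 30, 40, 50):
    print(step, gaps[step])
```

The initial gap is about 0.0001, yet within a few dozen iterations the
two sequences bear no resemblance to each other.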
Basics of chaos
One of the important foundations behind the methods of non-linear
signal processing and chaos theory is the embedding theorem (Takens,
1981). It shows that a single measured variable x(n) = x(t0 + nτ),
with t0 some starting time and τ the sampling time, together with
its time delays, provides an N-dimensional space that is a
proxy for the full multivariate state space of the observed system.
The N-dimensional state vectors x(t) are then defined as

   x(t) = [x(t), x(t - τ), x(t - 2τ), ..., x(t - (N-1)τ)]

where, on the right-hand side, x(t) is the value of the time series
at time t, τ is a suitable time delay (sampling
time) and N is the embedding dimension. This vector fully
represents the non-linear dynamics when N is large enough. The
embedding theorem guarantees that full knowledge of the behaviour
of a system is contained in the time series of any one measurement
and that a proxy for the full multivariate phase space can be constructed
from the single time series. To perform the state space reconstruction,
a time delay τ and an embedding
dimension N are needed.
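As a minimal sketch of this reconstruction, the function below builds
the delay vectors from a one-dimensional series with NumPy. The name
delay_embed is hypothetical, and the ordering of the components
(current value first, then progressively older values) is an
assumption; other texts use forward delays instead.

```python
import numpy as np


def delay_embed(x, N, tau):
    """Return an array whose rows are [x(t), x(t - tau), ..., x(t - (N-1)*tau)].

    x   : one-dimensional time series
    N   : embedding dimension
    tau : time delay, in samples
    """
    x = np.asarray(x, dtype=float)
    start = (N - 1) * tau  # first index that has a full history
    if start >= len(x):
        raise ValueError("time series too short for this N and tau")
    # Column k holds the series delayed by k*tau samples.
    columns = [x[start - k * tau: len(x) - k * tau] for k in range(N)]
    return np.column_stack(columns)


series = np.arange(10.0)
vectors = delay_embed(series, N=3, tau=2)
print(vectors)
```

For the ten-sample ramp used here, the result has six rows, running
from [4, 2, 0] to [9, 7, 5].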
There exist several methods for estimating the time
delay and the embedding dimension, which are summarized as follows:
Analytical methods for estimating the time delay τ:
- autocorrelation and power spectrum functions;
- average mutual information (AMI) function;
- degree of separation function;
- Lyapunov exponents.
Analytical methods for estimating the embedding dimension
N:
- false nearest neighbours;
- bad prediction method;
- fractal and correlation dimensions.
Empirical methods (for estimating both the time
delay and the dimension):
- neural networks;
- derivative-free global optimization methods, such as genetic algorithms.
For finding τ we use the AMI
function, and for finding N the method of false nearest neighbours.
The time delay τ must be large
enough that each component of the vector carries independent
information about the system. However, τ
must not be so large that the components of the vectors x(t)
are completely independent of each other. Conversely, if the time
delay is too short, the vector components will not be independent enough
and will not contain any new information. A possible rule for a good
time delay τ is to use the first
minimum of the average mutual information (Fraser and Swinney, 1986).
Average mutual information is derived from notions of entropy in
communication systems (Shannon, 1949). It determines how much information
the measurements x(t) at some time carry about the measurements
x(t+τ) at some other time.
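The first-minimum rule can be sketched as follows. This is a simple
histogram estimate of the mutual information, assuming equal-width bins
(16 by default); the function names, the bin count and the maximum lag
are hypothetical choices, and more careful estimators exist.

```python
import numpy as np


def average_mutual_information(x, lag, bins=16):
    """Histogram estimate (in bits) of the information shared by x(t) and x(t + lag)."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    x = np.asarray(x, dtype=float)
    joint, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
    p_xy = joint / joint.sum()             # joint probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of x(t)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of x(t + lag)
    mask = p_xy > 0                        # skip empty cells to avoid log(0)
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))


def first_local_minimum(values):
    """Index of the first value below its predecessor and not above its successor."""
    for i in range(1, len(values) - 1):
        if values[i] < values[i - 1] and values[i] <= values[i + 1]:
            return i
    return None


def suggest_delay(x, max_lag=50, bins=16):
    """Lag (in samples) of the first AMI minimum, or None if none is found."""
    ami = [average_mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    i = first_local_minimum(ami)
    return None if i is None else i + 1    # lags start at 1, list indices at 0
```

Calling suggest_delay(series) on a measured series returns a candidate
delay in samples. The estimate depends on the bin count, so it is worth
checking that the suggested lag is stable for a few different values of
bins.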
The global embedding dimension N is the minimum number of
time-delay coordinates needed so that the trajectories x(t)
do not intersect in N dimensions. In dimensions less than N, trajectories
can intersect because they are projected down into too few dimensions,
and subsequent calculations, such as predictions, may then be corrupted.
If the dimension is too large, noise and other contamination may corrupt
the calculations, because noise fills any dimension.
The method of finding a proper N can be described using geometrical
considerations: as N increases, attractors "unfold" and some vectors
that are close in dimension N move a significant distance apart
in dimension N+1. Such vectors are "false" neighbours in dimension N.
The method of false nearest neighbours (Kennel et al., 1992) measures
the percentage of false neighbours as N increases. Points that are
close in dimension N are marked, and the number of these points that
become widely separated in dimension N+1 is calculated.
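A minimal sketch of this counting procedure is given below. It applies
only a distance-ratio test: a neighbour is called false when the
coordinate added in dimension N+1 separates the pair by more than
ratio_tol times their distance in dimension N. The threshold of 10, the
brute-force neighbour search and the function names are assumptions made
for illustration; the published method includes a further criterion that
is omitted here.

```python
import numpy as np


def delay_embed(x, N, tau):
    """Rows are [x(t), x(t - tau), ..., x(t - (N-1)*tau)] (assumed ordering)."""
    x = np.asarray(x, dtype=float)
    start = (N - 1) * tau
    return np.column_stack([x[start - k * tau: len(x) - k * tau] for k in range(N)])


def false_neighbour_fraction(x, N, tau, ratio_tol=10.0):
    """Fraction of nearest neighbours in dimension N that separate in dimension N + 1."""
    higher = delay_embed(x, N + 1, tau)  # the points in N + 1 dimensions
    lower = higher[:, :N]                # the same points seen in N dimensions
    extra = higher[:, N]                 # the coordinate added in dimension N + 1
    # Brute-force pairwise distances in N dimensions (adequate for a few
    # thousand points; a tree-based search would be needed for more).
    dist = np.linalg.norm(lower[:, None, :] - lower[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)       # a point is not its own neighbour
    nearest = dist.argmin(axis=1)
    d_low = dist[np.arange(len(lower)), nearest]
    growth = np.abs(extra - extra[nearest])
    valid = d_low > 0                    # ignore exact duplicates
    return float(np.mean(growth[valid] > ratio_tol * d_low[valid]))


wave = np.sin(0.1 * np.arange(600))
for dim in (1, 2, 3):
    print(dim, false_neighbour_fraction(wave, dim, tau=15))
```

For this sine wave the fraction is substantial in one dimension, where
rising and falling parts of the wave overlap, and drops to nearly zero
in two dimensions, where the trajectory unfolds into a closed curve.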

Having identified the parameters τ and N
and thus reconstructed the phase space, one can build the prediction
model in the form of a multidimensional map:

   x(t+T) = fT(x(t))

where the phase-space vector x(t)
is the current state of the system, x(t+T)
is the state of the system after a time interval T, and fT
is a mapping function. The problem is then to find a good expression
(local models) for the function fT.
The data are embedded and then divided into training
and testing sets. Based on the training set, the embedded data space
is quantised (using the K-NN algorithm at this stage). Local data sets
are then constructed for each of the prototype vectors.
Finally, local data models (linear at this stage)
are constructed from the local data sets and are then used
to predict the dynamics of the system (to move the system from state
x(t) into state x(t+T)).
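The idea of a local linear model can be sketched as follows. This is a
simplified variant, not the procedure described above: instead of
quantising the space into prototype vectors, it takes the k nearest
training states around the current state and fits one linear model to
them by least squares, and it predicts only the scalar value of the
series T steps ahead rather than the whole state vector. The function
name, the parameter k = 20 and the delay-vector ordering are
assumptions.

```python
import numpy as np


def local_linear_predict(x, N, tau, horizon, k=20):
    """Predict the value `horizon` steps after the last sample of x.

    A linear model is fitted only to the k delay vectors closest to the
    most recent one (a simplified, query-centred local model).
    """
    x = np.asarray(x, dtype=float)
    start = (N - 1) * tau
    times = np.arange(start, len(x))
    # Rows are delay vectors [x(t), x(t - tau), ..., x(t - (N-1)*tau)].
    states = np.column_stack([x[times - j * tau] for j in range(N)])
    # Training pairs: states whose value `horizon` steps ahead is known.
    n_train = len(times) - horizon
    train_states = states[:n_train]
    targets = x[times[:n_train] + horizon]
    query = states[-1]  # current state of the system
    nearest = np.argsort(np.linalg.norm(train_states - query, axis=1))[:k]
    # Least-squares fit of target = a . state + b on the neighbourhood only.
    design = np.column_stack([train_states[nearest], np.ones(len(nearest))])
    coef, *_ = np.linalg.lstsq(design, targets[nearest], rcond=None)
    return float(np.append(query, 1.0) @ coef)


wave = np.sin(0.1 * np.arange(1000))
prediction = local_linear_predict(wave, N=2, tau=15, horizon=6)
print(prediction, np.sin(0.1 * 1005))
```

On a noise-free sine wave the local fit reproduces the true future
value almost exactly; on measured data the accuracy depends on k, on
the noise level and on how densely the training set covers the
neighbourhood of the current state.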
Applications of chaos
Chaos theory is currently very widely used in various areas of
engineering and even in the social sciences. As an example, prediction
of surge water levels in a coastal zone is shown:

Local linear model for surge time-series prediction
(data for the hydrological year 1994/95 at a 10-minute interval
are used; 50000 samples are used for training and 2000 for testing).
Embedding dimension N = 4, time delay τ = 9
time steps. Prediction horizon T = 6 time steps (1 hour). RMSE = 3.6 cm.
