Traditionally, modelling in civil engineering is based on a good
understanding of the underlying processes and uses so-called
"physically-based" (or "knowledge-driven", behavioral) models.
Physical principles describing the water dynamics are written down
as equations (for example, the Navier-Stokes equations), which are
then transformed into a solvable form and solved on a computer.
These could be, for example, models based on the Navier-Stokes
equations that describe the behavior of water in particular
circumstances. Examples are 1D surface (river) water models, 2D
coastal models, groundwater models, etc. The equations are solved
using finite-difference, finite-element or other numerical schemes,
and the results (normally water levels and discharges) are
presented to decision makers. Knowledge-driven models can also be
"social", "economic", etc. Observed data is used during model
calibration. Such models are referred to as physically-based,
simulation, or process models.
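To make the idea of "solving the equations with a finite-difference
scheme" concrete, here is a minimal sketch. It does not solve the
Navier-Stokes equations; as a simplifying assumption it uses a 1D
diffusion-type equation dh/dt = D * d2h/dx2 for a water level h,
with fixed levels at both ends. The function name, grid and
parameter values are hypothetical and chosen only for illustration.

```python
def simulate_levels(h0, diffusivity, dx, dt, n_steps):
    """Explicit finite-difference solution of dh/dt = D * d2h/dx2.

    h0 holds the initial water levels on a uniform grid. The first and
    last values are kept fixed (prescribed boundary levels).
    """
    r = diffusivity * dt / dx ** 2
    if r > 0.5:
        # Standard stability limit of the explicit scheme.
        raise ValueError("unstable: D*dt/dx^2 must not exceed 0.5")
    h = list(h0)
    for _ in range(n_steps):
        new_h = h[:]
        for i in range(1, len(h) - 1):
            # Central difference in space, forward difference in time.
            new_h[i] = h[i] + r * (h[i + 1] - 2.0 * h[i] + h[i - 1])
        h = new_h
    return h


# Hypothetical set-up: 11 grid points, level 2.0 m held at the left
# boundary and 1.0 m at the right boundary.
initial = [2.0] + [1.0] * 10
levels = simulate_levels(initial, diffusivity=1.0, dx=1.0, dt=0.4,
                         n_steps=2000)
```

After enough time steps the computed levels approach a straight line
between the two boundary values, which is the steady state of this
simplified equation. In a real simulation model the coefficient D
would be one of the parameters adjusted during calibration against
observed data.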
In contrast, a "data-driven" model of a system is defined as a
model connecting the system state variables (input, internal and
output variables) with only limited knowledge of the details of
the "physical" behavior of the system. Probably the simplest
data-driven model is a linear regression model. General knowledge
of the physics is of course still needed (for the proper choice of
relevant parameters), but it need not be as detailed as for
physically-based models. "Hybrid models" combine both types of
models.
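The linear regression model mentioned above can be sketched in a few
lines. The example fits a straight line by ordinary least squares;
the function name and the rainfall/discharge numbers are made up for
illustration and do not come from any real catchment.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: rainfall (mm) and discharge (m3/s).
rainfall = [0.0, 5.0, 10.0, 15.0, 20.0]
discharge = [1.0, 3.4, 6.1, 8.4, 11.1]

slope, intercept = fit_line(rainfall, discharge)

# Use the fitted model to predict discharge for an unseen rainfall value.
predicted = slope * 12.0 + intercept
```

Note that the model uses no hydraulic equations at all. The only
"physical" knowledge involved is the decision that rainfall is a
relevant input for predicting discharge.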
Generally speaking, physically-based models are more accurate and
more general. The problem is that it is sometimes not possible to
build a trustworthy physically-based model. In such cases, if
observation data is available, data-driven models may help.
Data-driven modelling (DDM) complements simulation modelling and in
some cases could replace it.
Machine learning
Machine learning (ML) is traditionally considered a part of
Artificial Intelligence. It aims at building programs that improve
with experience (that is, learn). The two most important problems
that ML solves are:
- classification – when an example (a point in the input space)
has to be assigned to one of several classes;
- regression (numerical prediction).
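The difference between the two problems can be illustrated with a
small sketch. It uses the k-nearest-neighbours method, chosen here
only because it is short; it is not a method named in the text. The
same stored examples serve both tasks: classification returns a class
label, regression returns a number. All names and data values are
hypothetical.

```python
from collections import Counter


def nearest(train_x, query, k):
    """Indices of the k training points closest to query (Euclidean)."""
    def sq_dist(point):
        return sum((a - b) ** 2 for a, b in zip(point, query))

    order = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i]))
    return order[:k]


def knn_classify(train_x, labels, query, k=3):
    """Classification: majority class among the k nearest examples."""
    votes = Counter(labels[i] for i in nearest(train_x, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_x, values, query, k=3):
    """Regression: mean target value of the k nearest examples."""
    idx = nearest(train_x, query, k)
    return sum(values[i] for i in idx) / len(idx)


# Made-up examples: (rainfall in mm, upstream water level in m).
points = [(2.0, 0.5), (3.0, 0.7), (4.0, 0.6),
          (20.0, 2.1), (25.0, 2.4), (22.0, 2.0)]
flood_class = ["no flood", "no flood", "no flood",
               "flood", "flood", "flood"]
peak_level = [0.8, 0.9, 1.0, 3.1, 3.6, 3.2]

new_point = (21.0, 2.2)
label = knn_classify(points, flood_class, new_point)   # a class
level = knn_regress(points, peak_level, new_point)     # a number
```

In both cases the program "learns" only in the sense that adding more
observed examples changes its future answers.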
Important results were achieved in this area in the 1970s and
1980s. The researchers from Prof. Aizerman's group at the Institute
of Control Problems of the Russian Academy of Sciences should be
mentioned here. One of the outstanding results of that group was
the development by Vladimir Vapnik of statistical learning theory,
currently used in techniques based on so-called support vector
machines (not to be confused with vector computers!). This area has
lately received a lot of attention (see Vapnik's book The Nature of
Statistical Learning Theory, 1995).