Background
Rainfall-runoff models are increasingly used in hydrology for a wide
range of applications, for example, to extend streamflow records, in the design
and operation of hydraulic structures, for real time flood forecasting, to
estimate flows of ungauged catchments, and to predict the effect of land-use
and climate change. Such models play an important role
in water resource planning and management of river basins. These models attempt
to simulate complex hydrological processes that lead to the transformation of
rainfall into runoff, with varying degrees of abstraction.
A plethora of rainfall-runoff models, varying in nature, complexity,
and purpose, has been developed and used by researchers and practitioners in
the last century. These rainfall-runoff models encompass a broad spectrum of
more or less plausible descriptions of rainfall-runoff relations and processes
ranging from the primitive empirical black box model such as the Sherman unit
hydrograph method (see, e.g. Sherman,
1932) to the lumped
conceptual models such as the Stanford (Crawford and
Linsley, 1966), Sacramento (Burnash et al.,
1973), HBV (Bergström and
Forsman, 1973) models, and the
physically based distributed models such as the Mike-SHE model (Abbott et al.,
1986a, b). Rapid growth in
computational power, the increased availability of distributed hydrological
observations and an improved understanding of the physics and dynamics of water
systems permit more complex and sophisticated models to be built. While these advances in principle lead to more accurate (less uncertain)
models, at the same time, if such complex models with many parameters and data
inputs are not parameterized properly or lack input data of reasonable quality,
they could be an inaccurate representation of reality.
Since by definition, a rainfall-runoff model is only an abstraction of a
complex, non-linear, time and space varying hydrological process of reality,
there are many simplifications and idealisations. These models contain parameters that often cannot be measured
directly, but can only be estimated by calibration against a historical record of measured
output data. The system input (forcing) data such as rainfall, temperature,
etc. and output are often contaminated by measurement errors. This inevitably leads to uncertain parameter estimates. Consequently, predictions
made by such rainfall-runoff models are far from perfect; in other words,
there always exists a discrepancy between the model prediction and the
corresponding observed data, no matter how precise the model is and how perfectly
the model is calibrated. Thus the model errors, i.e. the mismatch between
the observed and the simulated system behaviour, are unavoidable in rainfall-runoff modelling due to the inherent
uncertainties in the process. Various sources of uncertainty in rainfall-runoff
modeling are presented in sections 2.4 and 2.5.
Uncertainty analysis in rainfall-runoff modelling
In many fields uncertainty is well recognized and accounted for properly.
For example in meteorological sciences, the deterministic weather forecasts or
predictions are typically given together with the associated uncertainty.
Uncertainty has also been treated in the assessment of the Intergovernmental
Panel on Climate Change, IPCC (Swart et al., 2009). In engineering design, such as that of coastal and
river flood defences, uncertainty is treated implicitly through conservative
design rules, or explicitly by a probabilistic characterization of
meteorological events leading to extreme floods. Historically the problem of accurately determining river
flows from rainfall, evaporation and other factors was a major focus in
hydrology. During the last two decades, there has been a great deal of research
into the development and application of (auto) calibration methods (see, e.g., Duan et al., 1992; Solomatine et
al., 1999) to improve the deterministic
model predictions. Almost all existing river flow simulation techniques are conceived to
provide a single estimate, since most research in operational hydrology has
been dedicated to finding the best estimate rather than quantifying the
uncertainty of model predictions (Singh and
Woolhiser, 2002).
It is now being broadly recognized that proper consideration of
uncertainty in hydrologic predictions is essential for purposes of both
research and operational modeling (Wagener and Gupta, 2005). Along with the recognition of the uncertainty of
physical processes, the uncertainty analysis of rainfall-runoff models has
become a popular research topic over the past two decades. The value of a
hydrologic prediction to water resources and other relevant decision-making
processes is limited if reasonable estimates of the corresponding predictive
uncertainty are not provided (Georgakakos et al., 2004). Explicit recognition of
uncertainty is not enough; in order to have this notion adopted by decision
makers in water resources management, uncertainty should be properly estimated
and communicated (Pappenberger
and Beven, 2006). The
research community has made considerable progress towards
recognising the necessity of complementing point forecasts of decision
variables with uncertainty estimates. Hence, there is a widening recognition
of the necessity to (i) understand and identify the sources of uncertainty;
(ii) quantify uncertainty; (iii) evaluate the propagation of uncertainty
through the models; and (iv) find means to reduce uncertainty. Incorporating
uncertainty into deterministic predictions or forecasts helps to enhance the
reliability and credibility of the model outputs.
This dissertation is devoted to developing new methods to analyse model uncertainty,
which are based on the methods of machine learning. This study is, in general,
in the field of Hydroinformatics, the area that aims in particular at introducing
methods of machine learning and computational intelligence into the practice of
modelling and forecasting (Abbott, 1991). This study is at the interface between
different scientific disciplines: hydrological modelling, statistical and machine learning, and
uncertainty analysis.
One may observe a significant proliferation
of uncertainty analysis methods published in the academic literature, trying to provide
meaningful uncertainty bounds of the model predictions. Pappenberger et al. (2006)
provide a decision tree to find the appropriate method for a given situation. However, methods to
estimate and propagate this uncertainty have so far been limited in their
ability to distinguish between different sources of uncertainty and in the use
of the retrieved information to improve the model structure analysed. In general, these methods can be broadly classified into six categories (see, e.g., Montanari, 2007; Shrestha and Solomatine, 2008):
1. Analytical methods (see, e.g., Tung, 1996);
2. Approximation methods, e.g., first-order second moment method (Melching, 1992);
3. Simulation and sampling-based (Monte Carlo) methods (see, e.g., Kuczera and Parent, 1998);
4. Methods from group (3) which are also generally attributed to Bayesian methods, e.g., “generalised likelihood uncertainty estimation” (GLUE) by Beven and Binley (1992);
5. Methods based on the analysis of model errors (see, e.g., Montanari and Brath, 2004); and
6. Methods based on fuzzy set theory (see, e.g., Maskey et al., 2004).
Detailed descriptions of these methods are given in section 2.8.
Most of the existing methods (e.g.,
categories (3) and (4)) analyse the uncertainty of the uncertain input variables
by propagating it through the deterministic model to the outputs, and hence
require the assumption of their distributions and error structures. Most of the
approaches based on the analysis of the model errors require certain assumptions
regarding the residuals (e.g., normality and homoscedasticity). Obviously, the relevance
and accuracy of such approaches depend on the validity of these assumptions.
The fuzzy theory-based approach requires knowledge of the membership function
of the quantity subject to the uncertainty which could be very subjective.
Furthermore, in the majority of the methods, uncertainty of the model output is
mainly attributed to uncertainty in the model parameters. For instance, Monte
Carlo (MC) based methods analyse the propagation of uncertainty of parameters
(measured by the probability density function, pdf) to the pdf of the output.
Similar types of analysis are performed for the input or structural uncertainty
independently. Methods based on the analysis of the model errors typically
compute the uncertainty of the “optimal
model” (i.e. the model with the calibrated parameters and the fixed
structure), and not of the “class
of models” (i.e. a group of models with the same structure but parameterised
differently) as, for example, MC methods do.
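The Monte Carlo propagation of parameter uncertainty described above can be illustrated with a minimal sketch. The linear-reservoir model, the uniform range assumed for its single parameter k, and the rainfall series are all hypothetical choices made for illustration; they are not the models or data used in this thesis.

```python
import random


def linear_reservoir(rain, k, s0=0.0):
    """Toy model (assumption): storage S drains at rate k*S each time step."""
    storage, flow = s0, []
    for r in rain:
        storage += r
        q = k * storage
        storage -= q
        flow.append(q)
    return flow


def quantile(values, p):
    """Empirical quantile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def mc_prediction_bounds(rain, n_runs=1000, k_range=(0.1, 0.5), seed=1):
    """Propagate uncertainty in k to the 5%, 50% and 95% quantiles of the flow."""
    rng = random.Random(seed)
    # One model run per sampled parameter value (the "class of models").
    runs = [linear_reservoir(rain, rng.uniform(*k_range)) for _ in range(n_runs)]
    bounds = []
    for t in range(len(rain)):
        flows_t = [run[t] for run in runs]
        bounds.append(tuple(quantile(flows_t, p) for p in (0.05, 0.5, 0.95)))
    return bounds


rain = [0.0, 5.0, 12.0, 3.0, 0.0, 0.0, 8.0, 1.0]  # hypothetical rainfall (mm)
bounds = mc_prediction_bounds(rain)
```

Each entry of `bounds` holds the lower, median and upper estimates for one time step. Note that these bounds reflect parameter uncertainty only, which is the limitation discussed in the paragraph above.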
The contribution of various sources of
errors to the total model error is typically not known and, as pointed out by Gupta
et al. (2005), disaggregation of errors into their source components is often
difficult, particularly in hydrology where models are non-linear and different
sources of errors may interact to produce the measured deviation. Nevertheless,
evaluating the contribution of different sources of uncertainty to the overall
uncertainties in model prediction is important, for instance, for understanding
where the greatest sources of uncertainties reside, and, therefore directing
efforts towards these sources (Brown and Heuvelink, 2005). In general,
relatively few studies have been conducted to investigate the interaction
between different sources of uncertainty and their contributions to the total model
uncertainty (Engeland
et al., 2005; Gupta et al., 2005). For
the risk based decision-making process such as flood warnings, it is more
important to know the total model uncertainty accounting for all sources of
uncertainty than the uncertainty resulting from individual sources.
However, the practice of uncertainty
analysis and use of the results of such analysis in decision making is not
widespread, for several reasons (Pappenberger
and Beven, 2006).
Uncertainty analysis takes time, so adds to the cost of risk analysis, options
appraisal and design studies. It is not always clear how uncertainty analysis
will contribute to improved decision making. Much of the academic literature on
hydrological uncertainties (Liu
and Gupta, 2007) has
tended to focus upon forecasting problems. Identifying uncertainty bounds on a
flood forecast is important, but to be meaningful needs to be set within the
context of a well defined decision problem (Frieser
et al., 2005; Todini, 2008). The
uncertainty analysis requires careful interpretation in order to understand the
meaning and significance of the results. It is through this process of scrutiny
and discussion that the most useful insights for decision makers are obtained.
Furthermore, the conduct of uncertainty analysis provides new insights into
model behaviour that will need to be discussed and agreed with the experts
responsible for models that are input into the analysis. Indeed the process of
scrutiny that uncertainty analysis provides is an additional benefit (Hall
and Solomatine, 2008).
Experience is now growing in the
communication of uncertainty to decision makers and members of the public, for
example in the context of environmental risks (Sluijs
et al., 2003) and
climate change (IPCC,
2005). Sluijs et al. (2003) stress the importance of engaging stakeholders from an early
stage, identifying the target audiences and then using appropriate language to communicate
uncertainties. Alongside numerical results and their implications for decision
makers, the limitations of data sources and analysis methods should be made
clear and areas of ignorance should be highlighted.
Machine learning in uncertainty analysis
Over the last 15
years many machine learning techniques have been used extensively in the field
of rainfall-runoff modelling (see section 3.4 for more detail). These techniques
have also been used to improve the accuracy of predictions/forecasts made by
process based rainfall-runoff models. Generally, they are used to update the
output variables by forecasting the error of the process based models. All
these techniques to update the model predictions can be seen as an error modelling
paradigm to reduce the uncertainty of the predictions. However, these
techniques do not explicitly provide the uncertainty of the model prediction in
the form of prediction bounds or a probability distribution function of the model
output.
In this thesis
we explore the possibility of using machine learning techniques to
provide reasonable uncertainty estimates for the runoff predictions made by
rainfall-runoff models.
As mentioned above in section 1.1, with advances in
computational power and technological development, more complex and
sophisticated distributed rainfall-runoff models have been built and used in
practice. Computational burden in distributed runoff models is now less
problematic than before, although it still can be an issue when predictive
uncertainty of a model is assessed through laborious MC simulations (Beven, 2001). Several uncertainty analysis methods based on MC simulations have been
developed to propagate the uncertainty through the models. The MC based method for uncertainty analysis of the outputs of such
models is straightforward, but becomes impractical in real time applications for
computationally intensive complex models, when there is insufficient time to
perform the uncertainty analysis because a large number of model runs is
required. Practical implementation of MC based uncertainty analysis methods
faces two major problems: (i) convergence of the MC simulations is very slow (the
error decreases only at the order of $O(1/\sqrt{s})$), so a large number of runs is needed to establish a reliable
estimate of uncertainties; and (ii) the number of simulations increases
exponentially with the dimension of the parameter vector ($O(n^p)$) to cover the entire
parameter domain, where $s$ is the
number of simulations, $p$ is the
dimension of the parameter vector, and $n$ is
the number of samples required for each parameter.
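Both difficulties can be checked numerically. The sketch below is illustrative only: it uses a hypothetical test integrand (the mean of U^2 for U uniform on (0, 1), whose exact value is 1/3) in place of a rainfall-runoff model, and it relies on the standard result that the error of an MC estimate shrinks in proportion to the inverse square root of the number of simulations.

```python
import random
import statistics


def mc_mean(s, rng):
    """MC estimate of E[U^2] for U ~ Uniform(0, 1) from s simulations."""
    return sum(rng.random() ** 2 for _ in range(s)) / s


def estimate_spread(s, repeats=200, seed=0):
    """Standard deviation of the MC estimate over repeated experiments."""
    rng = random.Random(seed)
    return statistics.pstdev([mc_mean(s, rng) for _ in range(repeats)])


def grid_runs(n, p):
    """Model runs needed to cover p parameters with n samples each."""
    return n ** p


spread_100 = estimate_spread(100)
spread_10000 = estimate_spread(10_000)
# 100 times more simulations should reduce the spread by a factor of about 10.
ratio = spread_100 / spread_10000

# Covering 6 parameters with 10 samples each already needs a million runs.
runs_needed = grid_runs(10, 6)
```

A hundredfold increase in the number of simulations buys roughly one extra decimal digit of accuracy, while each additional parameter multiplies the cost of covering the parameter domain by n.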
A number of studies have been conducted to improve the efficiency of MC
based uncertainty analysis methods, such as Latin hypercube sampling (McKay et al., 1979) and the moment propagation techniques (Rosenblueth, 1975;
Harr, 1989; Melching, 1992). However, all these
methods require running the model many times in both offline and online mode. In
other words, MC based methods require running the models in a loop each time
the uncertainty of the model prediction for the new input data x(T+1) is required. In this thesis we explore an efficient
method to assess the uncertainty of the model M for t = T+1 when new input data x(T+1) is fed. The method we propose encapsulates the MC based
uncertainty results in machine learning models and is referred to as a “Machine
Learning in parameter Uncertainty Estimation” (MLUE). In the MLUE method, the machine learning model is
used as a surrogate model to emulate the laborious MC based uncertainty methods
and hence provides an approximate solution to the uncertainty analysis in a real
time application without re-running the MC simulations. Surrogate modelling is the process of constructing approximation models (emulators) that mimic the
behaviour of the simulation model as closely as possible while being
computationally cheap(er) to run. We believe that it
is better to have an approximate uncertainty estimate than no uncertainty
estimate at all.
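The surrogate idea can be sketched as follows, under strong simplifying assumptions. The one-parameter model `toy_model`, the uniform parameter range, and the nearest-neighbour regressor are hypothetical stand-ins chosen for brevity; this is an illustration of the principle, not the MLUE implementation developed in this thesis.

```python
import random


def toy_model(x, k):
    """Hypothetical stand-in for a rainfall-runoff model: runoff = k * rainfall."""
    return k * x


def mc_quantile(x, p, n_runs, rng):
    """Offline step: MC estimate of the p-quantile of the output for input x."""
    outputs = sorted(toy_model(x, rng.uniform(0.2, 0.6)) for _ in range(n_runs))
    return outputs[int(p * (n_runs - 1))]


def knn_predict(train, x_new, k=2):
    """Instance-based surrogate: mean target of the k nearest training inputs."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(target for _, target in nearest) / len(nearest)


rng = random.Random(42)
history = [0.5 * i for i in range(41)]  # hypothetical inputs, 0 to 20 mm

# Offline: run the laborious MC analysis once per historical input.
train = [(x, mc_quantile(x, 0.95, 2000, rng)) for x in history]

# Online: estimate the 95% quantile for the new input x(T+1) without MC runs.
x_new = 7.3
surrogate_q95 = knn_predict(train, x_new)

# Full MC at x_new, computed here only to check the surrogate.
reference_q95 = mc_quantile(x_new, 0.95, 2000, rng)
```

The online step costs a single lookup over the training set instead of thousands of model runs, at the price of an approximation error that depends on how well the historical inputs cover the new one.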
Yet another problem arises when the interest is in
assessing model uncertainty that is difficult to attribute to any particular
source. If the data and resources are available and the computational time
allows a full MC based uncertainty analysis, then performing such an analysis
is preferable. However, in practice, engineering decisions are often based
on a single (optimal) model run without any uncertainty analysis. In this thesis we also develop a novel method for uncertainty analysis of a calibrated
model based on the historical model residuals. The historical model residuals
(errors) between the model prediction and the observed data are the best
available quantitative indicators of the discrepancy between the model and the
real-world system or process, and they provide valuable information that can be
used to assess the predictive uncertainty. The residuals and their distribution
are often functions of the model input variables and can be predicted by
building a separate model that maps the input space to the model residuals or
even their probability distribution function. In other words, the idea here is
to learn the relationship between the probability distribution of the model residuals
and the input variables; and to use this relationship to predict the uncertainty
of the model prediction of the output variable (e.g., runoff) in the future. This approach is referred to as an “UNcertainty Estimation
based on local Errors and Clustering” (UNEEC). The UNEEC method
estimates the uncertainty of the optimal model that takes into account all
sources of errors without attempting to disaggregate the contribution given by
their individual sources. The UNEEC method is based on the concept of
optimality instead of equifinality as it analyzes the historical model
residuals resulting from the optimal model (optimal in both structure and parameters).
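A much simplified sketch of the residual-based idea is given below. It assumes a single input variable, synthetic residuals whose spread grows with the input, and crisp equal-frequency groups standing in for the clustering step; all names and data are hypothetical, and the sketch should not be read as the UNEEC algorithm itself.

```python
import random


def quantile(values, p):
    """Empirical quantile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def fit_residual_bounds(inputs, residuals, n_groups=4, alpha=0.10):
    """Group historical records by input value and store residual quantiles."""
    pairs = sorted(zip(inputs, residuals))
    size = len(pairs) // n_groups
    groups = []
    for g in range(n_groups):
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[g * size:end]
        errs = [e for _, e in chunk]
        upper_edge = chunk[-1][0]  # largest input value in this group
        groups.append(
            (upper_edge, quantile(errs, alpha / 2), quantile(errs, 1 - alpha / 2))
        )
    return groups


def predict_bounds(groups, x_new, y_model):
    """Add the residual quantiles of the matching group to the model output."""
    for upper_edge, q_low, q_high in groups:
        if x_new <= upper_edge:
            return y_model + q_low, y_model + q_high
    _, q_low, q_high = groups[-1]  # inputs beyond the historical range
    return y_model + q_low, y_model + q_high


# Synthetic history (assumption): residual = observed - simulated.
rng = random.Random(7)
inputs = [rng.uniform(0.0, 20.0) for _ in range(2000)]
residuals = [rng.gauss(0.0, 0.1 + 0.05 * x) for x in inputs]

groups = fit_residual_bounds(inputs, residuals)
low_flow = predict_bounds(groups, x_new=2.0, y_model=1.0)
high_flow = predict_bounds(groups, x_new=18.0, y_model=9.0)
```

Because the bounds are derived from the residuals of one calibrated model, they describe the total error of that model and make no attempt to separate parameter, input and structural contributions.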
Objective of this study
The aim of this research is to develop a
methodology for uncertainty analysis in rainfall-runoff modelling using machine
learning techniques. The objectives of the research are:
1. To review the existing methods of uncertainty analysis in rainfall-runoff modelling;
2. To review machine learning methods and to investigate the possibility of applying machine learning methods in uncertainty analysis;
3. To develop a methodology for uncertainty analysis in rainfall-runoff modeling using machine learning methods;
4. To develop a methodology for the surrogate modeling of uncertainty generated by the Monte Carlo based uncertainty methods; and
5. To implement the developed methodologies in computer codes and to test the methodologies by application to real-world problems.
Outline of the thesis
The thesis is organised in eight chapters.
A brief overview of the structure is given below. Chapter 2 is devoted to a review of uncertainty analysis
especially in rainfall-runoff modelling. It starts with brief overviews
of rainfall-runoff models and their classification which is followed by the sources
of uncertainty in the context of rainfall-runoff modelling. It also discusses the
commonly used uncertainty representation theories of probability, fuzzy logic and
entropy. It briefly reviews the various uncertainty analysis methods used in
rainfall-runoff modelling.
Chapter 3
presents several machine learning techniques used
in this study. It describes artificial neural networks, model trees, instance
based learning and clustering techniques.
In Chapter
4, a novel method, “Machine Learning in
parameter Uncertainty Estimation” (MLUE), for modelling
parametric uncertainty of rainfall-runoff models is presented. It is observed
that there exists a dependency between the forcing input data, the state
variables of rainfall-runoff models and the uncertainty of the model
predictions. Chapter 4 explores building machine learning models to approximate
the functional relationship between the input data (including state variables
if any) and the uncertainty of the model prediction such as a quantile.
Chapter 5
presents the application of the MLUE method for parametric uncertainty representation
and analysis. Various machine learning models, such as artificial neural
networks, model trees and locally weighted regression, are used. The MLUE method is applied to analyse the uncertainty of a lumped
conceptual rainfall-runoff model of the Brue catchment in the UK.
Chapter 6 presents a novel method, “UNcertainty Estimation based on local Errors
and Clustering” (UNEEC), for uncertainty
analysis of rainfall-runoff models. This method assumes that the model
residuals or errors are indicators of the total model uncertainty. The method estimates the “residual uncertainty” of the optimal model
that takes into account all sources of errors without attempting to
disaggregate the contribution given by their individual sources.
Chapter 7 provides the application of the UNEEC
methodology to estimate uncertainty of rainfall-runoff models in a number of
catchments. In the first part the UNEEC method is applied to estimate
uncertainty of the forecasts made by several machine learning methods in the Sieve
catchment in Italy.
The second part covers the application to estimate uncertainty of the
conceptual rainfall-runoff models of two catchments: the Brue in the UK and the Bagmati in Nepal. Comparisons with
other uncertainty methods are presented as well.
Chapter 8 presents
the conclusions of the research based on the various case studies presented in
this thesis. Finally,
possible directions for further research are suggested.