Chapter 1

Introduction

This chapter introduces the subject of this research - uncertainty analysis in rainfall-runoff modelling by applying machine learning. It starts with the background of rainfall-runoff modelling and the treatment of uncertainty analysis within the context of rainfall-runoff modelling. A brief review of uncertainty analysis methods and their shortcomings is also presented. Finally the objectives and the structure of the thesis are presented.





 Background

Rainfall-runoff models are increasingly used in hydrology for a wide range of applications, for example, to extend streamflow records, in the design and operation of hydraulic structures, for real time flood forecasting, to estimate flows of ungauged catchments, and to predict the effect of land-use and climate change. Such models play an important role in water resource planning and management of river basins. These models attempt to simulate complex hydrological processes that lead to the transformation of rainfall into runoff, with varying degrees of abstraction.

A plethora of rainfall-runoff models, varying in nature, complexity, and purpose, has been developed and used by researchers and practitioners in the last century. These rainfall-runoff models encompass a broad spectrum of more or less plausible descriptions of rainfall-runoff relations and processes, ranging from primitive empirical black box models such as the Sherman unit hydrograph method (see, e.g. Sherman, 1932), through lumped conceptual models such as the Stanford (Crawford and Linsley, 1966), Sacramento (Burnash et al., 1973) and HBV (Bergström and Forsman, 1973) models, to physically based distributed models such as the Mike-SHE model (Abbott et al., 1986a, b). Rapid growth in computational power, the increased availability of distributed hydrological observations and an improved understanding of the physics and dynamics of water systems permit more complex and sophisticated models to be built. While these advances in principle lead to more accurate (less uncertain) models, such complex models, with their many parameters and data inputs, can be an inaccurate representation of reality if they are not parameterized properly or lack input data of reasonable quality.

Since a rainfall-runoff model is, by definition, only an abstraction of a complex, non-linear hydrological process that varies in time and space, it involves many simplifications and idealisations. These models contain parameters that often cannot be measured directly, but can only be estimated by calibration against a historical record of measured output data. The system input (forcing) data, such as rainfall and temperature, and the output data are often contaminated by measurement errors. This inevitably leads to uncertain parameter estimates. Consequently, predictions made by such rainfall-runoff models are far from perfect; in other words, there always exists a discrepancy between the model prediction and the corresponding observed data, no matter how precise the model is and how well it is calibrated. Thus model errors, which are the mismatch between the observed and the simulated system behaviour, are unavoidable in rainfall-runoff modelling due to the inherent uncertainties in the process. Various sources of uncertainty in rainfall-runoff modelling are presented in sections 2.4 and 2.5.

 Uncertainty analysis in rainfall-runoff modelling

In many fields uncertainty is well recognized and properly accounted for. For example, in the meteorological sciences, deterministic weather forecasts or predictions are typically given together with the associated uncertainty. Uncertainty has also been treated in the assessments of the Intergovernmental Panel on Climate Change, IPCC (Swart et al., 2009). In engineering design, such as that of coastal and river flood defenses, uncertainty is treated implicitly through conservative design rules, or explicitly by a probabilistic characterization of the meteorological events leading to extreme floods. Historically, the problem of accurately determining river flows from rainfall, evaporation and other factors was a major focus in hydrology. During the last two decades, there has been a great deal of research into the development and application of (auto) calibration methods (see, e.g., Duan et al., 1992; Solomatine et al., 1999) to improve deterministic model predictions. Almost all existing river flow simulation techniques are conceived to provide a single estimate, since most research in operational hydrology has been dedicated to finding the best estimate rather than quantifying the uncertainty of model predictions (Singh and Woolhiser, 2002).

It is now broadly recognized that proper consideration of uncertainty in hydrologic predictions is essential for purposes of both research and operational modelling (Wagener and Gupta, 2005). Along with the recognition of the uncertainty of physical processes, the uncertainty analysis of rainfall-runoff models has become a popular research topic over the past two decades. The value of a hydrologic prediction to water resources and other relevant decision-making processes is limited if reasonable estimates of the corresponding predictive uncertainty are not provided (Georgakakos et al., 2004). Explicit recognition of uncertainty is not enough; in order to have this notion adopted by decision makers in water resources management, uncertainty should be properly estimated and communicated (Pappenberger and Beven, 2006). The research community has, however, made considerable progress towards recognising the need to complement point forecasts of decision variables with uncertainty estimates. Hence, there is a widening recognition of the necessity to (i) understand and identify the sources of uncertainty; (ii) quantify uncertainty; (iii) evaluate the propagation of uncertainty through the models; and (iv) find means to reduce uncertainty. Incorporating uncertainty into deterministic predictions or forecasts helps to enhance the reliability and credibility of the model outputs.

This dissertation is devoted to developing new methods to analyse model uncertainty, which are based on the methods of machine learning. This study is, in general, in the field of Hydroinformatics, the area that aims in particular at introducing methods of machine learning and computational intelligence into the practice of modelling and forecasting (Abbott, 1991). This study is at the interface between different scientific disciplines: hydrological modelling, statistical and machine learning, and uncertainty analysis.

One may observe a significant proliferation of uncertainty analysis methods published in the academic literature, trying to provide meaningful uncertainty bounds of the model predictions. Pappenberger et al. (2006) provide a decision tree to find the appropriate method for a given situation. However, methods to estimate and propagate this uncertainty have so far been limited in their ability to distinguish between different sources of uncertainty and in the use of the retrieved information to improve the model structure analysed. In general, these methods can be broadly classified into six categories (see, e.g., Montanari, 2007; Shrestha and Solomatine, 2008):

1. Analytical methods (see, e.g., Tung, 1996);

2. Approximation methods, e.g., the first-order second moment method (Melching, 1992);

3. Simulation and sampling-based (Monte Carlo) methods (see, e.g., Kuczera and Parent, 1998);

4. Methods from group (3) that are generally also classed as Bayesian methods, e.g., “generalised likelihood uncertainty estimation” (GLUE) by Beven and Binley (1992);

5. Methods based on the analysis of model errors (see, e.g., Montanari and Brath, 2004); and

6. Methods based on fuzzy set theory (see, e.g., Maskey et al., 2004).

Detailed descriptions of these methods are given in section 2.8.

Most of the existing methods (e.g., categories (3) and (4)) analyse the uncertainty of the uncertain input variables by propagating it through the deterministic model to the outputs, and hence require assumptions about their distributions and error structures. Most of the approaches based on the analysis of model errors require certain assumptions regarding the residuals (e.g., normality and homoscedasticity). Obviously, the relevance and accuracy of such approaches depend on the validity of these assumptions. The fuzzy theory-based approach requires knowledge of the membership function of the quantity subject to uncertainty, which can be very subjective. Furthermore, in the majority of the methods, the uncertainty of the model output is mainly attributed to uncertainty in the model parameters. For instance, Monte Carlo (MC) based methods analyse the propagation of the uncertainty of the parameters (described by a probability density function, pdf) to the pdf of the output. Similar types of analysis are performed independently for input or structural uncertainty. Methods based on the analysis of model errors typically compute the uncertainty of the “optimal model” (i.e. the model with the calibrated parameters and the fixed structure), and not of the “class of models” (i.e. a group of models with the same structure but parameterised differently) as, for example, MC methods do.
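The MC propagation of parameter uncertainty described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the one-parameter linear reservoir `linear_reservoir`, the uniform prior on the recession constant `k`, and the rainfall series are hypothetical and are not the models or data used in this thesis.

```python
import random

def linear_reservoir(rain, k):
    """Toy rainfall-runoff model (hypothetical): one linear store, outflow = k * storage."""
    storage, flows = 0.0, []
    for r in rain:
        storage += r
        q = k * storage          # outflow proportional to current storage
        storage -= q
        flows.append(q)
    return flows

def mc_propagate(rain, n_runs=2000, seed=1):
    """Sample k from an assumed uniform prior and collect the simulated flows."""
    rng = random.Random(seed)
    runs = [linear_reservoir(rain, rng.uniform(0.1, 0.5)) for _ in range(n_runs)]
    # Transpose so that each entry holds all simulated flows for one time step.
    return [list(step) for step in zip(*runs)]

def bounds(samples, lower=0.05, upper=0.95):
    """Empirical lower and upper quantiles of the flows at one time step."""
    ordered = sorted(samples)
    n = len(ordered)
    return ordered[int(lower * (n - 1))], ordered[int(upper * (n - 1))]

rain = [0.0, 10.0, 5.0, 0.0, 0.0, 2.0]
per_step = mc_propagate(rain)
for t, samples in enumerate(per_step):
    lo, hi = bounds(samples)
    print(f"t={t}: 90% band = [{lo:.2f}, {hi:.2f}]")
```

The band at each time step describes the “class of models” that share one structure but differ in `k`; a method based on model errors would instead work with a single calibrated value of `k`.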

The contribution of various sources of errors to the total model error is typically not known and, as pointed out by Gupta et al. (2005), disaggregation of errors into their source components is often difficult, particularly in hydrology where models are non-linear and different sources of errors may interact to produce the measured deviation. Nevertheless, evaluating the contribution of different sources of uncertainty to the overall uncertainty in model prediction is important, for instance, for understanding where the greatest sources of uncertainty reside and, therefore, for directing efforts towards these sources (Brown and Heuvelink, 2005). In general, relatively few studies have been conducted to investigate the interaction between different sources of uncertainty and their contributions to the total model uncertainty (Engeland et al., 2005; Gupta et al., 2005). For risk based decision-making processes such as flood warning, it is more important to know the total model uncertainty accounting for all sources of uncertainty than the uncertainty resulting from individual sources.

However, the practice of uncertainty analysis and use of the results of such analysis in decision making is not widespread, for several reasons (Pappenberger and Beven, 2006). Uncertainty analysis takes time, so adds to the cost of risk analysis, options appraisal and design studies. It is not always clear how uncertainty analysis will contribute to improved decision making. Much of the academic literature on hydrological uncertainties (Liu and Gupta, 2007) has tended to focus upon forecasting problems. Identifying uncertainty bounds on a flood forecast is important, but to be meaningful needs to be set within the context of a well defined decision problem (Frieser et al., 2005; Todini, 2008). The uncertainty analysis requires careful interpretation in order to understand the meaning and significance of the results. It is through this process of scrutiny and discussion that the most useful insights for decision makers are obtained. Furthermore, the conduct of uncertainty analysis provides new insights into model behaviour that will need to be discussed and agreed with the experts responsible for models that are input into the analysis. Indeed the process of scrutiny that uncertainty analysis provides is an additional benefit (Hall and Solomatine, 2008).

Experience is now growing in the communication of uncertainty to decision makers and members of the public, for example in the context of environmental risks (Sluijs et al., 2003) and climate change (IPCC, 2005). Sluijs et al. (2003) stress the importance of engaging stakeholders from an early stage, identifying the target audiences and then using appropriate language to communicate uncertainties. Alongside numerical results and their implications for decision makers, the limitations of data sources and analysis methods should be made clear and areas of ignorance should be highlighted.

Machine learning in uncertainty analysis

Over the last 15 years many machine learning techniques have been used extensively in the field of rainfall-runoff modelling (see section 3.4 for more detail). These techniques have also been used to improve the accuracy of predictions or forecasts made by process based rainfall-runoff models. Generally, they are used to update the output variables by forecasting the error of the process based models. All these techniques for updating the model predictions can be seen as an error modelling paradigm that reduces the uncertainty of the predictions. However, these techniques do not explicitly provide the uncertainty of the model prediction in the form of prediction bounds or a probability distribution function of the model output.

In this thesis we explore the possibility of using machine learning techniques that can provide reasonable uncertainty estimates for the runoff predictions made by rainfall-runoff models.

As mentioned in section 1.1, advances in computational power and technology have allowed more complex and sophisticated distributed rainfall-runoff models to be built and used in practice. The computational burden of distributed runoff models is now less problematic than before, although it can still be an issue when the predictive uncertainty of a model is assessed through laborious MC simulations (Beven, 2001). Several uncertainty analysis methods based on MC simulations have been developed to propagate uncertainty through the models. MC based uncertainty analysis of the outputs of such models is straightforward, but it becomes impractical in real time applications of computationally intensive models, where there is insufficient time to perform the large number of model runs required. Practical implementation of MC based uncertainty analysis methods faces two major problems: (i) convergence of the MC simulations is very slow (with a convergence rate of the order $O(1/\sqrt{s})$), so a large number of runs is needed to establish a reliable estimate of the uncertainties; and (ii) the number of simulations needed to cover the entire parameter domain increases exponentially with the dimension of the parameter vector ($O(n^p)$), where s is the number of simulations, p is the dimension of the parameter vector, and n is the number of samples required for each parameter.
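Both problems can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: the quantity being estimated (the mean of a uniform random variable) stands in for a model output, and the function names `mc_mean_error` and `grid_runs` are hypothetical.

```python
import math
import random

def mc_mean_error(s, repeats=200, seed=0):
    """RMS error of the MC estimate of E[U] = 0.5 for U ~ Uniform(0, 1), using s samples."""
    rng = random.Random(seed)
    squared = 0.0
    for _ in range(repeats):
        estimate = sum(rng.random() for _ in range(s)) / s
        squared += (estimate - 0.5) ** 2
    return math.sqrt(squared / repeats)

def grid_runs(n, p):
    """Model runs needed to visit n values of each of p parameters on a full grid."""
    return n ** p

# Problem (i): the error shrinks only with the square root of the number of runs,
# so quadrupling s roughly halves the error.
for s in (100, 400, 1600):
    print(s, round(mc_mean_error(s), 4))

# Problem (ii): the number of runs grows exponentially with the number of parameters.
for p in (2, 5, 10):
    print(p, grid_runs(10, p))
```

Gaining one extra decimal digit of accuracy therefore costs about a hundred times more model runs, which is what makes plain MC analysis expensive for slow models.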

A number of studies have sought to improve the efficiency of MC based uncertainty analysis methods, for example through Latin hypercube sampling (McKay et al., 1979) and moment propagation techniques (Rosenblueth, 1975; Harr, 1989; Melching, 1992). However, all these methods require running the model many times in both offline and online mode. In other words, MC based methods require running the model in a loop each time the uncertainty of the model prediction for new input data x(T+1) is required. In this thesis we explore an efficient method to assess the uncertainty of the model M for t = T+1 when new input data x(T+1) is fed. The method we propose encapsulates the MC based uncertainty results in machine learning models and is referred to as “Machine Learning in parameter Uncertainty Estimation” (MLUE). In the MLUE method, the machine learning model is used as a surrogate model to emulate the laborious MC based uncertainty methods, and hence it provides an approximate solution to the uncertainty analysis in a real time application without repeating the MC runs. Surrogate modelling is the process of constructing approximation models (emulators) that mimic the behaviour of the simulation model as closely as possible while being computationally cheap(er) to run. We believe that it is preferable to have an approximate uncertainty estimate than no uncertainty estimate at all.
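The surrogate idea can be sketched as follows. This is a simplified illustration and not the MLUE implementation developed in this thesis: the static relation `toy_model`, the uniform prior on its parameter `c`, the single input variable, and the k-nearest-neighbour learner are all assumptions chosen to keep the example short.

```python
import random

def toy_model(rain, c):
    """Hypothetical static rainfall-runoff relation with one uncertain parameter c."""
    return c * rain ** 1.2

def mc_quantile(rain, q=0.9, n_runs=500, rng=None):
    """Offline step: MC estimate of the q-quantile of runoff for one rainfall value."""
    rng = rng or random.Random(0)
    flows = sorted(toy_model(rain, rng.uniform(0.2, 0.6)) for _ in range(n_runs))
    return flows[int(q * (n_runs - 1))]

def train_surrogate(rains, rng):
    """Store (input, MC quantile) pairs; these are the surrogate's training data."""
    return [(r, mc_quantile(r, rng=rng)) for r in rains]

def predict(surrogate, rain, k=3):
    """Online step: k-nearest-neighbour estimate of the quantile, with no new MC runs."""
    nearest = sorted(surrogate, key=lambda pair: abs(pair[0] - rain))[:k]
    return sum(value for _, value in nearest) / len(nearest)

rng = random.Random(42)
history = [rng.uniform(0.0, 50.0) for _ in range(300)]   # x(1), ..., x(T)
surrogate = train_surrogate(history, rng)

x_new = 20.0                                             # x(T+1)
print("surrogate:", round(predict(surrogate, x_new), 2))
print("full MC  :", round(mc_quantile(x_new, rng=random.Random(7)), 2))
```

All the MC runs happen once, while building `surrogate`; the prediction for x(T+1) then costs only a lookup, at the price of being an approximation to the full MC result.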

Yet another problem arises when the interest is in assessing model uncertainty that is difficult to attribute to any particular source. If the data and resources are available and the computational time allows, then a full MC based uncertainty analysis is preferable. In practice, however, engineering decisions are often based on a single (optimal) model run without any uncertainty analysis. In this thesis we also develop a novel method for uncertainty analysis of a calibrated model based on the historical model residuals. The historical model residuals (errors) between the model prediction and the observed data are the best available quantitative indicators of the discrepancy between the model and the real-world system or process, and they provide valuable information that can be used to assess the predictive uncertainty. The residuals and their distribution are often functions of the model input variables and can be predicted by building a separate model that maps the input space to the model residuals or even to their probability distribution function. In other words, the idea here is to learn the relationship between the probability distribution of the model residuals and the input variables, and to use this relationship to predict the uncertainty of the model prediction of the output variable (e.g., runoff) in the future. This approach is referred to as “UNcertainty Estimation based on local Errors and Clustering” (UNEEC). The UNEEC method estimates the uncertainty of the optimal model, taking into account all sources of errors without attempting to disaggregate the contributions of the individual sources. The UNEEC method is based on the concept of optimality instead of equifinality, as it analyses the historical model residuals resulting from the optimal model (both in structure and parameters).
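A minimal sketch of this residual-based idea is given below. It deliberately simplifies the approach and should not be read as the UNEEC algorithm itself: crisp equal-width binning of a single input replaces the clustering step, a lookup table replaces the machine learning model that maps inputs to residual quantiles, and the calibrated model `optimal_model` and the synthetic observations are hypothetical.

```python
import random

def optimal_model(rain):
    """Stand-in for a calibrated ("optimal") model; hypothetical."""
    return 0.4 * rain

def cluster_of(rain, width=10.0, n_clusters=5):
    """Crisp equal-width binning of the input, used here in place of a clustering method."""
    return min(int(rain // width), n_clusters - 1)

def fit_error_quantiles(rains, observed, lower=0.05, upper=0.95):
    """Empirical residual quantiles per cluster, computed from the historical record."""
    residuals = {}
    for r, obs in zip(rains, observed):
        residuals.setdefault(cluster_of(r), []).append(obs - optimal_model(r))
    table = {}
    for c, res in residuals.items():
        res.sort()
        n = len(res)
        table[c] = (res[int(lower * (n - 1))], res[int(upper * (n - 1))])
    return table

def prediction_interval(table, rain):
    """Model prediction shifted by the residual quantiles of the input's cluster."""
    lo, hi = table[cluster_of(rain)]
    q = optimal_model(rain)
    return q + lo, q + hi

# Synthetic "observations" whose error spread grows with rainfall (heteroscedastic).
rng = random.Random(3)
rains = [rng.uniform(0.0, 50.0) for _ in range(1000)]
observed = [optimal_model(r) + rng.gauss(0.0, 0.5 + 0.05 * r) for r in rains]

table = fit_error_quantiles(rains, observed)
print(prediction_interval(table, 5.0))    # interval for a low-rainfall input
print(prediction_interval(table, 45.0))   # interval for a high-rainfall input
```

Because the quantiles are taken from the residuals themselves, no assumption of normality or homoscedasticity is needed, and the interval reflects all sources of error together rather than any single one.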

Objective of this study

The aim of this research is to develop a methodology for uncertainty analysis in rainfall-runoff modelling using machine learning techniques. The objectives of the research are:

1. To review the existing methods of uncertainty analysis in rainfall-runoff modelling;

2. To review machine learning methods and to investigate the possibility of applying machine learning methods in uncertainty analysis;

3. To develop a methodology for uncertainty analysis in rainfall-runoff modelling using machine learning methods;

4. To develop a methodology for the surrogate modelling of uncertainty generated by the Monte Carlo based uncertainty methods; and

5. To implement the developed methodologies in computer codes and to test the methodologies by application to real-world problems.

Outline of the thesis

The thesis is organised in eight chapters. A brief overview of the structure is given below. Chapter 2 is devoted to a review of uncertainty analysis especially in rainfall-runoff modelling. It starts with brief overviews of rainfall-runoff models and their classification which is followed by the sources of uncertainty in the context of rainfall-runoff modelling. It also discusses the commonly used uncertainty representation theories of probability, fuzzy logic and entropy. It briefly reviews the various uncertainty analysis methods used in rainfall-runoff modelling.

Chapter 3 presents several machine learning techniques used in this study. It describes artificial neural networks, model trees, instance based learning and clustering techniques.

In Chapter 4, a novel method for modelling the parametric uncertainty of rainfall-runoff models, “Machine Learning in parameter Uncertainty Estimation” (MLUE), is presented. It is observed that there exists a dependency between the forcing input data, the state variables of rainfall-runoff models and the uncertainty of the model predictions. Chapter 4 explores building machine learning models to approximate the functional relationship between the input data (including state variables, if any) and a measure of the uncertainty of the model prediction, such as a quantile.

Chapter 5 presents the application of the MLUE method for parametric uncertainty representation and analysis. Various machine learning models, such as artificial neural networks, model trees and locally weighted regression, are used. The MLUE method is applied to analyse the uncertainty of a lumped conceptual rainfall-runoff model of the Brue catchment in the UK.

Chapter 6 presents a novel method, “UNcertainty Estimation based on local Errors and Clustering” (UNEEC), for uncertainty analysis of rainfall-runoff models. This method assumes that the model residuals or errors are indicators of the total model uncertainty. The method estimates the “residual uncertainty” of the optimal model, taking into account all sources of errors without attempting to disaggregate the contributions of the individual sources.

Chapter 7 presents the application of the UNEEC methodology to estimate the uncertainty of rainfall-runoff models in a number of catchments. In the first part, the UNEEC method is applied to estimate the uncertainty of the forecasts made by several machine learning methods in the Sieve catchment in Italy. The second part covers the application to estimate the uncertainty of the conceptual rainfall-runoff models of two catchments: the Brue in the UK and the Bagmati in Nepal. Comparisons with other uncertainty analysis methods are presented as well.

Chapter 8 presents the conclusions of the research based on the various case studies presented in this thesis. Finally the possible directions for further research are suggested.