Conclusions and Recommendations
This research explores machine learning techniques for uncertainty analysis and prediction, particularly in rainfall-runoff modelling. This chapter presents the conclusions drawn from this research and recommendations for future research and development in uncertainty analysis for rainfall-runoff modelling.
Rainfall-runoff modelling and uncertainty analysis
Rainfall-runoff models are widely used in hydrology for a large range of applications and play an important role in the optimal planning and management of water resources in river basins. By definition, a rainfall-runoff model is only an abstraction of a complex, non-linear hydrological process that varies in time and space; it involves many simplifications and idealisations. These models contain parameters that often cannot be measured directly and can only be estimated by calibration against a historical record of measured output data. The system input (forcing) data, such as rainfall and temperature, and the observed output are often contaminated by measurement errors. Consequently, predictions made by such a model are imperfect and uncertain, no matter how sophisticated the model is or how well it is calibrated. It is vital, therefore, that uncertainty be recognized and properly accounted for. Once the existence of uncertainty in a rainfall-runoff model is acknowledged, it should be managed by proper uncertainty analysis and prediction procedures aimed eventually at reducing its impact. A number of such procedures are in active use, but our analysis leads us to conclude that they are often based on strong assumptions and suffer from certain deficiencies. This thesis is devoted to developing new procedures for uncertainty analysis and prediction and testing them on various case studies.
It should be noted that the practice of uncertainty analysis, and the use of its results in decision making, is still not widespread (Pappenberger and Beven, 2006). It is not always clear to practitioners how uncertainty analysis will contribute to improved decision making. Uncertainty analysis requires careful interpretation in order to understand the meaning and significance of the results. It is through this process of scrutiny and discussion that the most useful insights for decision makers are obtained (Hall and Solomatine, 2008).
Uncertainty analysis methods
This thesis investigates a number of methods proposed in the literature to provide meaningful uncertainty bounds of the model predictions. Uncertainty analysis methods for rainfall-runoff models vary mainly in the following ways: (i) the type of rainfall-runoff model used; (ii) the source of uncertainty to be treated; (iii) the representation of uncertainty; (iv) the purpose of the uncertainty analysis; and (v) the availability of resources. Uncertainty analysis is a well-accepted procedure and has a comparatively long history in physically based and conceptual modelling. Uncertainty analysis methods can be broadly classified into six categories (see, e.g., Montanari, 2007; Shrestha and Solomatine, 2008): (i) analytical methods; (ii) approximation methods; (iii) simulation and sampling-based methods; (iv) Bayesian methods; (v) methods based on the analysis of model errors; and (vi) fuzzy set theory-based methods. Most of the existing methods analyse the uncertainty of the input variables by propagating it through the deterministic model to the outputs, and hence require assumptions about the distributions and error structures of those inputs. Most of the approaches based on the analysis of the model errors require certain assumptions regarding the residuals (e.g., normality and homoscedasticity).
Obviously, the relevance and accuracy of such approaches depend on the validity of these assumptions. The fuzzy set theory-based approach requires knowledge of the membership function of the quantity subject to the uncertainty, which could be very subjective. Furthermore, the majority of the uncertainty methods account explicitly for only a single source of uncertainty and ignore the other sources. Methods based on the analysis of the model errors typically compute the uncertainty of the “optimal model” in a way that takes into account all sources of error, without attempting to disaggregate the contributions of the individual sources. No single method of uncertainty estimation can be claimed to be perfect in representing uncertainty. Our analysis allows us to conclude that machine learning methods, which are able to build accurate models based on data (in this case, data on past model errors), have excellent potential for use as uncertainty predictors.
Machine learning methods for uncertainty analysis
Over the last 15 years many machine learning techniques have been used to build data-driven rainfall-runoff models. There are also examples of applying these techniques as error correctors to improve the accuracy of predictions or forecasts made by process based rainfall-runoff models. Generally they are used to update the output variables by forecasting the error of the process based models. Such techniques for updating the model predictions in fact reduce the uncertainty of the predictions. However, they do not explicitly provide the uncertainty of the model prediction in the form of prediction bounds or a probability distribution function of the model output. In this thesis we explore the possibility of using machine learning techniques to provide reasonable uncertainty estimates for the model output predicted by data-driven or process based models.
We develop two methods: MLUE for parametric uncertainty analysis and UNEEC for residual uncertainty analysis of rainfall-runoff models.
A method for parameter uncertainty analysis
Monte Carlo (MC) simulation is a widely used method for uncertainty analysis in rainfall-runoff modelling and allows the quantification of the model output uncertainty resulting from uncertain model parameters. It involves random sampling from the distribution of the uncertain inputs and successive model runs until a statistically significant distribution of outputs is obtained. The MC based methods for uncertainty analysis of the outputs of process models are flexible, robust, conceptually simple and straightforward; however, methods of this type require a large number of samples (or model runs), and their applicability is sometimes limited to simple models. In the case of computationally intensive models, the time and resources required by these methods could be prohibitively expensive. A number of methods have been developed to improve the efficiency of MC based uncertainty analysis, but these methods still require a considerable number of model runs in both offline and operational modes to produce reliable and meaningful uncertainty estimates. In this thesis we develop a method to predict the parametric uncertainty of a rainfall-runoff model by building machine learning models that emulate the MC uncertainty results. The proposed method is referred to as MLUE (Machine Learning in parameter Uncertainty Estimation). The motivation for developing the MLUE method is to perform fast parameter uncertainty analysis and prediction. We assume that the uncertainty of the model prediction at a particular time step depends on the corresponding forcing input data and the model states (e.g., rainfall, antecedent rainfall, soil moisture). We believe that the uncertainty associated with the prediction of hydrological variables such as runoff is similar under similar hydrological conditions.
The MLUE method emulates the MC simulations and belongs to the class of surrogate models, or meta-models. The novelty and the characteristics of the methodology are:
1. The method explicitly builds an emulator for the MC uncertainty results, while other methods build an emulator for a single simulation model;
2. The MLUE emulator is based on machine learning techniques, while other techniques are Bayesian (e.g., O’Hagan, 2006) or use nonlinear differential equations (e.g., the data based mechanistic model of Young (1998));
3. The method is computationally efficient and does not involve any additional runs of the process model; it can therefore be easily applied to computationally demanding process models;
4. The method can be applied easily to any Monte Carlo based uncertainty analysis method.
The MLUE method is applied to a conceptual rainfall-runoff model for the Brue catchment in the UK. The generalised likelihood uncertainty estimation (GLUE) method has been used to analyse the parameter uncertainty of the model. Machine learning methods have been applied to estimate the uncertainty results (e.g., quantiles or prediction intervals) generated by the GLUE method. We have shown how domain knowledge and analytical techniques are used to select the input data for the machine learning models used in the MLUE method. Three machine learning models, namely artificial neural networks, model trees, and locally weighted regression, are used to predict the uncertainty of the model predictions. The performance of the MLUE method is measured by its predictive capability (e.g., coefficient of correlation and root mean squared error) and by the statistics of the uncertainty (e.g., the prediction interval coverage probability and the mean prediction interval). It is demonstrated that machine learning methods can predict the uncertainty results with reasonable accuracy.
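The emulation idea can be illustrated with a minimal sketch in Python. It is not the implementation used in this research: the one-parameter `toy_model`, the parameter range 0.2 to 0.8, the synthetic inputs and the distance-weighted nearest-neighbour regressor (a simple stand-in for locally weighted regression) are all assumptions made for illustration. The sketch computes MC quantiles offline, trains an emulator on them, predicts bounds for new inputs without further MC runs, and evaluates the bounds with the prediction interval coverage probability (PICP) and the mean prediction interval (MPI).

```python
import math
import random

random.seed(0)

def toy_model(rain, antecedent, k):
    """Hypothetical one-parameter runoff model (stand-in for a process model)."""
    return k * rain + 0.3 * k * antecedent

def quantile(values, q):
    """Empirical quantile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def mc_bounds(x, n_runs=500):
    """Offline step: sample the uncertain parameter, return 5% and 95% output quantiles."""
    outs = [toy_model(x[0], x[1], random.uniform(0.2, 0.8)) for _ in range(n_runs)]
    return quantile(outs, 0.05), quantile(outs, 0.95)

def knn_predict(train_x, train_y, x, k=5):
    """Distance-weighted k-nearest-neighbour average (assumed emulator, not the thesis model)."""
    nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_x, train_y))[:k]
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def picp(obs, lower, upper):
    """Prediction interval coverage probability: share of observations inside the bounds."""
    return sum(l <= o <= u for o, l, u in zip(obs, lower, upper)) / len(obs)

def mpi(lower, upper):
    """Mean prediction interval: average width of the bounds."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)

# Training data: forcing inputs (rainfall, antecedent rainfall) and their MC bounds.
train_x = [(random.uniform(0, 20), random.uniform(0, 20)) for _ in range(300)]
train_b = [mc_bounds(x) for x in train_x]
train_lo = [b[0] for b in train_b]
train_hi = [b[1] for b in train_b]

# Operational step: predict bounds for new inputs without re-running Monte Carlo.
test_x = [(random.uniform(0, 20), random.uniform(0, 20)) for _ in range(100)]
pred_lo = [knn_predict(train_x, train_lo, x) for x in test_x]
pred_hi = [knn_predict(train_x, train_hi, x) for x in test_x]

# Synthetic "observations": model output with a parameter drawn from the same range.
obs = [toy_model(x[0], x[1], random.uniform(0.2, 0.8)) for x in test_x]
print(f"PICP = {picp(obs, pred_lo, pred_hi):.2f}, MPI = {mpi(pred_lo, pred_hi):.2f}")
```

In this sketch the expensive part (`mc_bounds`) runs only while preparing the training set; predicting bounds for a new time step is a single call to the emulator.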
The great advantage of the MLUE method is that once the machine learning models are developed, which is done offline, they can predict the uncertainty of the model output in a fraction of a second, a task that would otherwise take several hours or days of computation with MC based uncertainty analysis methods. The proposed techniques could be useful in real time applications when it is impracticable to run a large number of simulations of complex hydrological models for an uncertainty analysis and when the forecast lead time is very short.
A method for residual uncertainty analysis
Analysis of the research literature has shown that the assessment of the model uncertainty of optimal (calibrated) rainfall-runoff models has received relatively little attention. Most research focuses on a single source of uncertainty, and the majority of studies are oriented toward parametric uncertainty. There are many situations, however, when the contribution of parameter uncertainty to the total uncertainty is small compared to that of other types, for instance input (rainfall) uncertainty or structure uncertainty. The consequence of considering only parametric uncertainty is that the estimated predictive uncertainty bounds are too narrow, and thus a considerable part of the observed data falls outside these bounds. Furthermore, disaggregation of the total model uncertainty into its source components is difficult, particularly in cases common to hydrology where the model is non-linear and complex, and different sources of uncertainty may interact. Generally the analysis of uncertainty consists of propagating the
uncertainty of the inputs and parameters (described by distribution functions) through the model by running it a sufficient number of times and deriving the distribution function of the model outputs. In this thesis, we present a different approach to analyzing the uncertainty. We focus on residual uncertainty, which is defined as the remaining uncertainty of the optimal model. Here model optimality is understood in the following sense: the model is calibrated, so that the model parameters and structure are such that the model error is at a minimum. However, even such a model simulates or predicts
the output variable with errors, so its output contains uncertainty. We develop a novel methodology to estimate the uncertainty of the optimal model output by analyzing historical model residuals. This method is referred to as UNcertainty Estimation based on Local Errors and Clustering (UNEEC). The characteristics of the methodology are:
1. Residuals are used to characterize the uncertainty of the model prediction;
2. No assumptions are made about the probability distribution function of the model residuals;
3. The method uses the concept of model optimality and does not involve any additional runs of the process model;
4. Specialized uncertainty models are built for particular areas of the state space (e.g., hydrometeorological conditions); clustering is needed to identify these areas;
5. The uncertainty models are built using machine learning techniques;
6. The method is computationally efficient and can therefore be easily applied to computationally demanding process models;
7. The method can be applied easily to any existing model, whether it is physically based, conceptual or data-driven.
The UNEEC method consists of three main steps: (i) clustering the input data in order to identify homogeneous regions of the input space; (ii) estimating the probability distribution of the model residuals for the regions identified by clustering; and (iii) building machine learning models of the probability distributions of the model error. Fuzzy clustering has been used to cluster the input data. Three machine learning models, namely artificial neural networks, model trees, and locally weighted regression, are used to predict the uncertainty of the model predictions. We apply domain knowledge and analytical techniques to select the input data for the machine learning models used in the UNEEC method.
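The first two steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used in this research: the data are synthetic, a single scalar input stands in for the hydrometeorological condition, two clusters are used, and the function names (`memberships`, `fuzzy_cmeans_1d`, `weighted_quantile`, `local_bounds`) are hypothetical. Step (iii), training a machine learning model on the per-sample bounds, is omitted; the sketch instead mixes the cluster quantiles directly by fuzzy membership.

```python
import random

random.seed(1)

def memberships(x, centres, m=2.0):
    """Standard fuzzy c-means membership of scalar x in each cluster (sums to 1)."""
    inv = [(abs(x - v) + 1e-12) ** (-2.0 / (m - 1.0)) for v in centres]
    total = sum(inv)
    return [w / total for w in inv]

def fuzzy_cmeans_1d(xs, c=2, m=2.0, iters=50):
    """Fuzzy c-means on scalar inputs; returns centres and membership rows u[i][j]."""
    lo, hi = min(xs), max(xs)
    centres = [lo + (j + 0.5) * (hi - lo) / c for j in range(c)]  # evenly spread start
    for _ in range(iters):
        u = [memberships(x, centres, m) for x in xs]
        centres = [
            sum(u[i][j] ** m * xs[i] for i in range(len(xs)))
            / sum(u[i][j] ** m for i in range(len(xs)))
            for j in range(c)
        ]
    return centres, [memberships(x, centres, m) for x in xs]

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q."""
    pairs = sorted(zip(values, weights))
    target = q * sum(weights)
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= target:
            return v
    return pairs[-1][0]

# Synthetic data: the residual spread grows with the input (e.g., rainfall).
xs = [random.uniform(0, 10) for _ in range(400)]
residuals = [random.gauss(0, 0.2 + 0.3 * x) for x in xs]

# Step (i): fuzzy clustering of the input space.
centres, u = fuzzy_cmeans_1d(xs)

# Step (ii): empirical residual quantiles per cluster, weighted by membership,
# so no distributional form is assumed for the residuals.
n_clusters = len(centres)
cluster_lo = [weighted_quantile(residuals, [row[j] for row in u], 0.05) for j in range(n_clusters)]
cluster_hi = [weighted_quantile(residuals, [row[j] for row in u], 0.95) for j in range(n_clusters)]

def local_bounds(x):
    """Residual bounds for a new input: membership-weighted mix of cluster quantiles."""
    mu = memberships(x, centres)
    lower = sum(w * q for w, q in zip(mu, cluster_lo))
    upper = sum(w * q for w, q in zip(mu, cluster_hi))
    return lower, upper

for x in (1.0, 9.0):
    lower, upper = local_bounds(x)
    print(f"input {x}: residual bounds [{lower:.2f}, {upper:.2f}]")
```

To obtain prediction bounds for the model output, the residual bounds would be added to the model prediction at the same time step.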
The UNEEC method is applied to rainfall-runoff models for three contrasting catchments: (i) data-driven models of the Sieve catchment, Italy; (ii) a lumped process based model of the Brue catchment, UK; and (iii) a lumped process based model of the Bagmati catchment, Nepal. The performance of the UNEEC method is measured by two uncertainty statistics, namely the prediction interval coverage probability (PICP) and the mean prediction interval (MPI). It has been demonstrated that the uncertainty bounds estimated by the UNEEC method are consistent with the order of magnitude of the model errors. The results show that the percentage of the observed discharge falling within the estimated uncertainty (prediction) bounds is very close to the specified confidence level used to produce these bounds. In other words, the PICP values are consistently close to the desired confidence levels, and the width of the bounds is reasonable. We also compare the UNEEC uncertainty estimates with those of other uncertainty estimation methods, namely generalised likelihood uncertainty estimation (GLUE), the meta-Gaussian approach, and quantile regression. The comparison shows that the UNEEC method generates consistent and interpretable uncertainty estimates, which indicates that it can be a valuable tool for assessing the uncertainty of various predictive models.
Multiobjective calibration and uncertainty
Practical experience with the calibration of rainfall-runoff models suggests that a single objective function is often inadequate to measure properly the simulation of all the important characteristics of the system that are reflected in the observations. Furthermore, recent advances in computational power and the increased availability of distributed hydrological observations have led to more complex hydrological models, often predicting multiple hydrological fluxes simultaneously.
Therefore, there is an increasing interest in the multiobjective calibration of rainfall-runoff model parameters. In this research, we have applied the multiobjective optimisation routine NSGA-II to calibrate the HBV rainfall-runoff model for the Bagmati catchment in Nepal. The four objective functions, which emphasise different aspects of the runoff hydrograph, are the volume error, the root mean squared error (RMSE), the RMSE of low flows, and the RMSE of peak flows. We have implemented Pareto preference ordering in order to select a small number of solutions from the four-dimensional Pareto-optimal set. The preferred Pareto-optimal set is compared with the optimal parameter sets for each of the single objective functions. It is observed that the solutions in the preferred Pareto-optimal set are a good compromise between all four objective functions and lie between the best and worst solutions from the single objective optimisations. Multiobjective calibration also allows the uncertainty to be quantified in the form of the range of model simulations corresponding to the Pareto-optimal parameter sets, and it provides additional information necessary for risk based decision making.
Summary of conclusions
The aim of this research is to develop tools and techniques for analysing, modelling and predicting uncertainty in rainfall-runoff modelling. This research explores machine learning techniques for uncertainty analysis and prediction, particularly in rainfall-runoff modelling. We develop two methods: MLUE for parametric uncertainty analysis and UNEEC for residual uncertainty analysis of rainfall-runoff models. We apply these methods to various case studies to analyse, model and predict uncertainty in the model predictions. This research has demonstrated that machine learning methods are able to build reasonably accurate and efficient models for predicting uncertainty.
We conclude that machine learning techniques can be valuable tools for uncertainty analysis of various predictive models.
Recommendations
· In this research the MLUE method has been used to emulate the results of the GLUE method. MLUE can also be applied to other MC based uncertainty analysis methods, such as those using Markov chain Monte Carlo sampling or Latin hypercube sampling. Furthermore, in the MLUE method we explicitly consider only the parameter uncertainty of the process model and ignore other sources of uncertainty. The MLUE method can, in principle, also be applied to other sources of uncertainty (input and structure uncertainty), individually or in combination.
· The UNEEC method relies on the concept of model optimality instead of equifinality. If the assumption of model optimality, i.e. the existence of a single “best” model, is not valid, then all of the models that are considered “good” should be taken into account, as is done when the concept of equifinality is adopted. This can be achieved by combining such models in an ensemble, or by generating meta-models of uncertainty for each possible combination of model structure and parameter set, or even by involving the uncertainty associated with the input data. Consequently, instead of having a single set of uncertainty bounds for each model prediction, there will be a set of such bounds generated for each member of an ensemble. Using several such sets of uncertainty bounds in the decision making process would be challenging.
· In the UNEEC method we analyse historical model residuals resulting from a single objective calibration of the rainfall-runoff model using the Nash-Sutcliffe model efficiency criterion. It would be important to investigate the sensitivity of the uncertainty results to other objective functions (e.g., volume error, RMSE of low flows or RMSE of high flows) used in the optimisation algorithm.
Another interesting line of research is to apply the UNEEC method to multiobjective calibration results and produce uncertainty bounds for each Pareto-optimal solution; this would be analogous to using the equifinality principle.
· We consider two types of rainfall-runoff models: data-driven and conceptual (HBV). It is recommended to test the proposed uncertainty analysis methods on physically based and other conceptual rainfall-runoff models in various case studies. Furthermore, the methodologies have considerable potential for application to other mathematical models of water based systems.
· As machine learning methods we have used artificial neural networks (ANN), model trees and locally weighted regression. Further studies should test other machine learning techniques, including support vector machines and other instance based learning methods.
· In multiobjective calibration we used Pareto preference ordering to select a small number of solutions from more than 400 Pareto-optimal solutions in four dimensions. The procedure used in this research to select a small number of preferred solutions consists of two independent steps: first, generating the Pareto-optimal solutions using a multiobjective optimisation algorithm (such as NSGA-II), and second, applying preference ordering. It would be worthwhile to apply a multiobjective optimisation algorithm based on preference ordering, which is expected to be more effective in grading a set of solutions to a problem with many objective functions.
· Although the concept of model uncertainty is well recognized in the research community, the practice of uncertainty analysis and the use of its results in decision making are not widespread. It is not always clear how uncertainty analysis will contribute to improving decision making.
One of the most important challenges is to communicate effectively to the decision maker the insights and advantages that an analysis of uncertainty provides, and to convert uncertain results into simple messages understood by the general public. Further research should be fostered into incorporating the results of uncertainty analysis into risk based decision making.