Hydrology Session Details

Tuesday, August 10, 9:30-12:30

(All times for the workshop are listed in Central Time, UTC -5)

YOUTUBE LINKS: Please go to the KGML YouTube Channel for all available recorded presentations.

Session organizers: Chris Duffy, Xiang Li, and John Nieber

SPEAKERS:

11:00-11:05 BREAK

Session introduction

Abstract: This session explores applications of Knowledge-Guided Machine Learning (KGML) to a range of problems in hydrology, water resources, and water quality studies. The overall goal is to bring together data and domain scientists so that innovative deep learning methods from data science and physical process understanding from the domain sciences can jointly make progress on critical environmental problems of the day. The presentations include: i) process identification of the physical behavior and mechanisms of extreme flooding events using LSTM networks; ii) embedding deep learning sub-models within multi-physics environmental models, where DL parameterizations can outperform physics-based and computationally expensive sub-components (e.g., turbulent heat fluxes, 3D groundwater quality modeling); iii) investigating the transferability of static and dynamic source characteristics in continental lake and catchment research; iv) explainable AI, where data science leads to hydrologic signatures with plausible scientific justification for resource management and planning; v) an extreme gradient boosting machine learning model for mapping nitrate concentrations in groundwater across the conterminous United States; and vi) ML-based data assimilation strategies for reduced-order models in groundwater and coastal applications. A poster session will follow the talks.

Bio: Christopher J. Duffy is an Emeritus Professor in the Civil and Environmental Engineering Department of Penn State University. He has held appointments at Los Alamos National Laboratory (1998-99), Cornell University (1987-88), École Polytechnique Fédérale de Lausanne (2006-07), the Smithsonian Institution, the University of Bristol, UK (2014-2016), and the University of Bonn, DE (2015). Duffy and his team focus on developing spatially distributed, physics-based computational models for multi-scale, multi-process water resources applications (http://www.pihm.psu.edu/), supported by automated data services (www.hydroterre.psu.edu). Recent research as PI/Co-PI includes: NSF Critical Zone Observatory, NSF INSPIRE, NSF EarthCube, EPA, CNH, DARPA World Modelers, and NSF HDR.

Bio: John Nieber is a native of Upstate New York and he received his B.S. degree in Forest Engineering at Syracuse University in 1972, his M.S. degree in Civil and Environmental Engineering at Cornell University in 1974, and his Ph.D. in Agricultural Engineering at Cornell University in 1979. He joined the Department of Agricultural Engineering at Texas A&M University as an Assistant Professor in 1979. He left Texas A&M in 1985 to join the Department of Agricultural Engineering at the University of Minnesota as an Associate Professor, and in 1995 he was promoted to Full Professor at Minnesota in the department which is now called Bioproducts and Biosystems Engineering. John’s research interests involve hydrologic process discovery and modeling, with particular interest in flow and transport processes in porous media. He is a member of the American Society of Agricultural and Biological Engineers, American Geophysical Union, Soil Science Society of America, and a certified Professional Hydrologist with the American Institute of Hydrology. Current research involves studies on utilizing infiltration for stormwater control in urban areas, assessing best management practices impacts on reduction of nitrate in groundwater, combining hydrologic models with machine learning, and quantifying mass and energy transport processes in urban ecosystems.


"Uncovering flooding mechanisms through interpretive deep learning"

Abstract: Long short-term memory (LSTM) networks represent one of the most prevalent deep learning (DL) architectures in current hydrological modeling, but they remain black boxes from which process understanding can hardly be obtained. This study aims to demonstrate the potential of interpretive DL in gaining scientific insights using flood prediction across the contiguous United States (CONUS) as a case study. Two interpretation methods were adopted to decipher the machine-captured patterns and inner workings of LSTM networks. The DL interpretation by the expected gradients method revealed three distinct input-output relationships learned by LSTM-based runoff models in 160 individual catchments. These relationships correspond to three flood-inducing mechanisms—snowmelt, recent rainfall, and antecedent wetness—that account for 8.3%, 59.5%, and 32.2% of the 36,763 flood peaks identified from the dataset, respectively. Single flooding mechanisms dominate 83.1% of the investigated catchments (10.6% snowmelt-dominated, 45.0% recent rainfall-dominated, and 27.5% antecedent wetness-dominated), and the remaining 16.9% have mixed mechanisms. The spatial variability in the dominant mechanisms reflects the catchments’ geographic and climatic conditions. Moreover, the additive decomposition method unveils how the LSTM network behaves differently in retaining and discarding information when emulating different types of floods. Information from inputs within previous time steps can be partially stored in the memory of LSTM networks to predict snowmelt-induced and antecedent wetness-induced floods, while for recent rainfall-induced floods, only recent information is retained. Overall, this study provides a new perspective for understanding hydrological processes and extremes and demonstrates the prospect of artificial intelligence (AI)-assisted scientific discovery in the future.
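The expected gradients attribution used in this abstract can be illustrated compactly. The sketch below is a minimal Monte Carlo version applied to a toy analytic model rather than the talk's LSTM runoff models; the model f, its weights, the input x, and the baseline set are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable "model": f(x) = (w . x)^2, with an analytic gradient.
w = np.array([0.5, -1.0, 2.0])
f = lambda x: (w @ x) ** 2
grad_f = lambda x: 2.0 * (w @ x) * w

def expected_gradients(x, baselines, n_samples=5000):
    """Monte Carlo estimate of expected-gradients attributions at x."""
    total = np.zeros_like(x)
    for _ in range(n_samples):
        xb = baselines[rng.integers(len(baselines))]  # sample a baseline input
        alpha = rng.uniform()                         # sample a point on the path
        total += (x - xb) * grad_f(xb + alpha * (x - xb))
    return total / n_samples

x = np.array([1.0, 0.5, -0.5])
baselines = rng.normal(size=(50, 3))
attr = expected_gradients(x, baselines)
print(attr)
```

The attributions approximately satisfy the completeness property: they sum to the difference between the model output at x and the average output over the baselines, which is what makes per-input attributions comparable across flood events.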

Bio: Prof. Dr. Yi Zheng received his Ph.D. from the University of California, Santa Barbara (2007). He is the Associate Dean of the School of Environmental Science and Engineering at the Southern University of Science and Technology (SUSTech), China. Before joining SUSTech in 2016, he was an associate professor at Peking University. His research interests include hydrologic modeling, water resources management, and environmental big data. His major scientific contributions cover integrated ecohydrological modeling, uncertainty analysis for complex environmental models, the human-water nexus, and artificial intelligence for hydrology. He is currently an Associate Editor of Water Resources Research and of the Journal of Hydrologic Engineering (ASCE).

"Embedding neural networks to simulate turbulent heat fluxes in a process-based hydrologic modeling framework"

Abstract: Deep learning (DL) methods have shown great promise for accurately predicting hydrologic processes but have not yet reached the complexity of traditional process-based hydrologic models (PBHMs). While DL methods have achieved superior predictive performance for specific tasks, the ability of PBHMs to simulate the entire hydrologic cycle makes them useful for a wide range of modeling and simulation tasks. We take advantage of both approaches by coupling DL models into a PBHM as sub-components for the simulation of latent and sensible heat fluxes. In this talk we will describe the workflow and technologies needed to perform this coupling, as well as provide an outlook for the future of such applications. Our results demonstrate that the DL parameterizations can outperform physics-based equations for turbulent heat fluxes in several ways: they improve predictive performance and also produce more realistic simulations of aspects they were not directly trained on, by taking advantage of information from other model components. We then explore how the neural networks were able to more accurately simulate evaporation and heat transfer by analyzing them with a method known as layerwise relevance propagation (LRP). We show that the networks learned physically realistic relationships between input and output, as well as general site characteristics that were not included in the training data. Our work demonstrates how combining modeling approaches can lead to better predictions and hints at how such methods may allow us to derive better scientific understanding directly from large datasets.
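Layerwise relevance propagation, mentioned at the end of the abstract, redistributes a network's output backward through its layers in proportion to each neuron's contribution. Below is a minimal numpy sketch of the LRP-epsilon rule on a tiny, randomly weighted two-layer ReLU network; the random weights are a stand-in for a trained parameterization, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny 2-layer ReLU network with random weights (stand-in for a trained model).
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)
relu = lambda z: np.maximum(z, 0.0)

def lrp(x, eps=1e-6):
    """Propagate the scalar output back to the inputs (LRP-epsilon rule)."""
    a1 = relu(W1 @ x + b1)                 # hidden activations
    y = W2 @ a1 + b2                       # scalar output (shape (1,))
    s2 = y / (y + eps * np.sign(y))        # output layer: relevance / pre-activation
    R1 = a1 * (W2.T @ s2)                  # hidden-layer relevances
    z1 = W1 @ x + b1
    s1 = R1 / (z1 + eps * np.sign(z1))
    R0 = x * (W1.T @ s1)                   # input relevances
    return y.item(), R0

x = np.array([1.0, -0.5, 2.0])
y, R0 = lrp(x)
print(y, R0)
```

Each input's relevance tells you how much it contributed to the particular prediction, which is how the talk diagnoses what the embedded networks learned about evaporation and heat transfer.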

Bio: Andrew Bennett is a research scientist in the Computational Hydrology research group at the University of Washington. He received his PhD in civil and environmental engineering from the University of Washington in 2021 and was previously a research assistant in the Computer Science and Mathematics Division at Oak Ridge National Laboratory. Andrew's research focuses on hydrologic model development and using machine learning to improve our understanding of hydrologic systems.

"Source Aware Modulation for leveraging limited data from heterogeneous sources"

Abstract: In many personalized prediction applications, sharing information between entities/tasks/sources is critical to address data scarcity. Furthermore, inherent characteristics of sources distinguish relationships between input drivers and response variables across entities. For example, for the same amount of rainfall (input driver), two different basins will have very different streamflow (response variable) values depending on the basin characteristics (e.g., soil porosity, slope, …). Given such heterogeneity, a trivial merging of data without source characteristics would lead to poor personalized predictions. In recent years, meta-learning has become a very popular framework to learn generalized global models that can be easily adapted (fine-tuned) for individual sources. In this talk, we present an exhaustive analysis of the source-aware modulation based meta-learning approach. Source-aware modulation adjusts the shared hidden features based on source characteristics. The adjusted hidden features are then used to calculate the response variable for individual sources. Although this strategy shows promising prediction improvement, its applicability is limited in certain applications where source characteristics might not be available (especially due to privacy concerns). In this work, we show that robust personalized predictions can be achieved even in the absence of explicit source characteristics. We investigated the performance of different modulation strategies under various data sparsity settings on two datasets. We demonstrate that source-aware modulation is a very viable solution (with or without known characteristics) compared to traditional meta-learning methods such as model agnostic meta-learning.
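The core mechanism the abstract describes can be sketched in a few lines: shared hidden features are scaled and shifted by per-source parameters (in the style of feature-wise modulation) before a shared output head. The weights below are random placeholders for what would be meta-learned; in practice the modulation parameters could be learned per source or, when characteristics are available, generated from them by a small network.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sources, n_in, hidden = 3, 4, 8

# Shared weights (would be meta-learned across all sources/basins).
W_enc = rng.normal(size=(hidden, n_in))
w_dec = rng.normal(size=hidden)

# Per-source modulation parameters. Random placeholders here; they could be
# learned per source, or produced from known source characteristics
# (e.g. soil porosity, slope) by a small hypernetwork.
gammas = rng.normal(loc=1.0, scale=0.2, size=(n_sources, hidden))
betas = rng.normal(scale=0.2, size=(n_sources, hidden))

def predict(x, source_id):
    """Shared features, modulated per source, then a shared output head."""
    h = np.tanh(W_enc @ x)                            # shared hidden features
    h_mod = gammas[source_id] * h + betas[source_id]  # source-aware modulation
    return w_dec @ h_mod                              # response (e.g. streamflow)

x = rng.normal(size=n_in)  # the same input drivers (e.g. rainfall) ...
preds = [predict(x, s) for s in range(n_sources)]
print(preds)               # ... yield a different response per source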
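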

Xiang Li Bio: Xiang Li is a PhD candidate in the Water Resources Science program at the University of Minnesota. His background is interdisciplinary: he earned his master's degree in Water Resources Science at UMN with a minor in computer science (machine learning and data mining in particular). His master's thesis concerned baseflow recession analysis and groundwater storage change analysis. His current work focuses on integrating machine learning algorithms into hydrologic modeling, including SWAT models and classic baseflow recession analysis.

Ankush Khandelwal Bio: Ankush Khandelwal is a Research Associate in the Computer Science Department at the University of Minnesota, where he also earned his PhD in Computer Science. His research has focused on developing novel machine learning algorithms to analyze vast amounts of satellite imagery for different Earth science domains such as water, agriculture, and forestry. His current work focuses on developing physics-aware machine learning algorithms for hydrological applications.

"Explainable machine learning: A peek into black-box models"

Abstract: The unmatched performance of machine learning models has increased their popularity in fields that were previously dominated by knowledge-based modeling. Because machine learning models can be built on complex designs, they are frequently referred to as black-box models. However, when decisions can impact the lives of others, we had better be sure we can explain why those decisions are made. Moreover, being able to explain how these models work can increase our domain knowledge and our confidence in their usage.

This presentation has two main objectives: (i) outline common methods used to improve model interpretability, and (ii) present a case study where these methods are applied. The case study concerns the use of a machine learning model to predict a hydrological signature from catchment attributes. Explainable machine learning methods are then used to improve the model's interpretability, and the outcomes are compared with previous hydrological knowledge. The presentation contains R code chunks showing how these methods are applied.
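The talk's R code chunks are not reproduced here, but one of the most common model-agnostic interpretability methods, permutation feature importance, can be sketched on synthetic data. The "model" below is a plain least-squares fit standing in for a trained ML model, and the data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: 5 "catchment attributes", but only the first two actually
# drive the "hydrological signature" y.
X = rng.normal(size=(500, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=500)

# Stand-in "model": an ordinary least-squares fit.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
mse = lambda A, t: np.mean((A @ coef - t) ** 2)

def permutation_importance(X, y, n_repeats=20):
    """Average increase in error when one feature's column is shuffled."""
    base = mse(X, y)
    imps = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the feature-target link
            imps[j] += mse(Xp, y) - base
    return imps / n_repeats

imp = permutation_importance(X, y)
print(imp)  # features 0 and 1 should dominate
```

Because the method only needs predictions, not model internals, it applies unchanged to any black-box model of a hydrological signature.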

Bio: Daniel Althoff is a Postdoctoral Researcher at the Department of Physical Geography at Stockholm University. He received his Ph.D. degree in agricultural engineering from the Federal University of Viçosa, Brazil, in 2021. His Ph.D. research focused on hydrological modeling for water availability in the Brazilian savanna biome (Cerrado). His current research at Stockholm University focuses on machine learning modeling of hydroclimatic systems.

"Process-informed machine learning predictions of nitrate in groundwater used for drinking supply in the conterminous United States"

Abstract: An extreme gradient boosting (XGB) machine learning model was developed to map the three-dimensional distribution of nitrate concentrations in groundwater across the conterminous United States (CONUS). The model used observations at 12,082 domestic and public supply wells, which represent the drinking water supply zone for the CONUS. Process-based predictor variables used in modeling were of two types: physics based, such as output from a national groundwater flow model, and empirical, such as multi-order hydrologic position. Nitrate was predicted at 1 km resolution for two depth zones that vary across the CONUS: one for domestic and one for public-supply wells. The model provided accurate estimates of the cumulative distribution of nitrate concentrations, including the proportion of high values (>10 mg/L), for both the training and hold-out data at regional and national scales. Most of the CONUS had concentrations below 1 mg/L, and the most influential explanatory factors, based on SHapley Additive exPlanations (SHAP), were well depth and depth to a wet soil layer, which serve as surrogates for hydrologic and nitrate transport processes. The national groundwater flow model and multi-order hydrologic position variables were also important. Only 1% of the area in either depth zone had predicted high nitrate concentrations, but about 1.4 million people depend on groundwater for drinking water in those areas. Predicted high concentrations were most prevalent in the central CONUS. This work represents the first application of XGB to a three-dimensional national-scale groundwater quality model and provides a significant milestone in efforts to document nitrate in groundwater across the CONUS.
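The study itself uses the XGBoost library together with SHAP values; the underlying idea of gradient boosting on squared error can nevertheless be sketched with depth-1 regression trees (stumps) fit to residuals. Everything below, including the two stand-in predictors and the threshold-like response, is synthetic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data: two stand-in predictors (think well depth, depth to a wet
# soil layer) and a threshold-like "concentration" response.
X = rng.uniform(0, 100, size=(400, 2))
y = np.where(X[:, 0] < 30, 8.0, 1.0) + 0.5 * rng.normal(size=400)

def fit_stump(X, r):
    """Best depth-1 regression tree on squared error against residuals r."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            lv, rv = r[left].mean(), r[~left].mean()
            err = np.sum((r[left] - lv) ** 2) + np.sum((r[~left] - rv) ** 2)
            if err < best_err:
                best, best_err = (j, t, lv, rv), err
    return best

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

# Boosting on squared error: each new stump fits the current residuals.
lr, pred = 0.3, np.zeros(len(y))
for _ in range(50):
    stump = fit_stump(X, y - pred)
    pred += lr * predict_stump(stump, X)

train_mse = np.mean((y - pred) ** 2)
print(train_mse)  # approaches the noise floor as stumps accumulate
```

XGBoost adds regularization, deeper trees, and second-order gradients on top of this loop, but the residual-fitting structure is the same, and SHAP then attributes each prediction back to the predictors.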

Bio: Katherine is a hydrologist with the U.S. Geological Survey, California Water Science Center, Sacramento, CA where she currently works for the National Water Quality Assessment program. Her recent work involves using ensemble tree methods to assess groundwater quality across the United States. She completed her doctoral studies at the University of California, Davis, with Dr. Thomas Harter, where she focused on machine learning and Bayesian statistical modeling of nitrate in groundwater of the Central Valley, California. When she isn’t doing statistical modeling, she enjoys spending time with her family.

"ML-based scalable data assimilation with hydrological applications"

Abstract: Estimating unknown hydrodynamic parameters such as coastal bathymetry and subsurface permeability from indirect hydrological and geophysical measurements is usually an ill-posed inverse problem. It can also be computationally challenging for high-dimensional and big-data applications, because most inversion techniques require iterative calls to complex multiphysics models to compute Jacobians, followed by inversion of large dense matrices. In this talk, deep learning approaches are utilized to accelerate inverse modeling, data assimilation, and uncertainty quantification for geoscience parameter estimation and subsequent forecasting. Reduced-order models of PDEs are constructed on a nonlinear manifold of low dimensionality through deep generative models, so that uncertainty quantification can be performed on the low-dimensional latent space in a Bayesian framework. Combined with automatic differentiation and stochastic Newton-type MCMC methods, deep learning-based methods are shown to perform the inversion much faster than traditional methods, which offers great promise in various geoscience applications. The improvement and performance of these methodologies are illustrated by applying them to subsurface permeability characterization and riverine/coastal waterbody depth imaging problems.
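A standard Jacobian-free building block for this kind of data assimilation is the ensemble smoother, which builds a Kalman-style gain from ensemble covariances instead of differentiating the forward model. The sketch below uses a random linear operator in place of the expensive multiphysics model, omits the talk's deep generative latent-space parameterization, and uses illustrative sizes and names throughout.

```python
import numpy as np

rng = np.random.default_rng(5)
n_param, n_obs, n_ens = 10, 8, 200

# Random linear operator standing in for the expensive forward model d = G(m).
G = rng.normal(size=(n_obs, n_param))
m_true = rng.normal(size=n_param)
obs_std = 0.1
d_obs = G @ m_true + obs_std * rng.normal(size=n_obs)

# Prior ensemble of parameter fields (e.g. permeability, bathymetry).
M = rng.normal(size=(n_param, n_ens))
D = G @ M                       # ensemble of predicted observations

# Ensemble-smoother update: a Kalman-style gain built from ensemble
# covariances, so no Jacobian of the forward model is ever formed.
Md = M - M.mean(axis=1, keepdims=True)
Dd = D - D.mean(axis=1, keepdims=True)
C_md = Md @ Dd.T / (n_ens - 1)
C_dd = Dd @ Dd.T / (n_ens - 1) + obs_std ** 2 * np.eye(n_obs)
K = C_md @ np.linalg.inv(C_dd)
D_pert = d_obs[:, None] + obs_std * rng.normal(size=(n_obs, n_ens))
M_post = M + K @ (D_pert - D)

prior_mse = np.mean((M.mean(axis=1) - m_true) ** 2)
post_mse = np.mean((M_post.mean(axis=1) - m_true) ** 2)
print(prior_mse, post_mse)  # the update pulls the ensemble toward m_true
```

Running the update on a deep generative model's latent coordinates instead of the raw parameter field is what keeps this tractable when the field is high-dimensional.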

Bio: Jonghyun Harry Lee is an assistant professor jointly appointed in the Department of Civil and Environmental Engineering and the Water Resources Research Center at the University of Hawaii at Manoa. Lee has worked on the integration of machine learning, multi-physics simulation modeling, parameter estimation, and stochastic optimization for reliable water resources management, with public-domain software developments. He has been an Oak Ridge Institute for Science and Education (ORISE) Faculty Fellow since 2018 and serves as an Associate Editor of the Journal of Hydrology.