Hydrology Session Details

Wednesday, August 19, 9:30-12:30 CDT

YOUTUBE LINKS: Part 1 (start to break), Part 2 (break to session end)

Session chairs: John Nieber and Chris Duffy

SPEAKERS:

11:05-11:15 BREAK

“What is the Role of Hydrological Science in the Age of Machine Learning?”

Abstract: Recent experiments applying deep learning to rainfall-runoff simulation indicate that there is significantly more information in large-scale hydrological data sets than hydrologists have been able to translate into theory or models. We argue that these results challenge certain 'sacred cows' in the surface hydrology community, and may be a bellwether for the discipline as a whole. While there is growing interest in machine learning in the hydrological sciences community, in many ways our community still holds deeply subjective and non-evidence-based preferences for process understanding that has historically not translated into accurate theory, models, or predictions. The objective of this opinion piece is to suggest that, due to this failure in the surface hydrology community to develop scale-relevant theories, one possible future is a discipline based primarily in machine learning and other AI methods, with a more limited role for what we currently recognize as hydrological science. We do not want this to happen and suggest a 'grand challenge' for the community to work toward demonstrating where and when hydrological theory provides information in a world dominated by big data.

Bio: Grey Nearing is an Assistant Professor at the University of Alabama and Research Director at Upstream Tech, Public Benefit Corporation (https://upstream.tech/). His research focuses on the intersection of data-driven and hypothesis-driven environmental science for water-related issues.

"From parameter calibration to parameter learning: Revolutionizing large-scale geoscientific modeling with big data"

Abstract: The behaviors and skills of models in many geoscientific domains strongly depend on spatially varying parameters that lack direct observations and must be determined by calibration. Calibration, which solves inverse problems, is a classical but inefficient and stochasticity-ridden approach to reconcile models and observations. Using a widely applied hydrologic model (VIC) and soil moisture observations as a case study, here we propose a novel forward-mapping parameter learning (fPL) framework. Whereas evolutionary algorithm (EA)-based calibration solves inversion problems one by one, fPL learns a more robust, universal mapping. fPL can save orders-of-magnitude computational time compared to EA-based calibration, while, surprisingly, producing equivalent or slightly better ending skill metrics. With more training data, fPL learned across sites and showed super-convergence, scaling much more favorably. Moreover, a more important benefit emerged: due to a global loss function, fPL produced spatially-coherent parameters in better agreement with physical processes. As a result, it demonstrated better results for out-of-training-set locations and uncalibrated variables. Compared to purely data-driven models, fPL can output unobserved variables, in this case simulated evapotranspiration, which agrees better with satellite-based estimates than the comparison EA. The deep-learning-powered fPL frameworks can be uniformly applied to myriad other geoscientific models. We contend that a paradigm shift from inverse parameter calibration to parameter learning will greatly propel various geoscientific domains.
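The contrast between site-by-site calibration and a single learned mapping can be illustrated with a toy sketch. This is not the fPL framework from the talk: the one-parameter `simulate` function, the site attributes, and the linear attribute-to-parameter mapping are all hypothetical stand-ins, and ordinary least squares replaces the deep network. It only shows the idea of fitting one mapping against a global loss over all sites and then applying it to a site outside the training set.

```python
# Toy illustration only: a one-parameter "model" y = theta * x at each site.
# The attributes, forcings, and linear mapping below are hypothetical.

def simulate(theta, forcing):
    """Stand-in for a process-based model with a single parameter."""
    return [theta * x for x in forcing]

def fit_global_mapping(attributes, forcings, observations):
    """Fit theta = w * attribute + b by least squares over ALL sites at once.

    Because y = (w * a + b) * x = w * (a * x) + b * x, the global loss is
    linear in (w, b), so the 2x2 normal equations can be solved directly.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for a, xs, ys in zip(attributes, forcings, observations):
        for x, y in zip(xs, ys):
            f1, f2 = a * x, x  # the two regression features
            s11 += f1 * f1
            s12 += f1 * f2
            s22 += f2 * f2
            r1 += f1 * y
            r2 += f2 * y
    det = s11 * s22 - s12 * s12
    w = (r1 * s22 - r2 * s12) / det
    b = (s11 * r2 - s12 * r1) / det
    return w, b

# Synthetic training sites whose "true" parameter depends on one attribute.
attributes = [0.1, 0.4, 0.7, 1.0]
forcing = [1.0, 2.0, 3.0]
forcings = [forcing] * len(attributes)
observations = [simulate(2.0 * a + 1.0, forcing) for a in attributes]

w, b = fit_global_mapping(attributes, forcings, observations)

# The mapping yields a parameter for a site that was never calibrated.
theta_new = w * 0.55 + b
```

In this sketch the unseen site gets its parameter from the mapping alone, with no further optimization at that site, which is the property the abstract attributes to learning a mapping rather than solving each inverse problem separately.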

Bio: Chaopeng Shen is Associate Professor in Civil Engineering at The Pennsylvania State University. He received his Ph.D. degree in environmental engineering from Michigan State University, East Lansing, MI, USA, in 2009. His PhD research focused on computational hydrology, and he developed the hydrologic model Process-based Adaptive Watershed Simulator (PAWS), which was later coupled to the Community Land Model to study the interactions between hydrology and ecosystems. He was a Post-Doctoral Research Associate with the Lawrence Berkeley National Laboratory, Berkeley, CA, USA, from 2011 to 2012, working on high-performance computational geophysics. His recent efforts have focused on harnessing big data and machine learning opportunities to advance hydrologic predictions and understanding. He has written technical, editorial, review and collective opinion papers on hydrologic deep learning to call attention to emerging opportunities for scientific advances. His research interests also include floodplain systems, scaling issues, process-based hydrologic modeling, and hydrologic data mining. He is currently an Associate Editor of Water Resources Research and of Frontiers in AI.


"Groundwater table modelling at high spatial resolution using machine learning and process-based models"

Abstract: Machine learning provides great potential for modelling hydrological variables at a spatial resolution beyond the capabilities of physically based modelling. This study features an application of decision-tree-based regression to model the depth to the groundwater table in Denmark. In this study we focus on a wintertime minimum depth, at a 50 m resolution over a 15,000 km² domain. In Denmark, shallow groundwater poses severe risks with respect to groundwater-induced flood events, affecting both urban and agricultural areas. The risk is especially critical in wintertime, when the shallow groundwater is close to terrain. In order to advance modelling capabilities of the shallow groundwater system and to provide estimates at the scales required for decision-making, this study introduces a simple method to unify a Random Forests (RF) model and physically based modelling. Results from the national water resources model in Denmark (DK-model) at a 500 m resolution are employed as covariates in the RF model. Thus, RF ensures physical consistency at a coarse scale and fully exploits high-resolution information from readily available environmental variables. Methods to quantify uncertainty as well as to map covariate importance are discussed.
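One practical step implied by this setup is placing the 500 m model output on the 50 m prediction grid so it can sit beside the high-resolution covariates. The sketch below is a minimal illustration under stated assumptions, not the study's workflow: it assumes perfectly aligned grids with a factor of 10 between resolutions, uses simple cell repetition rather than any interpolation, and the function names and the "elevation" covariate are hypothetical. The resulting rows are the kind of table a random forest regressor could be trained on.

```python
FACTOR = 10  # 500 m coarse cell divided by 50 m fine cell

def coarse_to_fine(coarse_grid, factor=FACTOR):
    """Repeat each coarse cell so it covers factor x factor fine cells."""
    fine = []
    for row in coarse_grid:
        expanded = [value for value in row for _ in range(factor)]
        fine.extend(list(expanded) for _ in range(factor))
    return fine

def build_covariate_rows(coarse_grid, fine_covariate, factor=FACTOR):
    """Pair the coarse model value with a fine covariate for every fine cell."""
    coarse_on_fine = coarse_to_fine(coarse_grid, factor)
    rows = []
    for i, fine_row in enumerate(fine_covariate):
        for j, value in enumerate(fine_row):
            rows.append((coarse_on_fine[i][j], value))
    return rows

# Hypothetical data: a 1 x 2 coarse grid of simulated depths (m) and a
# 10 x 20 fine grid standing in for a covariate such as elevation.
coarse_depth = [[1.5, 3.0]]
fine_elevation = [[float(i + j) for j in range(20)] for i in range(10)]

rows = build_covariate_rows(coarse_depth, fine_elevation)
```

Each row holds one coarse-scale value and one fine-scale value for the same 50 m cell, so the coarse model can constrain the broad pattern while the fine covariates carry local detail.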

Bio: Julian Koch is a Research Associate at the Department of Hydrology at the Geological Survey of Denmark and Greenland (GEUS). Julian holds a PhD in Geology from the University of Copenhagen. His research focuses on hydrological modeling, remote sensing analysis and machine learning for investigating the hydrological cycle. https://orcid.org/0000-0002-7732-3436

"Increasing the Value of Mechanistic Watershed Models Through Emulation and Machine Learning"

Abstract: This talk explores opportunities and challenges for implementing machine learning tools at a field experiment with extensive climatic, hydrologic, geologic and ecological observations and modeling. The site is an NSF-funded Critical Zone Observatory, part of a national and global network for Earth sciences. In the KGML effort I have the role of domain scientist in hydrology and a collaborator with the UMN KGML science team in developing strategies where physical catchment understanding may guide the development of new machine learning tools for advancing model predictions in hydrologic sciences.

The advantages of fully coupled, complex models of water, nutrients and sediments include adherence to science-based predictions, the flexibility to simulate in space and time at user-selected resolutions, and the capacity to establish quantitative relationships, feedbacks and co-dependencies among the flows and the states. The disadvantages of complex process-based models are the computational cost and excessive time required for model execution, the large data needs to support the simulation, and the fact that these models are always under-constrained. The last problem is particularly important since available data is based on classification (soils, geology, land cover, etc.), leading to biased estimates in the parameter fields and then in model predictions. One answer to the model complexity problem is to extract simplified, reduced-dimension models which preserve specific characteristics of the simulated variables with reduced computational cost. The approach is a hybrid strategy that maintains the need for calibrated historical reanalysis of process-based models, but then implements emulators that utilize the complex model and the available observations as training data. The research goal is to develop a user framework that applies the unique capability of the model/data-driven emulators to automatically learn patterns from process-based models and available data, while minimizing computational cost and efficiently providing ensemble simulations of comparable or better accuracy than the complex model.

KGML research recognizes and prioritizes the need for access and sharing of models and model results with expert and non-expert users alike (science collaborators, managers, consultants and the public).
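The emulation idea in this abstract, running the complex model a limited number of times and then answering further queries from a cheap surrogate, can be sketched in a few lines. Everything here is an assumption for illustration: `slow_model` is a simple stand-in function rather than a watershed model, the training inputs are arbitrary, and a piecewise-linear lookup table is swapped in for the machine learning emulators the talk refers to.

```python
import bisect

def slow_model(rainfall_mm):
    """Stand-in for an expensive process-based run (hypothetical formula)."""
    if rainfall_mm <= 10.0:
        return 0.0
    return (rainfall_mm - 10.0) ** 2 / (rainfall_mm + 40.0)

class Emulator:
    """Piecewise-linear surrogate built from a handful of model runs."""

    def __init__(self, inputs, outputs):
        pairs = sorted(zip(inputs, outputs))
        self.xs = [p[0] for p in pairs]
        self.ys = [p[1] for p in pairs]

    def predict(self, x):
        # Clamp queries outside the training range to the nearest endpoint.
        if x <= self.xs[0]:
            return self.ys[0]
        if x >= self.xs[-1]:
            return self.ys[-1]
        hi = bisect.bisect_right(self.xs, x)
        lo = hi - 1
        frac = (x - self.xs[lo]) / (self.xs[hi] - self.xs[lo])
        return self.ys[lo] + frac * (self.ys[hi] - self.ys[lo])

# "Training": eleven runs of the slow model on a coarse set of inputs.
train_inputs = [10.0 * k for k in range(11)]
train_outputs = [slow_model(x) for x in train_inputs]
emulator = Emulator(train_inputs, train_outputs)

# An ensemble of queries is then answered without rerunning the slow model.
ensemble = [emulator.predict(x) for x in (12.5, 35.0, 77.0)]
```

The surrogate reproduces the training runs exactly and approximates the model in between, so its accuracy depends on how densely the training runs sample the input range.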

Bio: Christopher J. Duffy is an Emeritus Professor in the Civil and Environmental Engineering Department of Penn State University. He has held appointments at Los Alamos National Laboratory (1998-99), Cornell University (1987-88), Ecole Polytechnic Lausanne (2006-07), Smithsonian Institution, University of Bristol, UK (2014-2016), and University of Bonn, DE (2015). Duffy and his team focus on developing spatially-distributed, physics-based computational models for multi-scale, multi-process water resources applications (http://www.pihm.psu.edu/), supported by automated data services (www.hydroterre.psu.edu). Recent research as PI/Co-PI includes NSF Critical Zone Observatory, NSF INSPIRE, NSF EarthCube, EPA, CNH, DARPA World Modelers and NSF HDR.


"Physics Guided deep learning models for hydrology"

Abstract: Surface runoff prediction is one of the key challenges in the field of hydrology due to the complex interplay between multiple non-linear physical mechanisms behind runoff generation. While physical models are rooted in a rich understanding of the physical processes, a significant performance gap still remains, which can potentially be addressed by leveraging the recent advances in machine learning. The goal of this work is to incorporate our understanding of physical processes and constraints in hydrology into machine learning algorithms, and thus bridge the performance gap while reducing the need for large amounts of data compared to traditional data-driven approaches. In particular, we propose an LSTM-based deep learning architecture that is coupled with SWAT (Soil and Water Assessment Tool), which is widely used by the hydrology community to model surface runoff. The key idea of the approach is to model auxiliary intermediate processes that connect weather drivers to surface runoff, instead of directly modeling runoff from weather variables. The efficacy of the approach is being analyzed on a small catchment located in the South Branch of the Root River Watershed in southeast Minnesota. Apart from observation data on runoff, the approach also makes use of a 200-year synthetic dataset generated by SWAT to improve the performance while reducing convergence time. In the early phases of this study, simpler versions of the SWAT model are being used in order to achieve a system understanding of the coupling of physics and machine learning. As more complexity is introduced into the present implementation, the research results of this case study will be generalized to more sophisticated cases where spatial heterogeneities are involved.

Xiang Li Bio: Xiang Li is a current PhD student in the Water Resources Science program at the University of Minnesota. His background is interdisciplinary. He earned his master's degree in Water Resources Science with a minor in computer science (machine learning and data mining in particular) at UMN. His master's thesis is about baseflow recession analysis and groundwater storage change analysis. His current work focuses on integrating machine learning algorithms into hydrologic modeling, including SWAT models and classic baseflow recession analysis.

Ankush Khandelwal Bio: Ankush Khandelwal is a Research Associate in the Computer Science Department at the University of Minnesota. He has a PhD in Computer Science from the University of Minnesota. Khandelwal's research has focused on developing novel machine learning algorithms to analyze vast amounts of satellite imagery for different earth science domains such as water, agriculture and forestry. His current research is focused on developing physics-aware machine learning algorithms for hydrological applications.

Shaoming Xu Bio: Shaoming Xu is a 1st year Ph.D. student in the Department of Computer Science and Engineering at the University of Minnesota Twin-Cities. His research interests are in Machine Learning, Data Mining, physics-guided machine learning, and computational Earth Science.


"Multiphysics-informed learning algorithm for vadose zone transport modeling – preliminary results"

Abstract: Combining multiple transport measurement types (pressure head, temperature, concentration) in a multiphysics framework is known to improve parameter estimates through crossover of mutual information. While crossover effects benefit the estimation process, there remain challenges associated with time to convergence, uncertainty in parameter estimates and resources required for field-scale multiphysics simulations. To overcome these challenges, we develop and test a multiphysics-informed learning algorithm that performs the joint solution of explicitly coupled variably-saturated water, heat, and solute transport equations with random forest (RF) and ensemble gradient boosting (EGB) regression machine learning kernels. These kernels provide simulated transport measurements based on stochastic input that regularizes the estimation process through additional pressure head (RF) and temperature and concentration (EGB) terms in the objective function. Our results demonstrate that the multiphysics-informed learning algorithm (1) reduces the number of iterations required for convergence by one order of magnitude, (2) reduces the error in estimated (water, heat, and solute) transport parameters by one order of magnitude, and (3) provides reduced-order machine learning models for coupled time-dependent simulations of pressure head, temperature, and solute concentration thereby avoiding the need for computational resources associated with traditional numerical forward solutions. We envision that reduced-order models will provide a rapid means to inform field management and policy decisions under stochastic boundary conditions.

Bio: Dr. Michael Friedel is a senior computational scientist at Pacific Northwest National Laboratory, USA; and associate researcher at Semeion Institute, IT. Prior to this role, Dr. Friedel was the environmental data analytics science leader at Lincoln Agritech, Lincoln University, NZ; senior research hydro-geophysicist at the Institute of Geological and Nuclear Science, NZ; and senior research geophysicist, senior research hydrologist, and supervisory hydrologist with the US Geological Survey.

Dr. Friedel has extensive experience in the development and application of methods and workflows that discover, quantify, and predict linkages and their response to climate, hydrologic and biogeochemical cycles across spatiotemporal scales. His research employs artificial-adaptive systems (evolutionary, machine-learning, learn-heuristics, metamodels, multimodal transfer learning, and physics-informed learning), process-based (traditional/joint numerical) methods, and statistical (Bayesian, frequentist) methods. Dr. Friedel designs, collects, and integrates big data including direct (physical, chemical, biological) and indirect (geophysical and remote sensing) measurements across multiscale environmental networks (space, airborne, surface, borehole) to improve theory, scalability, and predictability.