Aquatic Sciences Session Details

Tuesday, August 10, 1:30-4:25

(All times for the workshop are listed in Central Time, UTC -5)

YOUTUBE LINKS: Please go to the KGML YouTube Channel for all available recorded presentations.

Session Organizers: Alison Appling and Paul Hanson

SPEAKERS:

2:50-3:00 BREAK

"Knowledge-guided machine learning on the rise in the aquatic sciences"

Abstract: Aquatic sciences have benefited greatly from machine learning (ML). Further benefits can be realized by infusing ML with aquatic science knowledge and using new tools to extract limnological insights from ML models. However, injecting and extracting such knowledge requires substantial technical and creative efforts -- barriers that the presentations in this session can help to break down by demonstrating innovative and effective methods for encoding and discovery of water knowledge. In this talk, we define basic concepts of knowledge-guided ML (KGML) and give examples of how its fundamental components, data-driven models and knowledge of aquatic patterns and processes, can be fused to improve both prediction and understanding. We identify some of the challenges confronted when applying KGML in the aquatic sciences, such as relatively small or heterogeneous datasets, the need to digest complex data and model outputs into intelligible summaries, and the ongoing pursuit of stakeholder trust and engagement for water resources management. We will highlight successes in overcoming these challenges and suggest future directions to further the mixing of machine learning and aquatic science knowledge.

Alison Appling Bio: Alison Appling is a water data scientist with the US Geological Survey. She has a bachelor’s degree in Symbolic Systems from Stanford University and a PhD in Ecology from Duke University. Her research addresses the movement of energy, carbon, and nutrients through rivers, lakes, and floodplains, with an emphasis on using data science and machine learning to improve the estimation and prediction of water quality variables.

Paul Hanson Bio: I am a Distinguished Research Professor at the Center for Limnology at the University of Wisconsin. I study how climate change, land use, and human activity impact our inland waters, as well as the ways in which big data and team science are shaping freshwater research on a global scale. I also study the art of science, especially through my musical pursuits.



"Using Multi-scale Machine Learning Models to Develop a Predictive Understanding of the Impacts of Disturbances on River Water Quality"

Abstract: Hydrometeorological disturbances such as floods, droughts, and heatwaves are projected to increase over the next few decades due to climate change. These disturbances can worsen water quality by impacting water temperature and salt, nutrient, and contaminant concentrations, which will have direct consequences for human and ecosystem health. In this talk, I will describe our research activities in building multi-scale machine learning models for predicting stream temperatures and their changes caused by new disturbance regimes. A particular focus is on data-driven analyses, such as statistical/trend analysis and unsupervised clustering, to infer physical information that can inform model selection, architecture, hyperparameters, and input features for different spatial scales. These concepts are demonstrated in the development of low-complexity machine learning models (Multiple Linear Regression, Support Vector Regression, and Random Forest Regression/XGBoost) to predict monthly stream water temperature at point, watershed, and regional scales for hydrological basins with differing climate, geological, land use, and water management attributes.

Bio: Charu is a biogeochemist and data scientist in the Earth and Environmental Sciences Area at Berkeley Lab, and is a senior fellow of the Berkeley Institute of Data Sciences. Her research focuses on the nexus of water, energy and carbon for the purposes of sustainable water resource management. She was awarded a DOE Early Career Research Award in 2019 to build a data-driven framework to predict the impacts of floods and droughts on water quality in the United States. Her group is developing machine learning models for water quality predictions, and working on software tools for data management, integration, analysis and visualization related to large DOE projects that study terrestrial and subsurface ecosystem processes. She earned her Ph.D. from the Massachusetts Institute of Technology and conducted her postdoctoral research at Berkeley Lab.

"Remember where you are: teaching a stream temperature model to embrace long-term groundwater exchange patterns"

Abstract: Stream temperature predictions are an essential tool for both resource managers and aquatic researchers; however, achieving ecologically relevant predictive accuracy is challenged by spatially and temporally variable physical controls on the stream channel heat budget. Knowledge-guided machine learning (KGML) models of stream temperature leverage the respective strengths of both process-driven and data-driven temperature models and have shown superior predictive accuracy. Groundwater discharge is a well-established driver of stream temperature, but existing KGML models of stream temperature do not explicitly incorporate groundwater discharge guidance and have been ineffective in recognizing these patterns from the available stream temperature observations. Lack of explicit groundwater discharge representation in KGML stream-temperature models results in lower predictive accuracy in groundwater-influenced reaches, and resource managers are less likely to use stream temperature predictions that do not incorporate groundwater discharge. At annual timescales, groundwater discharge reduces the amplitude of stream-water-temperature fluctuations and sometimes introduces a lag between changes in air temperature and changes in water temperature. To explicitly encourage the model to reproduce these characteristic temperature signals, we have developed new loss terms in a custom model training loss function. The loss function is informed by both observed temperatures and outputs from a numerical groundwater-flow model (MODFLOW). Preliminary results suggest the groundwater loss terms substantially improve prediction accuracy for strongly groundwater-influenced stream reaches, at a mild cost to accuracy for groundwater-disconnected reaches. These improvements could be especially important for long-term temperature projections because groundwater exchange moderates stream temperature responses to climate change.
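The idea of penalizing a mismatch in the annual temperature signal can be illustrated with a short sketch. This is not the presenters' loss function: the function names, the weights, and the target amplitude ratio and lag (which would presumably be derived from a groundwater model) are all hypothetical, and the code assumes gap-free daily series that start on day 0 and span whole 365-day years. A real training loop would express the same terms in a differentiable framework.

```python
import math

OMEGA = 2 * math.pi / 365.0  # annual angular frequency, radians per day


def annual_signal(temps):
    """Return (amplitude, phase) of the annual sinusoid in a daily series.

    Assumes the series starts on day 0 and spans whole 365-day years, so the
    sine and cosine projections recover A and phi in A*sin(OMEGA*d + phi).
    """
    n = len(temps)
    a = 2.0 / n * sum(t * math.sin(OMEGA * d) for d, t in enumerate(temps))
    b = 2.0 / n * sum(t * math.cos(OMEGA * d) for d, t in enumerate(temps))
    return math.hypot(a, b), math.atan2(b, a)


def groundwater_guided_loss(pred, obs, air, target_amp_ratio, target_lag_days,
                            w_amp=1.0, w_lag=0.01):
    """RMSE plus penalties on the annual amplitude ratio and lag (hypothetical)."""
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))
    amp_water, phase_water = annual_signal(pred)
    amp_air, phase_air = annual_signal(air)
    amp_ratio = amp_water / amp_air
    # Positive lag means predicted water temperature peaks later than air temperature.
    lag_days = (phase_air - phase_water) / OMEGA
    lag_days = (lag_days + 182.5) % 365.0 - 182.5  # wrap to [-182.5, 182.5)
    return (rmse
            + w_amp * (amp_ratio - target_amp_ratio) ** 2
            + w_lag * (lag_days - target_lag_days) ** 2)


# Synthetic two-year example: air temperature, and a groundwater-influenced
# reach whose annual signal is damped by half and lags air by 20 days.
days = range(730)
air = [12.0 + 10.0 * math.sin(OMEGA * d) for d in days]
obs = [11.0 + 5.0 * math.sin(OMEGA * (d - 20)) for d in days]

loss_damped = groundwater_guided_loss(obs, obs, air, 0.5, 20.0)
loss_undamped = groundwater_guided_loss(air, obs, air, 0.5, 20.0)
```

In this synthetic case, a prediction that simply tracks air temperature is penalized three times: for its error against observations, for its undamped amplitude, and for its missing lag.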

Bio: Janet is a hydrologist with an interest in incorporating unique types of data into hydrologic models. She works with the United States Geological Survey in the New England Water Science Center where her current projects include simulating nitrogen transport through groundwater and incorporating groundwater discharge into deep learning models of stream temperature. One of her favorite projects involved surveying streams with thermal infrared cameras to identify groundwater discharge areas for use in evaluating groundwater models. When she’s not working, she’d like to be paddling, in the garden, or enjoying a cup of strong coffee.

"Process guided deep learning for decision-ready predictions"

Abstract: Many endangered and economically important aquatic species in the Delaware River Basin (DRB) are negatively affected by warm water. Millions of people rely on water from the DRB for drinking water, and diversions for human use combined with a warming climate contribute to thermal stress in the basin. However, cold water stored in reservoirs can be released to maintain thermal habitat when stream water temperature is anticipated to exceed an organism’s thermal tolerance. Therefore, managers rely on models to accurately predict water temperature to inform when and how much water to release. In the DRB, accurately predicting daily water temperature across the stream network is difficult because water temperature is a result of a complex suite of natural processes and human decisions, including reservoir storage and release decisions that often decouple upstream and downstream water temperature dynamics. Here, we present a new hybrid modeling technique that combines a process model (a coupled hydrologic and thermodynamic model) with a deep learning model to improve water temperature predictions in the DRB. The resulting process-guided deep learning (PGDL) model outperformed the uncalibrated process model and the pure deep learning models according to traditional performance metrics, such as root mean squared error, and metrics that describe decision-relevant outcomes, such as correctly predicting a temperature threshold exceedance. Further, we describe how these PGDL methods were extended into a data assimilation framework to produce real time 7-day ahead forecasts to support decision making.
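The two kinds of performance metrics mentioned in this abstract can be contrasted in a small sketch. The temperatures and the 24.0 °C threshold below are invented for illustration, and the function names are hypothetical; the abstract does not specify how exceedance skill was scored.

```python
from math import sqrt


def rmse(pred, obs):
    """Root mean squared error between predicted and observed temperatures."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def exceedance_counts(pred, obs, threshold):
    """Count days where predicted and observed threshold exceedance agree or differ."""
    hits = sum(1 for p, o in zip(pred, obs) if p > threshold and o > threshold)
    misses = sum(1 for p, o in zip(pred, obs) if p <= threshold and o > threshold)
    false_alarms = sum(1 for p, o in zip(pred, obs) if p > threshold and o <= threshold)
    return {"hits": hits, "misses": misses, "false_alarms": false_alarms}


# Invented daily water temperatures (degrees C) and a hypothetical thermal threshold.
observed = [21.0, 23.5, 24.6, 25.2, 23.9]
predicted = [21.4, 23.0, 24.1, 25.0, 24.2]

error = rmse(predicted, observed)
counts = exceedance_counts(predicted, observed, threshold=24.0)
```

A model can have a small RMSE and still produce a false alarm or a miss when observations sit close to the threshold, which is why the two kinds of metrics are reported side by side.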

Bio: Dr. Samantha Oliver is a data scientist at the US Geological Survey and does research on broad scale water quality issues. Currently, she manages a project that is focused on using hybrid process and machine learning models to improve water temperature prediction in streams below reservoirs in the Delaware River Basin. Samantha received her PhD in limnology from the University of Wisconsin-Madison, and she lives in Madison, WI with her husband and two kids.

Lake Expedition 2020

"Lake Expedition 2020: a virtual collaboration of early career researchers integrating machine learning and aquatic sciences"

Abstract: Challenging big science problems can be addressed through collaborations that effectively tap the variety of expertise and backgrounds in the group. However, to form cohesive teams and efficient workflows, team management practices are necessary. The Lake Expedition 2020 is a GLEON (Global Lake Ecological Observatory Network) Fellowship Program composed of 15 researchers with the goal of answering the large-scale limnological question: What are the patterns and drivers of lake area change for 103K lakes and reservoirs of the US? The analytical paradigm used to address this question is Knowledge Guided Machine Learning, which applies both the power of machine learning and the rich domain knowledge of limnology. The cohort consists predominantly of early-career researchers from various backgrounds in the aquatic sciences, hydrology, and computer sciences. Individual experience with machine learning ranges from novice to advanced, which introduces a mix of technical and domain knowledge carried by the group. Additionally, the incorporation of both early-career researchers and experienced scientists has allowed for a wide array of perspectives. However, collaborative research relies on team diversity, engagement, genuine relationships, and the management of these relations. Working in a virtual setting across time zones introduces logistical and communication challenges that can complicate effective collaboration and management techniques. This talk aims to provide suggestions for effective team science and to reflect on the challenges and successes of our collaborative efforts to integrate KGML with large-scale limnological questions in a setting of interdisciplinarity and variable expertise.

Bio: Jenna Robinson is a PhD Candidate of The Global Water Lab in the Department of Biology at Rensselaer Polytechnic Institute.

Bio: Maartje Korver is a PhD Candidate of the Global HydroLab in the Department of Geography at McGill University.

"Learning from mistakes - Assessing the performance and uncertainty in process-based models"

Abstract: Typical applications of process- and physically-based models aim to gain a better process understanding of certain natural phenomena or to estimate the impact of changes in the examined system caused by anthropogenic influences, such as land-use or climate change. To adequately represent the physical system, it is necessary to include all (essential) processes in the applied model and to observe or estimate relevant inputs in the field. However, errors, i.e., deviations between observed and simulated values, can still occur. Apart from large systematic observation errors, simplified, misrepresented, or missing processes are potential sources of errors. This study presents a set of methods and a workflow for analyzing errors of process-based models and linking them to process representations.

The evaluated approach consists of three steps: (1) training a machine learning error-model using the input data of the process-based model and other available variables, (2) estimation of local explanations (i.e., contributions of each variable to an individual prediction) for each predicted model error using SHapley Additive exPlanations (SHAP) in combination with Principal Components, (3) clustering of SHAP values of all predicted errors to derive groups with similar error generation characteristics. By analyzing these groups of different error/variable associations, hypotheses on error generation and corresponding processes can be formulated. This can ultimately lead to improvements in process understanding and prediction.

The framework is applied to the process-based stream water temperature model HFLUX in a case study for modelling an alpine stream in the Canadian Rocky Mountains. Initial statistical tests show a significant association of model errors with available meteorological and hydrological variables. By using these variables as input features, the applied ML model is able to predict model residuals. Clustering of SHAP values results in four distinct error groups that can be related to tree shading, sensible and latent heat flux and longwave radiation emitted by trees.

Model errors are rarely random and often contain valuable information. Assessing model error associations is ultimately a way of enhancing trust in implemented processes and of providing information on potential areas of improvement to the model.
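The local explanations in step (2) of the workflow rest on Shapley values, which can be computed exactly by brute force when there are only a few input variables. The sketch below does this for a made-up error model; it is not the SHAP library, it omits the Principal Components and clustering steps, and the toy model, variable names, and numbers are all hypothetical. Variables left out of a coalition are replaced by a single baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley contribution of each feature to model(x) - model(baseline).

    Features outside a coalition are set to their baseline value. The cost
    grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        contributions.append(total)
    return contributions


def toy_error_model(features):
    """Hypothetical predicted residual (degrees C) of a stream temperature model."""
    radiation, air_temp, discharge = features
    return (0.002 * radiation + 0.1 * air_temp - 0.5 * discharge
            + 0.001 * radiation * air_temp)


x = [600.0, 20.0, 1.5]         # one time step: radiation, air temperature, discharge
baseline = [300.0, 10.0, 2.0]  # reference conditions, e.g. long-term means
phi = shapley_values(toy_error_model, x, baseline)
```

The contributions in `phi` sum to the difference between the predicted error at `x` and at the baseline, so each predicted error is fully apportioned among the input variables. Vectors like `phi`, collected over all time steps, are what a clustering step would then group.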

Bio: Moritz Feigl is a PhD student at the Institute for Hydrology and Water Management (HyWa) at the University of Natural Resources and Life Sciences (BOKU) in Vienna. He received a BSc and MSc in Environmental Engineering from BOKU and a BSc in Statistics from the University of Vienna. His research is focused on machine learning applications in hydrology and aquatic sciences, with particular focus on developing tools to couple process-based and data-driven modelling approaches.

"Biology-guided Neural Networks: Integrating Biological Knowledge with Neural Networks for Discovering Phenotypic Traits from Fish Images"

Abstract: This talk will introduce novel advances in knowledge-guided machine learning for applications in biology, where there is a growing volume of biodiversity data (often available as images) and scientific knowledge is often available as biological ontologies and phylogenies. This talk will describe several ways of integrating biological knowledge in state-of-the-art deep learning models for species classification and trait segmentation of fish images, with the goal of improving generalization performance even with limited amounts of labeled training data. The talk will additionally demonstrate the importance of integrating scientific knowledge in machine learning frameworks for improved interpretability and robustness to adversarial attacks. The talk will conclude with a discussion of future prospects in the emerging field of knowledge-guided machine learning, which has the potential to impact several application areas in biology, including the aquatic sciences, that have a rich wealth of scientific knowledge and some availability of data.

Bio: Anuj Karpatne is an Assistant Professor in the Department of Computer Science at Virginia Tech, where he develops data mining and machine learning methods to solve scientific and socially relevant problems. A key focus of Dr. Karpatne’s research is to advance the field of science-guided machine learning for applications in several domains ranging from climate science, hydrology, and ecology to cell cycle biology, mechano-biology, quantum science, and fluid dynamics. Dr. Karpatne co-organized the FEED 2018 workshop, served as the workshop co-chair for SIGKDD 2019, and has co-organized sessions at AAAS Annual Meeting 2019 and AGU Fall Meetings 2017 and 2018. He is currently serving as the co-Editor-in-Chief for the SIGAI “AI Matters” and as a Review Editorial Board Member for the “Data-driven Climate Sciences” section of the Frontiers in Big Data journal. In recognition of his interdisciplinary research efforts in geosciences, Dr. Karpatne was named the Inaugural Research Fellow by the IS-GEO (Intelligent Systems for Geosciences) Research Coordination Network in 2018. Dr. Karpatne is also a co-author of the second edition of the textbook, Introduction to Data Mining. He received his Ph.D. in Computer Science at the University of Minnesota in 2017 under the guidance of Prof. Vipin Kumar.