Aquatic Sciences Session Details

Tuesday, August 18, 1:30-4:30

YOUTUBE LINKS: Part 1 (start to break), Part 2 (break to session end)

Session chairs: Paul Hanson and Hilary Dugan

SPEAKERS:

  • 1:30-1:40 Paul Hanson, University of Wisconsin, Madison: Introduction and opening of "Ecological knowledge guides machine learning: (i) process-guided phosphorus modeling, (ii) state-space modeling of lake oxygen dynamics" (slides) (link to recorded presentation)

  • 1:40-2:00 Robert Ladwig, University of Wisconsin, Madison: Continuation and closing of "Ecological knowledge guides machine learning: (i) process-guided phosphorus modeling, (ii) state-space modeling of lake oxygen dynamics" (slides) (link to recorded presentation)

  • 2:00-2:20 Alison Appling, United States Geological Survey: "Applications of knowledge-guided machine learning for water resources management" Note: This presentation is not publicly available. If you would like access to this recorded presentation, please contact kgmlworkshop@umn.edu.

  • 2:20-2:40 Jared Willard, University of Minnesota: "Predicting Water Temperature Dynamics of Unmonitored Lake Systems with Meta Transfer Learning (MTL)" (slides) (link to recorded presentation)

3:00-3:10 BREAK

"Ecological knowledge guides machine learning: (i) process-guided phosphorus modeling, (ii) state-space modeling of lake oxygen dynamics"

Abstract: We introduce knowledge-guided machine learning (KGML) within the context of aquatic sciences and demonstrate how melding ecological knowledge with machine learning can improve both predictions and understanding of ecosystem dynamics in lakes and reservoirs. We provide a brief background for why the study of fresh waters can benefit from this synergistic approach and follow that with two applications, a published study on lake phosphorus dynamics and a study in progress about lake dissolved oxygen depletion.

(i) A simple process-based model for lake phosphorus predicted complex lake dynamics well when used in concert with machine learning. Although the process-based model alone recreated basic annual patterns in lake phosphorus dynamics, it missed important short-term and long-term patterns of the observed data. The KGML identified these patterns and attributed them to additional influences of temperature at annual time scales and likely a long-term decrease in phosphorus load to the lake.

(ii) We set up a two-layer Bayesian model to simulate dissolved oxygen dynamics in 8 lakes in Wisconsin, USA. The output of each state-space model was used to guide predictions of a recurrent neural network that also used a broad suite of driving variables assumed to be relevant to lake oxygen dynamics. The coupled model is well suited to identify additional catchment-specific drivers and properties not included in the state-space model that are important for long-term oxygen depletion dynamics. These additional drivers helped differentiate control of lake oxygen dynamics in catchments with contrasting land use.

Dr Hanson Bio: I am a Distinguished Research Professor at the Center for Limnology at the University of Wisconsin. I study how climate change, land use, and human activity impact our inland waters, as well as the ways in which big data and team science are shaping freshwater research on a global scale. I also study the art of science, especially through my musical pursuits.

Dr Ladwig Bio: I am a Postdoc at the Center for Limnology at the University of Wisconsin-Madison advised by Hilary Dugan and Paul Hanson working on simulating aquatic ecosystems using process-based models (e.g., GLM-AED2). A main focus of my research is to develop scientific open-source software that eases the application of numerical models for first-time users (an example is the AEMON-J LakeEnsemblR project). I am interested in investigating feedback mechanisms between atmospheric drivers, the catchment, and in-lake stratification/mixing dynamics. Recently, my work has focused on understanding what drives dissolved oxygen depletion in lakes.


"Applications of knowledge-guided machine learning for water resources management"

Abstract: Accurate model predictions of water quality and quantity, such as those offered by knowledge-guided machine learning (KGML), are essential for assessing and improving water resources to support human and ecological health, recreation, agriculture, and industry. Here I will discuss two applications of KGML to questions facing water resources managers. The first use case is predicting fish thermal habitat to guide fisheries management decisions in hundreds of lakes in the Upper Midwest; the second is predicting near-term river flows and temperatures to inform timed releases of reservoir water in the Delaware River Basin. In both use cases, the consistent accuracy of KGML over available process model predictions has appealed to stakeholders. KGML models generate accurate predictions even when predicting outside the range of observations used to train the model, making them particularly useful for predicting water temperatures and flows in a changing climate. KGML models are most accurate when data are abundant but still outperform process models even when data are sparse, and thus are applicable in a wide range of lakes, reservoirs, and river networks. Although the trustworthiness of data-driven models continues to be a point of discussion and investigation, these case studies show great promise for the adoption of KGML approaches in water resources applications.

Bio: Alison Appling is a water data scientist with the US Geological Survey. She has a bachelor’s degree in Symbolic Systems from Stanford University and a PhD in Ecology from Duke University. Her research addresses the movement of energy, carbon, and nutrients through rivers, lakes, and floodplains, with an emphasis on using data science and machine learning to improve the estimation and prediction of water quality variables.

"Predicting Water Temperature Dynamics of Unmonitored Lake Systems with Meta Transfer Learning (MTL)"

Abstract: Though sensor-based monitoring and machine learning applications have both seen rapid recent growth in the environmental sciences, the majority of freshwater lakes remain unmonitored and thus have been inaccessible for machine learning models to predict water temperature. This talk describes a Meta Transfer Learning (MTL) framework that builds a meta learning model on contextual lake attributes, statistical meta-features, and past performance measures of candidate models to select ideal source models, trained in monitored lakes, to predict water temperature dynamics in unmonitored target lakes. Source models included calibrated process-based models and machine-learning-based models employing a recently developed approach, process-guided deep learning (PGDL). This work demonstrates that integrating scientific knowledge into both the candidate source models for well-observed systems (the PGDL approach) and the meta learning model that decides which models to transfer (MTL) shows promise for predicting important environmental variables in many different kinds of unmonitored systems.

Bio: Jared Willard is a 4th year Ph.D. student at the University of Minnesota advised by Vipin Kumar. He received his BA in physics, applied mathematics, and computer science from Macalester College in 2015. His research interests are physics-guided machine learning, deep learning, and environmental applications.

"Matched-up, the importance of open-access training data for global-scale remote sensing of water quality"

Abstract: In this talk I'll highlight the importance of open access machine learning training datasets, highlight some use cases in remote sensing of water quality, and more broadly discuss institutional and professional barriers to making more curated and analysis-ready datasets.

Bio: I’m an ecosystem scientist interested in how people control and change the environment and how altered landscapes impact the streams and rivers that drain them. To explore how people reshape landscapes and create novel controls of ecosystem processes, I use a range of approaches from remote sensing to intensive field sampling.

"Harnessing the literature, datasets, and models to understand continental-scale lake phosphorus recovery times"

Abstract: Lakes may take years to decades to "recover" from historical and current rates of phosphorus (P) input due to the dynamics of internal P cycling (settling, resuspension, burial). Even if external P loads are reduced, lake water total P concentrations may not decrease for a long time. Lake managers need to know how long it could take to detect improvements in water total P concentrations to inform management goals, long-term management plans, and stakeholder expectations. A team of EPA researchers started investigating two major questions: 1) What are the important physical, chemical, and biological factors driving internal lake P cycling (and how do the relationships differ by lake ecosystem parameters)? 2) If external (point source and non-point source) P load into a lake is reduced by X%, how many months or years would it take for surface water total P concentrations to decrease by Y%? We will use information from the literature and lake datasets, as well as a mechanistic limnological model, to address the major questions for lakes across the conterminous United States. We seek input from workshop participants about how we can incorporate machine learning or other approaches to efficiently answer our research questions.

Bio: Sylvia Lee is a biologist at the U.S. Environmental Protection Agency’s Office of Research and Development based in Washington, D.C. She works to provide scientific support to federal and state decision makers on biological assessments and nutrient criteria for aquatic ecosystems. Sylvia also works on improving the use of diatoms as biological indicators and serves as Chair of the Diatom Taxonomic Certification Committee. She co-instructs the Ecology and Systematics of Diatoms course at Iowa Lakeside Laboratory through her adjunct professor position at the University of Iowa.

"Data-driven approaches to building efficient machine learning models for aquatic science and hydrology"

Abstract: In this talk, I will describe our research activities to build accurate, efficient ML models using two different approaches. The first is to conduct data-driven analyses to infer physical information that can inform model selection, architecture, hyperparameters and input features. The data-driven analyses can include a combination of statistical/trend analyses, pattern recognition, and Bayesian statistics/causal inference approaches. A second approach is to use a black-box hyperparameter optimization that uses surrogate models to determine the best deep learning model architecture and hyperparameters. These models have been used in predictions of daily groundwater levels. Through parameter sensitivity analysis we show that the use of all available features and data for training a model does not necessarily ensure the best predictive performance. These approaches will improve ML model performance, reduce computational run-times and avoid overfitting, as compared to traditional machine learning approaches.

Bio: Charuleka Varadharajan is a research scientist in Earth and Environmental Sciences at the Lawrence Berkeley National Laboratory. As a biogeochemist and environmental data scientist, Charu is interested in the water, energy and carbon nexus to understand and limit the impacts of human activities on water resources and climate. Her research has previously involved studying the fate, transport and mitigation of contaminants in groundwater; measurement and prediction of carbon fluxes in terrestrial and subsurface environments; and management, synthesis, and analysis of diverse multi-scale environmental datasets.

"Integrating Physics into Machine Learning for Monitoring Scientific Systems"

Abstract: Given rapid data growth due to advances in sensor technologies, there is a tremendous opportunity to systematically advance modeling in these domains by using machine learning methods. However, the “black box” use of ML often leads to serious false discoveries in scientific applications. Because the hypothesis space of scientific applications is often complex and exponentially large, a pure data-driven search using limited observation data can easily select a highly complex model that is neither generalizable nor physically interpretable, resulting in the discovery of spurious patterns. In this talk, we present a physics-guided machine learning approach that combines advanced machine learning models and physics-based models for predicting water temperature and streamflow in river networks. We first build a recurrent graph network model to capture the interactions among multiple segments in the river network. Then we propose a pre-training technique which transfers knowledge from physics-based models to initialize the machine learning model and learn the physics of streamflow and thermodynamics. Additionally, we propose a new loss function that balances the performance over different river segments. We demonstrate the effectiveness of the proposed method in predicting temperature and streamflow in a subset of the Delaware River Basin. The proposed method has also been shown to produce better performance when generalized to different seasons or river segments with different streamflow ranges.

Bio: Xiaowei Jia is an Assistant Professor in the Department of Computer Science at the University of Pittsburgh. He received his Ph.D. degree from the University of Minnesota in 2020 under the supervision of Prof. Vipin Kumar. Prior to that, he got his M.S. degree from State University of New York at Buffalo in 2015 and his B.S. degree from University of Science and Technology of China in 2012. His research interests include physics-guided data science, spatio-temporal data mining, deep learning, and remote sensing. His work has been published in major data mining and AI journals (e.g., TKDE) and conferences (e.g., SIGKDD, ICDM, SDM, and CIKM), as well as top-tier journals in hydrology (e.g., WRR) and agronomy (e.g., Agricultural Economics). Xiaowei was the recipient of UMN Doctoral Dissertation Fellowship (2019), the Best Conference Paper Award in ASONAM 16, and the Best Student Paper Award in BIBE 14.