W2D2S2:
Western Washington
Data-driven Discovery Seminar Series

Thursdays 3:00-4:00

To attend, sign up below or contact the organizers at w2d2s2organizers@gmail.com

Data are being collected at an unprecedented rate and in an ever-increasing number of modalities. The result is a host of new opportunities in science, engineering, and society at large. Many important scientific questions are best addressed through the collaboration of domain experts and data scientists bringing their respective expertise to learn from the data.

The Western Washington Data-Driven Discovery Seminar Series, hosted by Western Washington University in cooperation with Pacific Northwest National Laboratory, will bring together two complementary groups: (1) experts in computer science, statistics, and mathematics, who will present cutting-edge work in data science, and (2) domain experts from a wide range of disciplines with interesting datasets and data-related problems. The seminar series will consist of online talks and discussion panels from experts working in a wide range of scientific disciplines and data science application domains. The seminar series aims to connect students, professors, researchers, and professionals for prospective cross-disciplinary and cross-organizational research collaborations.

Seminar sessions are hosted on Thursdays from 3:00-4:00 p.m.

If you would like to receive seminar updates and invites, please sign up below (or email us at w2d2s2organizers@gmail.com).

Spring 2022 Theme: Earth and Environmental Sciences


The Spring 2022 theme of the Western Washington Data-Driven Discovery Seminar Series (W2D2S2) is Earth and Environmental Sciences. Now in its second year, W2D2S2 is hosted by Western Washington University and Pacific Northwest National Laboratory (PNNL). The schedule of weekly online talks aims to bring together Earth and environmental scientists, computer scientists, statisticians, and mathematicians, to share, discuss, and explore recent findings, and to broaden avenues for future collaborations. The seminar series casts a broad net and aims to offer something for students, professors, researchers, and professionals in cross-disciplinary and cross-organizational research collaborations.

For information on past discussions, visit the Seminars pages. Note that talks are not recorded; please contact the organizers for the Zoom link if you wish to attend.

Talks:

Thursday April 21, 2022 - Earth System Modeling and Scientific Machine Learning

A Risk Analysis Framework for Tropical cyclones (RAFT). Dr. Karthik Balaguru and Wenwei Xu, MS | Pacific Northwest National Laboratory

Bio: Karthik Balaguru is an Earth Scientist at Pacific Northwest National Laboratory. He received a Ph.D. in Physical Oceanography from Texas A&M University in 2011. He has broad interests in the areas of upper-ocean dynamics, air-sea interactions, and climate. Topics of particular interest are water cycle changes and their relationship with climate extremes, and the application of machine learning techniques to enhance Earth system predictability.

Bio: Wenwei Xu is a Data Scientist at Pacific Northwest National Laboratory. He received a master's degree in Civil & Environmental Engineering from Portland State University in 2014. Since then, he has developed an interest in applying machine learning techniques to solve Earth system and environmental problems. He has broad research interests in tropical cyclones, GIS, and computer vision.

Abstract: "Tropical cyclones (TCs) or hurricanes are among the most destructive natural hazards in the global tropics and subtropics, with the capacity to impact millions of people annually. Even for the U.S. they pose a significant threat to the population and critical infrastructure in the coastal regions, making it important to characterize the risk associated with them and understand how they may evolve in a changing climate. While the reliable observed TC record is not long enough to robustly quantify storm behavior, direct simulation of TCs using high-resolution numerical models is computationally expensive. To overcome these challenges, a Risk Analysis Framework for Tropical Cyclones (RAFT) is being developed at PNNL to generate synthetic TCs on computers. RAFT is a hybrid modeling approach that combines physics-based models with machine learning to model not only the physical behavior of TCs but also the human-systems impacts associated with them.

The two specialized machine learning components of RAFT include a Feedforward Neural Network-based TC intensity model that predicts storm maximum surface wind and a Convolutional Neural Network-based TC rainfall model that estimates the quantity and spatial distribution of rainfall.  The success of RAFT is a great example of breaking down a complex physical phenomenon into smaller components and combining both physics-based tools and machine learning-based tools to simulate extreme events more accurately and more efficiently."


Thursday April 28, 2022 - Climate and Extreme Weather

Some novel approaches to reduced-order climate modeling. Dr. Ben Kravitz | Indiana University

Bio: Dr. Ben Kravitz is an assistant professor in the Department of Earth and Atmospheric Sciences at Indiana University. He holds a B.A. in mathematics from Northwestern University, an M.S. in mathematics from Purdue University, and an M.S. and Ph.D. in atmospheric science from Rutgers University. He completed a postdoctoral research position at the Carnegie Institution for Science and another postdoctoral research position at Pacific Northwest National Laboratory, where he became a staff scientist in 2015. He joined the faculty at Indiana University in 2019, maintaining a joint appointment at Pacific Northwest National Laboratory. Dr. Kravitz is an international expert in climate model simulations of climate engineering. His current activities also include using engineering and mathematical techniques in climate models to better understand climate feedbacks, studying teleconnections in high latitude climate, and developing climate model emulators for use in Integrated Assessment Models.

Abstract: Climate models are our best mathematical representations of the real world, but they are very costly to run. Reduced-order modeling of the climate has long been used to make climate modeling more computationally tractable by reducing fidelity or complexity. Novel applications of computer science methods, including machine learning and climate networks, show pathways toward reduced order climate modeling that retains complexity but at reduced computational expense. I discuss three recent studies involving (1) using machine learning to improve short-term climate forecasts, (2) using climate networks to quantify Earth system teleconnections, and (3) another application of machine learning to generate numerous realizations of weather for use in quantifying extreme events.


Quantifying the Influence of Natural Climate Variability on DJF Extreme Daily Precipitation. Dr. Mark Risser | LBNL

Bio: Mark Risser is a Research Scientist in the Climate Division at the Lawrence Berkeley National Laboratory. He received his Ph.D. in Statistics from the Ohio State University in 2015 (advised by Catherine Calder), and his primary goal as a statistician is to use data science, Bayesian modeling, and computational tools to identify and quantify climate change. Mark's primary research is in climate, spatial/environmental statistics, and Bayesian modeling, but he also has interests in extreme value analysis, multiple testing, computational methods, and data visualization.

Abstract: While various studies explore the relationship between individual sources of climate variability and extreme precipitation, there is a need for improved understanding of how these physical phenomena simultaneously influence precipitation in the observational record across the contiguous United States. In this work, we introduce a single framework for characterizing the historical signal (anthropogenic forcing) and noise (natural variability) in seasonal mean and extreme precipitation. An important aspect of our analysis is that we simultaneously isolate the individual effects of seven modes of variability while explicitly controlling for joint inter-mode relationships. Our method utilizes a spatial statistical component that uses in situ measurements to resolve relationships to their native scales; furthermore, we use a data-driven procedure to robustly determine statistical significance. In the Boreal winter, the El Niño/Southern Oscillation, the Pacific–North American pattern, and the North Atlantic Oscillation exhibit the largest influence on seasonal extreme precipitation. We are able to detect at least some significant relationships in all seasons in spite of extremely large (> 95%) background variability in both mean and extreme precipitation. Furthermore, we specifically quantify how the spatial aspect of our analysis reduces uncertainty and increases detection of statistical significance while also discovering results that quantify the complex interconnected relationships between climate drivers and seasonal precipitation.



Thursday May 5, 2022 - Natural Resource Modeling

Two buoys and a large ocean: Using state-of-art instruments to study offshore wind resource assessment. Dr. Raghu Krishnamurthy | PNNL

Mesoamerican Tree Species Composition Suggests Alternative Stable States. Dr. Hank Stevens | Miami University

Two buoys and a large ocean: Using state-of-art instruments to study offshore wind resource assessment. Dr. Raghu Krishnamurthy | PNNL

Bio: Dr. Krishnamurthy joined PNNL in 2019; prior to that, he was an assistant professor at the University of Notre Dame. He is currently a mentor for the Doppler lidar network at ARM and a PI for two wind energy projects, Lidar buoy science and Wind Forecasting Improvement Project 3, funded by the DOE Wind Energy Technologies Office. His research interests are focused on using observations and modeling to improve our understanding of the atmospheric boundary layer.

Abstract: In recent years, there has been rapidly increasing interest in the siting, buildout and efficient operation of offshore wind plants, as required to meet domestic renewable energy targets, i.e., 35% of the nation's electricity demands by 2050, of which 110 GW are expected to be deployed offshore. With a recent push to deploy 30 GW by 2030, quantifying the uncertainty in wind resource characterization is a top priority of the US DOE offshore wind strategy. Herein we will provide some details about unique offshore measurements that PNNL is collecting using two lidar-equipped buoys and how we have been using these measurements to improve our understanding of the offshore wind resource.


Mesoamerican Tree Species Composition Suggests Alternative Stable States. Dr. Hank Stevens | Miami University

Abstract: Detecting and confirming alternative stable states in naturally occurring dynamical systems is challenging, and in this talk, I describe several lines of circumstantial evidence that remnant garden forests of the ancient Maya are one such state in Mesoamerican forests. These ancient garden forests are found in areas of once high density Maya settlements, and they are characterized by relatively high abundances of once-culturally important woody species. We hypothesize that these forests persist as an alternative stable state that is maintained by positive feedbacks between the unusually high density of animal-dispersed tree species and their frugivore dispersers. We evaluated a series of predictions using spatially-explicit tree census data and phenological information gleaned from the literature. Our results show that Maya garden tree species are more likely than non-garden species to be animal-dispersed, and that many of these species showed high levels of spatial clustering indicative of short dispersal distances, but only within garden forests. We also found that garden forests were more likely to replace themselves, suggesting more stable community dynamics than in the surrounding forest matrix. Last, we found that phenological patterns of aggregate community-level fruiting potential by garden species were more likely to support frugivore populations than fruiting by non-garden species. Collectively, these findings suggest that the garden forests left behind by the ancient Maya are dynamical attractors that have helped maintain these forests for over 1000 years since the collapse of the Mayan civilization. If so, this would be a unique type of a long-term human landscape legacy, and would add to our growing appreciation that our most wild primeval landscapes have often harbored and been managed by our ancestors.



Thursday May 12, 2022 - Geointelligence and Remote Sensing

GeoAI for Earth Observation Analytics: Applications and Challenges. Andre Coleman, MS and Troy Saltiel, MS | PNNL

Bio: André Coleman has served as a senior research/data scientist at the Pacific Northwest National Laboratory (PNNL) in Richland, Washington, USA since 2000. He brings 27 years of professional experience in the fields of geoinformatics, hazard informatics, geointelligence, hydrology, bioenergy, and computer science. His research interests are focused on spatial and numerical modeling, remote sensing (satellite, airborne, UAS), rapid automated disaster assessments, machine/deep learning and evolutionary computing, heterogeneous data fusion, water security, and coupling of spatial and physics-based numerical models. To date, André has authored or co-authored 108 publications, including 45 peer-reviewed journal articles and 3 book chapters.

Bio: Troy Saltiel is a post master’s research associate who joined PNNL in 2020. His work focuses on geographic information science and remote sensing, often involving geospatial big data like highly detailed multispectral and hyperspectral imagery, LiDAR data, and national-scale datasets. Recent examples include analysis of historical fire weather data, spaceborne mapping of aerially deployed fire retardant using machine learning, land screening for suitable algae biofuel sites, vegetation characterization using UAV-collected RGB imagery and LiDAR data, and species-level vegetation mapping with hyperspectral data. Troy graduated from the University of Utah with an MS in geography in 2021.

Abstract: From targeted remote sensing applications via uncrewed aerial systems to global collections via rapidly growing microsatellite constellations, the volume, velocity, and variety of big Earth observation data are growing at an unprecedented rate. GeoAI is an interdisciplinary field that bridges geographic sciences, remote sensing, and data science to understand phenomena and derive actionable data in the natural and human environment; this field presents a unique challenge space, where the spatial, spectral, and temporal dimensions have inherent relationships and are not mutually exclusive. Current and emerging state-of-the-art methods show encouraging applicability to a diverse range of remote sensing applications, furthering our ability to effectively utilize Big Earth Data. The challenges and opportunities to realizing these benefits, however, differ from other more common computer vision problems. Generalizable models provide an apex for applied use; for GeoAI, however, the field is still focused on tailored solutions that explore the art of the possible. This talk will present several examples of GeoAI applications at PNNL and address current challenges and opportunities related to generalization and transferability of GeoAI models.

Thursday May 19, 2022 - Earth System Modeling

Accelerating Geologic Carbon Storage Leakage Risk Quantification Using Deep Learning. Dr. Diana Bacon | PNNL

Machine Learning and Uncertainty Quantification for Earth System Modeling and Prediction. Dr. Jason Hou | PNNL

Accelerating Geologic Carbon Storage Leakage Risk Quantification Using Deep Learning. Dr. Diana Bacon | PNNL

Bio: Dr. Diana Bacon is a computational scientist with expertise in hydrology and geochemistry. Her research has focused on developing and applying multiphase flow and reactive transport simulators to understand the fate and transport of radionuclides, carbon and pollutants in groundwater. She is currently applying deep learning to develop fast forward simulators for pressure management during carbon storage operations as part of the U.S. Department of Energy’s SMART initiative and to the development of surrogate models of groundwater impacts related to CO₂ and brine leakage from CO₂ sequestration reservoirs for the U.S. Department of Energy’s National Risk Assessment Partnership (NRAP).

Abstract: Geologic carbon storage is a method of securing carbon dioxide (CO₂) in deep geologic formations to prevent its release to the atmosphere and contribution to global warming as a greenhouse gas. To obtain a permit to inject CO₂ into a deep saline reservoir in the United States, a site operator is required to develop a simulation of CO₂ injection at the site and determine the “area of review” where there is potential for leakage of CO₂ and brine through abandoned wellbores to degrade the water quality in overlying drinking water aquifers. Properties of the deep reservoir, abandoned wellbores, and aquifers are inherently uncertain, requiring many simulations with varying input parameters to bound uncertainty in model predictions. Detailed finite difference multiphase flow simulations of the entire geologic carbon storage system may take days or even weeks to run, so dividing the system into component models for the reservoir, wellbore, and aquifer, and developing fast surrogate models for each component, can greatly accelerate risk quantification. A generic aquifer model was developed using a generative adversarial deep learning network and trained using a large synthetic dataset of numerical flow simulations. The input parameters were selected to cover a wide range of groundwater aquifer attributes and leakage scenarios. The deep learning model predicts the temporal and spatial distribution of dissolved salt and dissolved CO₂ in the aquifer, and compares well to the numerical simulation results. Once the model is loaded into memory, it runs in a fraction of a second, greatly accelerating the many runs necessary for risk quantification.

Machine Learning and Uncertainty Quantification for Earth System Modeling and Prediction. Dr. Jason Hou | PNNL

Bio: Dr. Z. Jason Hou is a Chief Data Scientist and Team Lead of the Earth System Data Science team at PNNL. He has 20+ years’ experience in developing and applying advanced artificial intelligence (AI), machine learning (ML), uncertainty quantification (UQ), and extreme event analysis approaches to advance fields of research in environmental management and remediation, climate extremes, land-atmosphere modeling, smart grids, petroleum exploration, carbon sequestration, renewable energy forecasting, and energy storage applications.

Abstract: Machine learning and uncertainty quantification (ML/UQ) techniques have been successfully used and have more potential to advance Earth system science by improving understanding, analysis, modeling, prediction, and decision making. There are strong needs and great opportunities using ML/UQ to better predict, process, analyze, and learn from large volumes of Earth systems data from a variety of sources such as remote sensing, in situ observations, citizen science, and high-fidelity physics-based numerical simulations of Earth systems. In this presentation, I will discuss some progress in developing and applying ML/UQ for mechanistic understanding of various Earth system components (e.g., land, atmospheric, and renewable energy processes), calibrating Earth system models with ML-surrogates, identifying and estimating climate forcing with inverse-problem setup, and predicting future behaviors of Earth system processes, particularly the extremes.

Thursday May 26, 2022 - Quantitative Ecology and Fisheries

How food and temperature determine growth opportunities for Juvenile Chinook Salmon in the Elwha River. Dr. Martin Liermann, Dr. Aimie H. Fullerton and Sarah Morley, MS | NOAA

A parcel-scale quantitative sea level rise vulnerability analysis for Puget Sound. Dr. Ian Miller | Washington Sea Grant

How food and temperature determine growth opportunities for Juvenile Chinook Salmon in the Elwha River. Dr. Martin Liermann, Dr. Aimie H. Fullerton and Sarah Morley, MS | NOAA

Abstract: Understanding fish population dynamics (abundance, growth, survival) is an important yet difficult management task because their ecology is complex. Habitat-based models of fish densities are frequently used because they are easy to motivate and can be constructed using data that is readily available and easy to collect. However, fish are mobile organisms that are influenced by multiple environmental factors, including temperature, food availability, and the presence of potential predators or competitors. We use a rich set of biotic and physical data collected in the Elwha River over 20 years to show how mechanistic models of egg incubation, fish growth, and movement can be used to assess our understanding of how fish utilize freshwater systems. The Elwha River provides a unique opportunity because of the depth of available data, the presence of multiple ESA-listed salmonids, and the relatively pristine condition of the system. Using Chinook salmon spawn timing, temperature data, and incubation and growth models, we were able to predict the timing and size distribution of juvenile Chinook migrating past screw traps in the mainstem and a tributary. These predictions allowed us to derive important management information, like the percentage of fry migrants, and provided insight into our model assumptions and gaps in the available data.

A parcel-scale quantitative sea level rise vulnerability analysis for Puget Sound. Dr. Ian Miller | Washington Sea Grant

Abstract: The availability of regional and local sea level rise (SLR) projections facilitates the integration of SLR-related hazard exposure into community planning processes.  Hazard exposure alone, though, paints an incomplete picture of the risks faced by communities.  Vulnerability assessment couples hazard exposure information with the spatial distribution of valued community assets and their sensitivity to those hazards and provides better insights about areas that are most at risk due to SLR-driven hazards.  A careful and comprehensive assessment of SLR-driven vulnerability, therefore, can lead to more nuanced planning and decision-making and support more equitable distributions of resources and investment intended to reduce vulnerability.  Vulnerability assessments, though, often rely on convening stakeholder working groups, and are therefore expensive, time-consuming, and limited spatially to the zone of stakeholder expertise.  An alternative approach leverages the emergence of publicly available spatial data and analysis techniques to quantify SLR vulnerability at scales relevant to community planning and decision-making.  Here we report on methods and preliminary results from a quantitative SLR vulnerability assessment for Puget Sound in Washington State, intended to inform land-use, ecological restoration, and hazard planning.  The assessment calculated a vulnerability index for every parcel within a year 2100 SLR hazard zone, based both on the configuration of infrastructure and coastal habitats within the study area.  The results are also coupled with a concurrently developed social vulnerability index, which provides additional insight about those people and places that may be predisposed to adverse impacts from SLR-related hazards.  
We find that the proposed approach offers advantages in terms of advancing equitable SLR-related risk reduction, but also that the results should be carefully interpreted considering embedded assumptions and data limitations. 

Thursday June 2, 2022 - Climate Modeling

Accelerating parameter learning within a climate model. Dr. Oliver Dunbar | California Institute of Technology

Improving Climate Models by Corrective Machine Learning. Dr. Chris Bretherton | AI2

Accelerating parameter learning within a climate model. Dr. Oliver Dunbar | California Institute of Technology

Bio: I am an applied mathematician, currently working in the position of Environmental Sciences and Engineering postdoctoral scholar at the California Institute of Technology. I work primarily in the Climate Modeling Alliance (CliMA) under Tapio Schneider and Andrew Stuart, using Bayesian methods and machine learning techniques to improve climate model predictions. My other interests span data assimilation, optimization, forward and inverse problems for physical, health, and social sciences.

Abstract: Current state-of-the-art climate models produce uncertain predictions, but they are typically ill-equipped to quantify this uncertainty, as evidenced by the apparent variability of forecasts from competing models. The uncertainty stems from the necessarily simplified physical schemes used to represent small-scale dynamics or poorly understood physics. These schemes depend upon parameters that are calibrated (often by hand) to fit data, though there may be a wide distribution of parameters that feasibly produce a given piece of (noisy) data. In climate models, the range of parameters appearing in convection and turbulence schemes dominates the uncertainty of resulting decadal predictions; it is therefore essential to quantify this uncertainty for robust prediction. This task is far more computationally intensive than parameter calibration, and historically has been out of reach of climate models.

We solve a suitable Bayesian inverse problem, which aims to learn a parameter distribution from judiciously chosen time-averaged statistical data. We do this successfully by applying the new Calibrate-Emulate-Sample (CES) methodology. CES is based on three steps: a first Calibration step takes the climate model as a black-box input and is well adapted to derivative-free optimization on high-performance computing architectures; a second Emulation step automates, smooths, and accelerates calculation of the black-box climate model by several orders of magnitude; a final Sampling step uses standard methods from computational statistics applied to the accelerated model to obtain a data-informed posterior distribution for parameters.

I will demonstrate this work within an idealized aquaplanet general circulation model, showing how parametric uncertainty quantification on the closure parameters for convection can provide robust predictions of climate quantities. I shall also touch on other uses of parametric uncertainty, such as directing experimental design choices.


Improving Climate Models by Corrective Machine Learning. Dr. Chris Bretherton | AI2

Abstract and Bio to come.