Western Washington Data-Driven Discovery Seminar Series

Past Seminars

Spring 2022 Theme: Earth and Environmental Sciences

Thursday April 21, 2022 - Earth System Modeling and Scientific Machine Learning

A Risk Analysis Framework for Tropical cyclones (RAFT). Dr. Karthik Balaguru and Wenwei Xu, MS | Pacific Northwest National Laboratory

Bio: Karthik Balaguru is an Earth Scientist at Pacific Northwest National Laboratory. He received a Ph.D. in Physical Oceanography from Texas A&M University in 2011. He has broad interests in the areas of upper-ocean dynamics, air-sea interactions, and climate. Topics of particular interest are water cycle changes and their relationship with the climate extremes and the application of machine learning techniques to enhance earth system predictability.

Bio: Wenwei Xu is a Data Scientist at Pacific Northwest National Laboratory. He received a Masters’ degree in Civil & Environmental Engineering from Portland State University in 2014. Since then, he has developed an interest in applying Machine Learning techniques to solve earth systems and environmental-related problems. He has a broad research interest in Tropical Cyclone, GIS, and Computer Vision.

Abstract: "Tropical cyclones (TCs) or hurricanes are among the most destructive natural hazards in the global tropics and subtropics, with the capacity to impact millions of people annually. Even for the U.S. they pose a significant threat to the population and critical infrastructure in the coastal regions, making it important to characterize the risk associated with them and understand how they may evolve in a changing climate. While the reliable observed TC record is not long enough to robustly quantify storm behavior, direct simulation of TCs using high-resolution numerical models is computationally expensive. To overcome these challenges, a Risk Analysis Framework for Tropical Cyclones (RAFT) is being developed at PNNL to generate synthetic TCs on computers. RAFT is a hybrid modeling approach that combines physics-based models with machine learning to model not only the physical behavior of TCs but also the human-systems impacts associated with them.

The two specialized machine learning components of RAFT include a Feedforward Neural Network-based TC intensity model that predicts storm maximum surface wind and a Convolutional Neural Network-based TC rainfall model that estimates the quantity and spatial distribution of rainfall. The success of RAFT is a great example of breaking down a complex physical phenomenon into smaller components and combining both physics-based tools and machine learning-based tools to simulate extreme events more accurately and more efficiently."

Thursday April 28, 2022 - Climate and Extreme Weather

Some novel approaches to reduced-order climate modeling. Dr. Ben Kravitz | Indiana University

Bio: Dr. Ben Kravitz is an assistant professor in the Department of Earth and Atmospheric Sciences at Indiana University. He holds a B.A. in mathematics from Northwestern University, an M.S. in mathematics from Purdue University, and an M.S. and Ph.D. in atmospheric science from Rutgers University. He completed a postdoctoral research position at the Carnegie Institution for Science and another postdoctoral research position at Pacific Northwest National Laboratory, where he became a staff scientist in 2015. He joined the faculty at Indiana University in 2019, maintaining a joint appointment at Pacific Northwest National Laboratory. Dr. Kravitz is an international expert in climate model simulations of climate engineering. His current activities also include using engineering and mathematical techniques in climate models to better understand climate feedbacks, studying teleconnections in high latitude climate, and developing climate model emulators for use in Integrated Assessment Models.

Abstract: Climate models are our best mathematical representations of the real world, but they are very costly to run. Reduced-order modeling of the climate has long been used to make climate modeling more computationally tractable by reducing fidelity or complexity. Novel applications of computer science methods, including machine learning and climate networks, show pathways toward reduced order climate modeling that retains complexity but at reduced computational expense. I discuss three recent studies involving (1) using machine learning to improve short-term climate forecasts, (2) using climate networks to quantify Earth system teleconnections, and (3) another application of machine learning to generate numerous realizations of weather for use in quantifying extreme events.

Thursday May 5, 2022 - Natural Resource Modeling

Two buoys and a large ocean: Using state-of-art instruments to study offshore wind resource assessment. Dr. Raghu Krishnamurthy | PNNL

Mesoamerican Tree Species Composition Suggests Alternative Stable States . Dr. Hank Stevens | Miami University

Two buoys and a large ocean: Using state-of-art instruments to study offshore wind resource assessment. Dr. Raghu Krishnamurthy | PNNL

Bio: Dr. Krishnamurthy joined PNNL in 2019, prior to that he was working as an assistant professor at University of Notre Dame. He currently is a mentor for the Doppler lidar network at ARM and is a PI for two wind energy projects, Lidar buoy science and Wind Forecasting Improvement Project 3, funded by DOE Wind Energies Technologies Office. His research interests are focused on using observations and modeling to improve our understanding of the atmospheric boundary layer.

Abstract: In recent years, there has been rapidly increasing interest in the siting, buildout and efficient operation of offshore wind plants, as required to meet domestic renewable energy targets, i.e., 35% of the nation's electricity demands by 2050, of which 110 GW are expected to be deployed offshore. With a recent push to deploy 30 GW by 2030, quantifying the uncertainty in wind resource characterization is a top priority of the US DOE offshore wind strategy. Herein we will provide some details about unique offshore measurements that PNNL is collecting using two lidar-equipped buoys and how we have been using these measurements to improve our understanding of the offshore wind resource.

Mesoamerican Tree Species Composition Suggests Alternative Stable States . Dr. Hank Stevens | Miami University

Abstract: Detecting and confirming alternative stable states in naturally occurring dynamical systems is challenging, and in this talk, I describe several lines of circumstantial evidence that remnant garden forests of the ancient Maya are one such state in Mesoamerican forests. These ancient garden forests are found in areas of once high density Maya settlements, and they are characterized by relatively high abundances of once-culturally important woody species. We hypothesize that these forests persist as an alternative stable state that is maintained by positive feedbacks between the unusually high density of animal-dispersed tree species and their frugivore dispersers. We evaluated a series of predictions using spatially-explicit tree census data and phenological information gleaned from the literature. Our results show that Maya garden tree species are more likely than non-garden species to be animal-dispersed, and that many of these species showed high levels of spatial clustering indicative of short dispersal distances, but only within garden forests. We also found that garden forests were more likely to replace themselves, suggesting more stable community dynamics than in the surrounding forest matrix. Last, we found that phenological patterns of aggregate community-level fruiting potential by garden species were more likely to support frugivore populations than fruiting by non-garden species. Collectively, these findings suggest that the garden forests left behind by the ancient Maya are dynamical attractors that have helped maintain these forests for over 1000 years since the collapse of the Mayan civilization. If so, this would be a unique type of a long-term human landscape legacy, and would add to our growing appreciation that our most wild primeval landscapes have often harbored and been managed by our ancestors.

Thursday May 12, 2022 - Geointelligence and Remote Sensing

GeoAI for Earth Observation Analytics: Applications and Challenges. Andre Coleman, MS and Troy Saltiel, MS | PNNL

Bio: André Coleman has served as a senior research/data scientist at the Pacific Northwest National Laboratory (PNNL) in Richland, Washington, USA since 2000. He brings 27-years of professional experience in the fields of geoinformatics, hazard informatics, geointelligence, hydrology, bioenergy, and computer science. His research interests are focused in spatial and numerical modeling, remote-sensing (satellite, airborne, UAS), rapid automated disaster assessments, machine/deep learning and evolutionary computing, heterogeneous data fusion, water security, and coupling of spatial and physics-based numerical models. To date, Andre has authored or co-authored 108 publications, including 45 peer-reviewed journal articles and 3 book chapters.

Bio: Troy Saltiel is a post master’s research associate who joined PNNL in 2020. His work focuses on geographic information science and remote sensing, often involving geospatial big data like highly detailed multispectral and hyperspectral imagery, LiDAR data, and national-scale datasets. Recent examples include analysis of historical fire weather data, spaceborne mapping of aerially deployed fire retardant using machine learning, land screening for suitable algae biofuel sites, vegetation characterization using UAV-collected RGB imagery and LiDAR data, and species-level vegetation mapping with hyperspectral data. Troy graduated from the University of Utah with an MS in geography in 2021.

Abstract: From targeted remote sensing applications via uncrewed aerial systems to global collections via rapidly growing microsatellite constellations, the volume, velocity and variety of big earth observation data are growing at an unprecedented rate. GeoAI is an interdisciplinary field that bridges geographic sciences, remote-sensing, and data science to understand phenomena and derive actionable data in the natural and human environment; this field presents a unique challenge space, where spatial, spectral, and temporal have inherent relationships, and are not mutually exclusive. Current and emerging state-of-the-art methods show encouraging applicability to a diverse range of remote sensing applications, furthering our ability to effectively utilize Big Earth Data. The challenges and opportunities to realizing these benefits, however, differ from other more common computer vision problems. Generalizable models provide an apex for applied use, however for GeoAI, the field is still focused on tailored solutions that explore the art of the possible. This talk will present several examples of GeoAI applications at PNNL and address current challenges and opportunities related to generalization and transferability of GeoAI models.

Thursday May 19, 2022 - Earth System Modeling

Accelerating Geologic Carbon Storage Leakage Risk Quantification Using Deep Learning. Dr. Diana Bacon | PNNL

Machine Learning and Uncertainty Quantification for Earth System Modeling and Prediction. Dr. Jason Hou | PNNL

Accelerating Geologic Carbon Storage Leakage Risk Quantification Using Deep Learning. Dr. Diana Bacon | PNNL

Bio: Dr. Diana Bacon is a computational scientist with expertise in hydrology and geochemistry. Her research has focused on developing and applying multiphase flow and reactive transport simulators to understand the fate and transport of radionuclides, carbon and pollutants in groundwater. She is currently applying deep learning to develop fast forward simulators for pressure management during carbon storage operations as part of the U.S. Department of Energy’s SMART initiative and to the development of surrogate models of groundwater impacts related to CO₂ and brine leakage from CO₂ sequestration reservoirs for the U.S. Department of Energy’s National Risk Assessment Partnership (NRAP).

Abstract: Geologic carbon storage is a method of securing carbon dioxide (CO₂) in deep geologic formations to prevent its release to the atmosphere and contribution to global warming as a greenhouse gas. To obtain a permit to inject CO₂ into a deep saline reservoir in United States, a site operator is required to develop a simulation of CO₂ injection at the site and determine the “area of review” where there is potential for leakage of CO₂ and brine through abandoned wellbores to degrade the water quality in overlying drinking water aquifers. Properties of the deep reservoir, abandoned wellbores, and aquifers are inherently uncertain, requiring many simulations with varying input parameters to bound uncertainty in model predictions. Detailed finite difference multiphase flow simulations of the entire geologic carbon storage system may take days or even weeks to run, so dividing the system into component models for the reservoir, wellbore, and aquifer, and developing fast surrogate models for each component can greatly accelerate risk quantification. A generic aquifer model was developed using a generative adversarial deep learning network and trained using a large synthetic dataset of numerical flow simulations. The input parameters were selected to cover a wide range of groundwater aquifer attributes and leakage scenarios. The deep learning model predicts the temporal and spatial distribution of dissolved salt and dissolved CO₂ in the aquifer, and compares well to the numerical simulation results. Once the model is loaded into memory, it runs in a fraction of a second, greatly accelerating the many runs necessary for risk quantification.

Machine Learning and Uncertainty Quantification for Earth System Modeling and Prediction. Dr. Jason Hou | PNNL

Bio: Dr. Z. Jason Hou is a Chief Data Scientist and Team Lead of the Earth System Data Science team at PNNL. He has 20+ years’ experience in developing and applying advanced artificial intelligence (AI), machine learning (ML), uncertainty quantification (UQ), and extreme event analysis approaches to advance fields of research in environmental management and remediation, climate extremes, land-atmosphere modeling, smart grids, petroleum exploration, carbon sequestration, renewable energy forecasting, and energy storage applications.

Abstract: Machine learning and uncertainty quantification (ML/UQ) techniques have been successfully used and have more potential to advance Earth system science by improving understanding, analysis, modeling, prediction, and decision making. There are strong needs and great opportunities using ML/UQ to better predict, process, analyze, and learn from large volumes of Earth systems data from a variety of sources such as remote sensing, in situ observations, citizen science, and high-fidelity physics-based numerical simulations of Earth systems. In this presentation, I will discuss some progress in developing and applying ML/UQ for mechanistic understanding of various Earth system components (e.g., land, atmospheric, and renewable energy processes), calibrating Earth system models with ML-surrogates, identifying and estimating climate forcing with inverse-problem setup, and predicting future behaviors of Earth system processes, particularly the extremes.

Thursday May 26, 2022 - Quantitative Ecology Fisheries

How food and temperature determine growth opportunities for Juvenile Chinook Salmon in the Elwha River. Dr. Martin Liermann, Dr. Aimie H. Fullerton and Sarah Morley, MS | NOAA

A parcel-scale quantitative sea level rise vulnerability analysis for Puget Sound. Dr. Ian Miller | Washington Sea Grant

How food and temperature determine growth opportunities for Juvenile Chinook Salmon in the Elwha River. Dr. Martin Liermann, Dr. Aimie H. Fullerton and Sarah Morley, MS | NOAA

Abstract: : Understanding fish populations dynamics (abundance, growth, survival) is an important yet difficult management task because their ecology is complex. Habitat-based models of fish densities are frequently used because they are easy to motivate and can be constructed using data that is readily available and easy to collect. However, fish are mobile organisms that are influenced by multiple environmental factors - including temperature, food availability, and the presence of potential predators or competitors. We use a rich set of biotic and physical data collected in the Elwha River over 20 years to show how mechanistic models of egg incubation, fish growth, and movement can be used to assess our understanding of how fish utilize freshwater systems. The Elwha River provides a unique opportunity because of the depth of available data, the presence of multiple ESA listed salmonids, and the relatively pristine condition of the system. Using Chinook salmon spawn timing, temperature data, and incubation and growth models we were able to predict the timing and size distribution of juvenile Chinook migrating past screw traps in the mainstem and a tributary. These predictions allowed us to derive important management information, like the percentage of fry migrants, and provided insight into our model assumptions and gaps in the available data.

A parcel-scale quantitative sea level rise vulnerability analysis for Puget Sound. Dr. Ian Miller | Washington Sea Grant

Abstract: The availability of regional and local sea level rise (SLR) projections facilitates the integration of SLR-related hazard exposure into community planning processes. Hazard exposure alone, though, paints an incomplete picture of the risks faced by communities. Vulnerability assessment couples hazard exposure information with the spatial distribution of valued community assets and their sensitivity to those hazards and provides better insights about areas that are most at risk due to SLR-driven hazards. A careful and comprehensive assessment of SLR-driven vulnerability, therefore, can lead to more nuanced planning and decision-making and support more equitable distributions of resources and investment intended to reduce vulnerability. Vulnerability assessments, though, often rely on convening stakeholder working groups, and are therefore expensive, time-consuming, and limited spatially to the zone of stakeholder expertise. An alternative approach leverages the emergence of publicly available spatial data and analysis techniques to quantify SLR vulnerability at scales relevant to community planning and decision-making. Here we report on methods and preliminary results from a quantitative SLR vulnerability assessment for Puget Sound in Washington State, intended to inform land-use, ecological restoration, and hazard planning. The assessment calculated a vulnerability index for every parcel within a year 2100 SLR hazard zone, based both on the configuration of infrastructure and coastal habitats within the study area. The results are also coupled with a concurrently developed social vulnerability index, which provides additional insight about those people and places that may be predisposed to adverse impacts from SLR-related hazards. We find that the proposed approach offers advantages in terms of advancing equitable SLR-related risk reduction, but also that the results should be carefully interpreted considering embedded assumptions and data limitations.

Thursday June 2, 2022 - Climate Modeling

Accelerating parameter learning within a climate model. Dr. Oliver Dunbar | California Institute of Technology

Improving Climate Models by Corrective Machine Learning. Dr. Chris Bretherton | AI2

Accelerating parameter learning within a climate model. Dr. Oliver Dunbar | California Institute of Technology

Bio: I am an applied mathematician, currently working in the position of Environmental Sciences and Engineering postdoctoral scholar at the California Institute of Technology. I work primarily in the Climate Modeling Alliance (CliMA) under Tapio Schneider and Andrew Stuart, using Bayesian methods and machine learning techniques to improve climate model predictions. My other interests span data assimilation, optimization, forward and inverse problems for physical, health, and social sciences.

Abstract: : Current state-of-the-art climate models produce uncertain predictions, but they are typically ill-equipped to quantify this uncertainty, evidenced by apparent variability of forecasts from competing models. The source of the uncertainty is from necessary simplified physical schemes used to represent small-scale dynamics or poorly understood physics. These schemes depend upon parameters that are calibrated (often by hand) to fit data, though there may be a wide distribution of parameters that feasibly produce a given piece of (noisy) data. In climate models, the range of parameters appearing in convection and turbulent schemes dominate the uncertainty of resulting decadal predictions; it is therefore essential to quantify it for robust prediction. This task is far more computationally intensive than parameter calibration, and historically has been out of reach of climate models.

We solve a suitable Bayesian inverse problem, which aims to learn a suitable parameter distribution from judiciously chosen time-averaged statistical data. We do this successfully by applying the new Calibrate-Emulate-Sample (CES) methodology. CES is based on three steps: a first calibration step, takes the climate model as a black box input, and is well adapted to derivative-free optimization in high performance computing architectures; a second Emulation step, automates, smooths, and accelerates calculation of the black box climate model by several orders of magnitude; a final Sampling step uses standard methods from computational statistics applied to the accelerated model to obtain a data-informed posterior distribution for parameters.

I will demonstrate this work within an idealized aquaplanet general circulation model, showing how parametric uncertainty quantification on the closure parameters for convection, can provide robust predictions of climate quantities. I shall also touch on other uses of parametric uncertainty, such as directing experimental design choices.

Improving Climate Models by Corrective Machine Learning. Dr. Chris Bretherton | AI2

Abstract: Machine learning (ML) is used to correct the physical parameterizations of a real-geography coarse-grid global atmosphere model of grid spacing 25-200 km to make the model evolve more like a reference, either a reanalysis or a fine-grid global simulation. We run training simulations in which the temperature and humidity of the target coarse model are nudged to the reference on a 3-hour time scale. The nudging tendencies and (optionally) the downwelling surface radiative fluxes are learned as functions of column thermodynamic state, insolation, and terrain height. The learned tendencies are used in forecasts to correct the combined physical parameterization tendencies. We show that corrective ML can significantly improve both weather forecasts and time-mean geographic distributions of land surface temperature and precipitation, even across multiple climates

Bio: Chris Bretherton is the Senior Director of Climate Modeling at AI2, where he leads a research group using machine learning to improve climate models, in collaboration with NOAA’s Geophysical Fluid Dynamics Laboratory in Princeton. From 1985-2021 he was a professor of atmospheric science and applied mathematics at the University of Washington, studying cloud formation, turbulence, and how to better represent them in global climate and weather forecast models. He was a lead author of the IPCC Fifth Assessment Report in 2013. In 2012, he received the Jule G. Charney Award, from the American Meteorological Society, and he was the 2019 AMS Haurwitz Lecturer. He is a Fellow of the AMS and AGU, and a member of the National Academy of Sciences and Washington State Academy of Sciences.

Fall 2021 Theme: Chemical and Biological Sciences

Thursday October 14, 2021

AI at PNNL. Court Corley | Pacific Northwest National Laboratory

Bio: Court joined PNNL in 2009 as a post-doctoral research associate and is now a Chief Data Scientist and group leader for Data Sciences and Analytics in the Computing & Analytics Division. He is a leader in the field of data science and biosurveillance. His current work focuses on deep learning and narrow AI methods and on computational modeling for biosurveillance. Court co-led the Deep Learning for Scientific Discovery agile LDRD initiative that has applied deep learning across the breadth of PNNL’s science and security missions. He is active in several professional societies, leads data science working groups, and regularly performs peer reviews for journals, conferences, and funding bodies.

Abstract: PNNL has developed foundational expertise and forward-learning capability in artificial intelligence (AI) through both internal investments and sponsored programs across all our mission areas. PNNL’s core AI capabilities are focused on machine reasoning and are summarized into two categories. The first is applying AI technologies to the breadth of our science and technology domains. The second is unique specializations in AI that address grand scientific challenges in support of the nation. The specializations include assured and trustworthy AI, scalable AI architectures, and learning with limited data.

Thursday October 21, 2021

Harnessing the power of ‘omics in rare disease: lessons from building cohorts, model systems, and algorithms in neurofibromatosis 1. Sara Gosline | Pacific Northwest National Laboratory

Detection and analysis of amino acid insertions and deletions. Muneeba Jilani | University of Massachusetts Boston

Harnessing the power of ‘omics in rare disease: lessons from building cohorts, model systems, and algorithms in neurofibromatosis 1. Sara Gosline | Pacific Northwest National Laboratory

Bio: Sara Gosline received a BA in computer science from Columbia University and spent two years working in the software field before returning to graduate school full-time. She received her master’s degree and PhD in computer science from McGill University, with a specialty in bioinformatics. Then, she moved to the Massachusetts Institute of Technology where she worked as a postdoctoral researcher in the Department of Biological Engineering with Ernest Fraenkel and Phillip Sharp; there, she focused on employing computational algorithms to disentangle biological data from cancer and other diseases.

Since completing her training, she worked at Sage Bionetworks, a non-profit that focuses on accelerating the pace of biomedical research through enabling data sharing and collaboration. She focused on supporting scientific discoveries in rare diseases. Gosline has since become a research scientist at PNNL where she continues her research on rare diseases, as well as working with other large molecular datasets, specifically trying to employ novel algorithms in cancer and other diseases.

Abstract: Advancements in biotechnology have led to a deluge of data in biomedical research which include individual measurements of genes, messenger RNAs, and proteins across tissues and patients. With enough patient samples, we can use these measurements to identify specific ‘biomarkers’ of a disease – genes, mRNA, or proteins that indicate disease. However, in cases of rare disease, samples are hard to come by, and it is difficult to build datasets that are large enough to identify biomarkers with sufficient statistical power. This is especially pressing in the field of Neurofibromatosis 1 (NF1), a monogenic syndrome that gives rise to benign tumors (among other symptoms) for which there is no clinically approved treatment. In this talk I will describe recent efforts to overcome the paucity of data to identify putative drug target sin NF1 by working directly with patient communities to collect data, working with biological engineers to build better models of rare disease, and developing novel computational approaches.

Detection and analysis of amino acid insertions and deletions. Muneeba Jilani | University of Massachusetts Boston

Bio: I am a PhD computational sciences candidate at UMass, Boston. Applying my software engineering academic background to biological applications is my main goal during the PhD program.

Abstract: Despite being a recurring type of sequence variation, amino acid insertions and deletions, their source and resulting functional significances from them remain rather unexplored area of structural biology. Recent research endeavors have made it apparent that this kind of structural variations have a stronger correlation with functional changes in the respective proteins compared to the other kinds of mutations. In the upcoming talk, we will overview various aspects of InDels including the origin of InDels, their detection from the protein sequence and various methods of analyzing InDels.

Thursday October 28, 2021

Exploring the conformational landscape of tau peptide with molecular dynamics simulations. Jay McCarthy | Western Washington University

Visualizing the true structure of big data for data exploration. Kevin Moon | Utah State University

Exploring the conformational landscape of tau peptide with molecular dynamics simulations. Jay McCarthy | Western Washington University

Bio: James McCarty is an Assistant Professor of Chemistry at Western Washington University. He holds a B.S. in biochemistry from the California Polytechnic State University in San Luis Obispo, CA, and a Ph.D. in physical chemistry from the University of Oregon. Following graduate school, he worked as a postdoctoral fellow in the research group of Michele Parrinello at ETH Zurich, Switzerland, where he worked on the development of new methods for studying rare events from atomistic simulations. Prior to his current appointment at Western, he held a joint postdoctoral appointment in the Department of Chemistry and Biochemistry and the Materials Research Laboratory at the University of California Santa Barbara. At UCSB he worked to develop computational methods to study the aggregation of intrinsically disordered proteins. His current research interests include molecular dynamics simulations, enhanced sampling of rare events, and protein biophysics.

Abstract: The neuronal protein tau is an intrinsically disordered protein that normally associates with microtubules. Misfolding and aggregation of tau into neurofibrillary tangles that are toxic to the cell is a hallmark of several neurodegenerative diseases, including Alzheimer’s disease (AD). Recent experiments suggest that tau neurofibrillary tangles spread through the brain via a prion-like seeding mechanism, where small oligomeric tau protofibrils form a template for the folding of endogenous tau into its pathological form. A detailed understanding of the specific physical mechanism of templated tau folding could lead to new insights into the seeding mechanism and spread of AD. In this talk, I will present results from extensive molecular dynamics of two isoforms of tau in solution. Our results show that tau in solution exists in dynamic equilibrium of distinct conformations with different probability. A major hurdle in studying tau fibril growth with molecular simulations is the need to simulate a large system with many conformational degrees of freedom for a long enough time to exhaustively sample phase space. We are working to overcome this limitation though the use of enhanced sampling methods including umbrella sampling and metadynamics. I will present our simulation procedure for studying tau fibril growth and our progress towards understanding the intermolecular interactions that stabilize the tau protofibril.

Visualizing the true structure of big data for data exploration. Kevin Moon | Utah State University

Bio: Kevin Moon is an assistant professor in the Department of Mathematics and Statistics at Utah State University (USU). He holds a B.S. and M.S. degree in electrical engineering from Brigham Young University and an M.S. degree in mathematics and a Ph.D. in electrical engineering from the University of Michigan. Prior to joining USU in 2018, he was a postdoctoral scholar (2016-2018) in the Genetics Department and the Applied Mathematics Program at Yale University. His research interests are in the development of theory and applications in machine learning, big data, information theory, deep learning, and manifold learning.

Abstract: We live in an era of big data in which researchers in nearly every field are generating thousands or even millions of samples in high dimensions. Most methods in data science focus on prediction or impose restrictive assumptions that require established knowledge and understanding of the data; i.e. these methods require some level of expert supervision. However, in many cases, this knowledge is unavailable and the goal of data analysis is scientific discovery and to develop a better understanding of the data. There is especially a strong need for methods that perform unsupervised data visualization that accurately represents the true structure of the data, which is crucial for developing intuition and understanding of the data. In this talk, I will present PHATE: an unsupervised data visualization tool based on a new information distance that excels at denoising the data while preserving both global and local structure. I will demonstrate PHATE on a variety of datasets including facial images and new single-cell RNA-sequencing data. In addition, I will present extensions for visualizing dynamical systems and supervised problems.

Thursday November 4, 2021

Modeling luminescent solar concentrators. Steve McDowall | Western Washington University

Leveraging structured biological knowledge for counterfactual inference. Jeremy Zucker | Pacific Northwest National Laboratory

Modeling luminescent solar concentrators. Steve McDowall | Western Washington University

Bio: Steve McDowall studied mathematics in New Zealand and then at the University of Washington where he earned his PhD. Since then, he was a visiting assistant professor at the University of Rochester, was a research fellow at the Mathematical Science Research Institute and has been on the faculty at Western since winter of 2002. His research focuses around inverse problems in partial differential equations, which is the mathematics behind imaging methods such as MRI and CAT scans and has more recently worked on collaborations with physicists and chemists to develop new technology for harvesting solar energy.

Abstract: A luminescent solar concentrator (LSC) is a large glass panel which collects sunlight over its area and concentrates the light to the edges where a thin strip of photovoltaics converts the energy into electricity. Faculty and students in the chemistry department synthesize the molecules used in LSCs and make small prototypes. To measure their performance, we illuminate the LSC with a laser at varying distances from an edge and collect and measure the light coming out from that edge. We do this over a wide range of wavelengths and at around 5 distances. From this data we wish to determine certain physical characteristics of the LSC made. In particular, we seek the degree of Raman scattering, attenuation due to other imperfections, and the concentration of the molecules within the host matrix. To estimate these parameters, I have developed a stochastic model which takes into account the many complicated physical laws involved in photon transport through such an LSC. The challenge then becomes finding parameters for which the predicted output best fits the measurements. I’ll briefly tell you the story of the development of this technology at Western and introduce the mathematics involved in the modeling.

Leveraging structured biological knowledge for counterfactual inference. Jeremy Zucker | Pacific Northwest National Laboratory

Abstract: Counterfactual inference is a useful tool for comparing outcomes of interventions on complex systems. It requires us to represent the system in the form of a structural causal model, complete with a causal diagram, probabilistic assumptions on exogenous variables, and functional assignments. Specifying such models can be extremely difficult in practice. The process requires substantial domain expertise, and does not scale easily to large systems, multiple systems, or novel system modifications. At the same time, many application domains, such as molecular biology, are rich in structured causal knowledge that is qualitative in nature. This presentation proposes a general approach for querying a causal biological knowledge graph, and converting the qualitative result into a quantitative structural causal model that can learn from data to answer the question. We demonstrate the feasibility, accuracy and versatility of this approach using two case studies in systems biology. The first demonstrates the appropriateness of the underlying assumptions and the accuracy of the results. The second demonstrates the versatility of the approach by querying a knowledge base for the molecular determinants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-induced cytokine storm, and performing counterfactual inference to estimate the individual treatment effect of medical countermeasures for severely ill patients.

Thursday November 18, 2021

Quantitative analysis of animal coat patterns: an example of the melanistic spot patterns of leopard geckos. Tilmann Glimm | Western Washington University

Solving and learning phase field models using the Physics Informed Neural Networks. Jia Zhao | Utah State University

Quantitative analysis of animal coat patterns: an example of the melanistic spot patterns of leopard geckos. Tilmann Glimm | Western Washington University

Bio: Tilmann Glimm is a professor of mathematics at Western Washington University. His main research area is mathematical biology, in particular pattern formation in development, using methods such as partial differential equations and agent-based models.

Abstract: Animal color patterns are widely studied in ecology, evolution, and through mathematical modeling. As large amounts of photographic data is becoming more easily available, there is a growing need for general quantitative methods for capturing and analyzing the full complexity and details of pattern variation within and between animals. I will talk about an approach to capture and analyze variation in melanistic pattern elements in leopard geckos. We compare patterns using 14 indices such as the ratio of melanistic versus total area, the ellipticity of spots, and the size of spots and use these to define a composite distance between two patterns. We demonstrate how this can be used to quantify within-individual and between-individual correlations. These and other measures of variation can be used to draw inferences about pattern development and the role of random noise in pattern variation. The approach can be applied to other organisms to study variation in color patterns between body parts and to address questions of pattern formation in animals.

Solving and learning phase field models using the Physics Informed Neural Networks. Jia Zhao | Utah State University

Bio: Dr. Jia Zhao received his Ph.D. in computational and applied mathematics from the University of South Carolina at Columbia back in 2015. Then he worked as a Postdoc at the University of North Carolina at Chapel Hill during 2015-2017. He joined Utah State University as an Assistant Professor in 2017. His research focuses on computational modeling of complex multiphase fluids with applications in life science. His research projects have been funded by multiple resources, including NSF, NIH, NVDIA, and S&P Global Ratings.

Abstract: Phase field models, including the Allen-Cahn type and Cahn-Hilliard type equations, have been widely used to investigate interfacial dynamic problems. Designing accurate, efficient, and stable numerical algorithms for solving the phase field models has been an active field for decades. In the meanwhile, developing reliable and physically consistent phase field models for applications in science and engineering have also been intensively investigated.

In this talk, we introduce some preliminary results on solving and learning phase field models using deep neural networks. In the first part, we focus on using the deep neural network to design an automatic numerical solver for the Allen-Cahn and Cahn-Hilliard equations by proposing an adaptive physics informed neural network (PINN). In particular, we propose to embrace the adaptive idea in both space and time and introduce various sampling strategies, such that we are able to improve the efficiency and accuracy of the PINN on solving phase field equations. In the second part, we introduce a new deep learning framework for discovering the phase field models from existing image data. The new framework embraces the approximation power of physics informed neural networks (PINN), and the computational efficiency of the pseudo-spectral methods, which we named pseudo-spectral PINN or SPINN. We will illustrate its approximation power by some interesting examples.

2020-2021 Series

Thursday May 27, 2021

Presence of Hysteresis in Real Gross Domestic Product (RGDP) and Resulting Model Effects. Ian Callen (Montana State University-Bozeman)

Predicting Stellar Ages with Deep Learning. Aidan McBride (WWU)

Thursday May 20, 2021

Few-Shot Learning with Audio Classification and Sound Event Detection. Piper Wolters (WWU)

Few-Shot Learning for Video Action Recognition and Temporal Action Localization. Chris Daw (WWU)

Few-Shot Learning with Audio Classification and Sound Event Detection. Piper Wolters (WWU)

Recent advances in the deep learning field have resulted in state-of-the-art performance for various audio classification tasks, but unlike humans, machines traditionally require large amounts of data to correctly classify. Few-shot learning refers to machine learning methods in which the model is able to generalize to new classes with very few training examples. In this research, we address speaker identification and audio segment classification with the Prototypical Network few-shot learning algorithm. We systematically compare the key architectural decision: the encoder, which performs feature extraction on the raw data. Our encoders include recurrent neural networks, as well as one- and two-dimensional convolutional neural networks. For a 5-way speaker identification task on the VoxCeleb dataset, with only five training examples per speaker, our best model obtains 94.9% accuracy. On a 5-way audio classification task using the Kinetics 600 dataset of Youtube videos, with only five examples per class, we obtain 49.0% accuracy. We are currently extending this work to few-shot audio event detection and speaker identification, so that audio events and speakers can be detected in long audio documents, with minimal supervision.

Few-Shot Learning for Video Action Recognition and Temporal Action Localization. Chris Daw (WWU)

Unlike humans, current deep learning techniques require large quantities of labeled data. Recently, effort has been put towards developing deep learning models that can make accurate predictions in a low-data regime; this body of work has been coined "few-shot learning." In the few-shot scenario, a learner must effectively generalize to unseen classes given a small support set of labeled examples. While a relatively large amount of research has gone into few-shot learning for image classification, little work has been done investigating the application of few-shot learning to video classification and temporal action localization. In this work, we address the tasks of few-shot video action recognition and its related task, event localization. We perform action recognition with a set of two-stream models. We evaluate the classification performance of a set of convolutional and recurrent neural network video encoder architectures used in conjunction with three popular metric-based few-shot algorithms. This work also discusses temporal action localization using a ResNet-18 to embed video frames and a prototypical network to temporally localize events within an untrimmed query video.

Thursday May 13, 2021

Machine Learning for Adaptive RAS/SPS System Settings in Power System Operation and Control. Xiayuan Fan (Senior Research Engineer, PNNL)

Constrained Deep Learning for Modeling and Control of Unknown Dynamical Systems. Jan Drgona (Data Scientist, PNNL)

Machine Learning for Adaptive RAS/SPS System Settings in Power System Operation and Control Xiayuan Fan (Senior Research Engineer, PNNL)

Given the increasing penetration of renewable energy, demand response and smart controllers in today’s power grid, atypical power flow patterns such as reverse flows, loop flows, and stochastic dynamic behavior are being observed in real-time operation. These new patterns may invalidate the existing protection relay settings, especially Remedial Action Scheme (RAS), also known as Special Protection Scheme (SPS), which when invalidated can potentially cause cascading failures if the operational issues caused by the new challenges are not fully understood and addressed.

Traditionally, RAS settings have been determined using offline study, which is very time-consuming, due to a lack of automation and computational power. No automated tools exist to assist planning and protection engineers in adaptively determining RAS settings to enable better response to unknown grid conditions. Thus, these settings are typically overly conservative, causing unnecessary flow curtailment or generation tripping that can affect revenue of generator owners and economic operation of the entire power network.

Several challenges are identified in the industry preventing the RAS settings from being determined in an adaptive/online manner. One key issue is that the computational speed is not fast enough in today’s commercial tools to perform a full-scale study to calculate RAS parameters such as the arming level and validate the control performance in a preventive way.

The Pacific Northwest National Laboratory (PNNL) project team has developed innovative mathematical and advanced computing methods for adaptively setting Remedial Action Scheme/Special Protection Scheme (RAS/SPS) coefficients with the consideration of realistic and near real-time operation conditions. In this DOE-funded project, the Jim Bridger RAS in U.S. Western Interconnection served as the use case for testing and validating the proposed methodology and the corresponding prototype, Transformative Remedial Action Scheme Tool (TRAST).

TRAST uses a novel approach to generate use cases automatically, bringing in advanced statistical data analysis tools, and using machine learning algorithms to analyze, validate, and help create RAS plans. The parallel computing platform at PNNL, as well as Microsoft cloud environment, are utilized for steady state and dynamic simulations under massive contingencies and operating conditions to calculate more realistic settings of RAS systems in a real-world environment. TRAST could significantly simplify and shorten the RAS design and study process. Additionally, continuous improvement and validation can be achieved using the proposed evaluation methodology.

Both grid operators and utility planning engineers will benefit from this technology, and a better RAS modeling process will also increase interconnection-level situational awareness.

Constrained Physics-Informed Deep Learning for Stable System Identification and Continuous Control, Jan Drgona (Data Scientist, PNNL)

We present a novel data-driven method for learning deep constrained continuous control policies and dynamical models of the controlled system. By leveraging partial knowledge of system dynamics and constraint enforcing multi-objective loss functions, the method can learn from small and static datasets, handle time-varying state and input constraints and enforce the stability properties of the controlled system.

We use a continuous control design example to demonstrate the performance of the method on three distinct tasks: system identification, control policy learning, and simultaneous system identification and policy learning. We assess the system identification performance by comparing open-loop simulations of the true system and the learned models. We demonstrate the performance of the policy learning methodology on closed-loop control performance using the ground truth system model under varying levels of parametric and additive uncertainties affecting its dynamics.

We then evaluate the potential of simultaneously learning the system model and control policy. Our empirical results demonstrate the effectiveness of our unifying framework for constrained continuous optimal control to provide stability guarantees, explainable models, robustness to uncertainty, and remarkable sampling efficiency.

Thursday May 6, 2021

Social Networks and College Performance: Evidence from Dining Data. John Krieg (Professor of Economics, Western Washington University)

Power Curve Simulation Study of Nonparametric Multiple Contrast Testing Procedures for One-Way Repeated Measures Experimental Design. Patrick Caroll (Oregon State University)

Social Networks and College Performance: Evidence from Dining Data, John Krieg (Professor of Economics, Western Washington University)

We investigate the effect of friends on academic performance in college using unique data on dining card swipes at a medium-sized public university. Using several different time- and frequency-bandwidth measures, we deﬁne friendships by academic quarter as repeated meetings among students in the same dining hall. To identify the impact of having a friend in class, we employ models using student- and class-level ﬁxed effects as well as a number of controls to rule out alternative explanations. Our results suggest that having a friend in class has a large and positive effect on grades, and this effect is consistent regardless of a friend’s background characteristics.

Power Curve Simulation Study of Nonparametric Multiple Contrast Testing Procedures for One-Way Repeated Measures Experimental Design. Patrick Caroll (Oregon State University)

Repeated measures experimental designs frequently arise in many fields of study, from medicine to psychology to policy making, where multiple observations are taken from the same set of subjects. The use of rank-based nonparametric methods is commonly recommended for analyzing effects in these designs as the observed data typically have non-normal distributions with possibly a number of outliers.

In this study, the sizes and powers of the recent nonparametric multiple contrast testing procedures for the one-way repeated measures experimental design are compared empirically. The methods included in this study are the multivariate normaland t-based methods implemented in the mctp.rm() function of the nparcomp R package. In addition, the df-procedure presented in Hasler (2013) which suggests an alternative way of computing the degrees of freedom for the multivariate t-based method is examined.

Thursday April 29, 2021

Our approach to training, testing and validating machine learning models is broken. Rob Jasper (Research Manager, PNNL)

Software engineering for machine learning based (MLB) systems is different from the methods used to design and code traditional systems. Notably, common methods for verification and testing of MLB systems are very different from traditional systems. I argue that the near ubiquitous data science method of train, test, validation is fundamentally broken.

Research on adversarial machine learning highlights novel ways to both attack MLB systems and defend against those attacks. Some of these methods can be used to reveal hidden model flaws and highlight a lack of model robustness. I will highlight several ways in which we can use adversarial techniques to more fully test MLB systems.

Thursday April 22, 2021

Big data in astronomy: new solutions to old problems. Dr. Marina Kounkel (Postdoctoral Fellow Vanderbilt University)

Stellar spectra can reveal much about properties of stars, and, for over 100 years it has been used to characterize stellar evolution. However, despite this long standing history, common methods for analyzing spectra are inefficient and they produce significant systematics in the derived parameters. In the era of large surveys and precision astronomy, these age-old algorithms show their cracks, struggling to keep up with the volume of data, only to produce less than ideal results. Machine learning provides a new method for leveraging the historical knowledge to the new observations.

Marina is a Postdoctoral scholar at Vanderbilt University, previously Western Washington University. Her thesis was at the University of Michigan on measuring accurate positions of young stars in Orion through radio emission to measure its 3-dimensional structure. Since then, working with such surveys as Gaia and APOGEE, she has continued analyzing its kinematics and evolution and has branched out in characterization of other nearby star forming regions, as well as dynamics of Milky Way as a whole.

Thursday March 4, 2021

Where the Wild Data Are, Dr. Shameem Ahmed (Assistant Professor, Computer Science Department, Western Washington University)

Show me the Insight, Dr. Moushumi Sharmin (Associate Professor, Computer Science Department, Western Washington University)

Where the Wild Data Are, Dr. Shameem Ahmed (Assistant Professor, Computer Science Department, Western Washington University)

Advancement of smart and wearable technologies has enabled researchers to collect users' physiological (e.g., heart rate, sleep, step count), contextual (e.g., location, time of the day), experience (e.g., stress values, emotional states, physical states, social-engagement), and environmental data (e.g., sound, light) from the wild. Such objective, continuous, and contextual data has offered unique opportunities to identify important details about everyday life experiences and identify factors contributing to complex behavioral and health issues, which may guide the design of evidence-based interventions. Recently, researchers started using smart and wearable technology for the management of behavioral health issues in areas such as smoking and addiction prevention and anxiety and stress management among others. Research in these areas has inspired us to investigate the effectiveness of utilizing different data collection methodology that will enable collection of objective and subjective data from the wild to identify factors contributing to the challenges in the domain of autism spectrum disorder. In this talk, I will discuss the challenges in effectively collecting data from the wild and the varying methodologies that I have utilized that aims to assist in the design of evidence-based interventions for individuals with autism.

Show me the Insight, Dr. Moushumi Sharmin (Associate Professor, Computer Science Department, Western Washington University)

With the vast amounts of data available to us pertaining to every aspect of human life, we are observing an increased interest in using data visualization as an analytical tool. In the recent years, researchers reported success in the use of visualization techniques to identify potential solutions for critical problems such as stress management, addiction prevention, understanding spread of viruses and diseases, and designing assistive technology for neurodiverse users. Creating effective visualizations however presents many challenges due to the large volume of data, the need for analysis at different levels of granularity, the disparity among various sources of data, and the varying needs of the stakeholders. Despite these challenges, visualization researchers found that careful design choices and the use of intuitive interaction techniques can lead to the creation of effective visualizations. One important application of visualization tools is in the area of wearable and smart mobile health sensor data that presents unique challenges and opportunities to identify trends and patterns, provide important context to events of interest, and the relationships between these events. In this talk I will introduce a set of visualizations based on wearable sensor and mobile health data that aids in sense making, pattern identification, and long-term behavior change.

Thursday February 25, 2021

Use of Machine Learning to Analyze Chemistry Card Sort Tasks, Logan Sizemore (Western Washington University)

Fine-Grained Classroom Activity Detection, Eric Slyman (Pacific Northwest National Laboratory)

Use of Machine Learning to Analyze Chemistry Card Sort Tasks, Logan Sizemore (Western Washington University)

Education researchers are deeply interested in understanding the way students organize their knowledge, and card sort tasks, which require students to group concepts, are one mechanism to infer a student’s organizational strategy. However, the limited resolution of card sort tasks means they necessarily miss some of the nuance in a student’s strategy. In this work, we propose new machine learning strategies that leverage a potentially richer source of student thinking: free-form written language justifications of sorts. Using data from a university chemistry card sort task, we use vectorized representations of language and unsupervised learning techniques to generate qualitatively interpretable clusters, which can provide unique insight in how students organize their knowledge.

Fine-Grained Classroom Activity Detection, Eric Slyman (Pacific Northwest National Laboratory)

Instructors are increasingly incorporating student-centered learning techniques in their classrooms to improve learning outcomes. In addition to lecture, these class sessions involve forms of individual and group work, and greater rates of student-instructor interaction. Quantifying classroom activity is a key element of accelerating the evaluation and refinement of innovative teaching practices, but manual annotation does not scale. In this work, we present advances to the young application area of automatic classroom activity detection from audio. Using a university classroom corpus with nine activity labels (e.g., "lecture,'' "group work,'' "student question''), we propose and evaluate deep fully connected, convolutional, and recurrent neural network architectures, comparing the performance of mel-filterbank, prosodic, and self-supervised acoustic features. We compare 9-way classification performance with 5-way and 4-way simplifications of the task and assess two types of generalization: (1) new class sessions from previously seen instructors and (2) previously unseen instructors. We obtain strong results on the new fine-grained task and state-of-the-art on the 4-way task: our best model obtains frame-level error rates of 6.2%, 7.7% and 28.0% when generalizing to unseen instructors for the 4-way, 5-way and 9-way classification tasks, respectively (relative reductions of 35.4%, 48.3% and 21.6% over a strong baseline) and examine the effects of ensembleing and decoding its outputs. When estimating the aggregate time spent on classroom activities, our average root mean squared error is 1.64 minutes, a 54.9% relative reduction over the baseline.

Thursday February 18, 2021

Empirical Likelihood for Change Point Detection in Autoregressive Models, Ramadha Piyadi Gamage (Department of Mathematics, Western Washington University)

Change Point Detection Using the Likelihood Ratio Test, Arick Grootveld and Andrea Scolari (Western Washington University)

Empirical Likelihood for Change Point Detection in Autoregressive Models, Ramadha Piyadi Gamage (Department of Mathematics, Western Washington University)

Change point analysis has become an important research topic in many fields of applications. Several research works have been carried out to detect changes and its locations in time series data. A nonparametric method based on the empirical likelihood is proposed to detect structural changes in the parameters of autoregressive (AR) models. Under certain conditions, the asymptotic null distribution of the empirical likelihood ratio test statistic is proved to be Gumbel type. Further, the consistency of the test statistic is verified. Simulations are carried out to show that the power of the proposed test statistic is significant by using different time series models as well as comparing the proposed method to alternative methods. The proposed method is applied to monthly average soybean sales data to further illustrate the testing procedure.

Change Point Detection Using the Likelihood Ratio Test, Arick Grootveld and Andrea Scolari (Western Washington University)

Change point analysis has become a widely spread application in statistics. It can be used to answer questions like; "Has a given incident that happened in the world caused a statistically significant change in the stock market of the USA? Can we use a single model or multiple models, depending on the change(s) and change location(s)?”. There are several methods that have been used in the literature and we focus on the likelihood ratio approach. In this presentation, we discuss detecting change(s) in an independent data set using the likelihood ratio approach. The corresponding hypothesis test, test statistic, asymptotic distribution, and algorithm will be summarized in this talk.

Thursday February 11, 2021

Constrained Block Nonlinear Neural Dynamics Models, Elliott Skomski (Applied Statistics and Computational Modeling Group, PNNL). Slides. Publication

Nonlinear Control of Network Dynamical Systems, Megan Morrison (Applied Mathematics Department, UW) Slides. Publication

Robust Prediction in Spatio-Temporal Networks, Sai Pushpak Nandanoori (Optimization and Control Group, PNNL)

Constrained Block Nonlinear Neural Dynamics Models, Elliott Skomski (Applied Statistics and Computational Modeling Group, PNNL)

We present a novel formulation of neural state space models for data-efficient learning of deep control-oriented nonlinear dynamical models by embedding local model structure and optimization constraints. The proposed method consists of neural network blocks that represent input, state, and output dynamics with constraints placed on the network weights and system variables. For handling partially observable dynamical systems, we utilize a state observer neural network to estimate the states of the system's latent dynamics. We evaluate the performance of the proposed architecture and training methods on system identification tasks for three nonlinear systems, and find that models optimized with a few thousand system state observations accurately represent system dynamics in open loop simulation over thousands of time steps from a single set of initial conditions. Experimental results demonstrate an order of magnitude reduction in open-loop simulation mean squared error for our constrained, block-structured neural models when compared to traditional unstructured and unconstrained neural network models.

Nonlinear Control of Network Dynamical Systems, Megan Morrison (Applied Mathematics Department, UW)

Networks in nature regularly exhibit dynamics that are difficult to characterize and control due to their nonlinear nature, stochasticity, and use of obscure control signals. These systems are often marked by low-dimensional dynamics, multiple stable fixed points or attractors, and outputs that are generated from nonlocalized network activity. We develop a procedure for characterizing high-dimensional network systems with transparent, low-dimensional models and controlling them with nonlinear control signals. After using the SINDy algorithm to fit a low-dimensional model to our data, we use bifurcation theory to find collections of constant control signals that will produce the desired objective path and then project the control signals back to the original network space. We first illustrate our nonlinear control procedure on established bistable, low-dimensional biological systems, showing how control signals are found that generate switches between the fixed points. We demonstrate our control procedure for high-dimensional systems on random high-dimensional networks and Hopfield memory networks.

Robust Prediction in Spatio-Temporal Networks, Sai Pushpak Nandanoori (Optimization and Control Group, PNNL)

The power system is a spatio-temporal complex nonlinear dynamical system, and its reliable operation is critical to energy security in the modern world and is usually ensured via continuous monitoring and control of the critical infrastructure. Hence, able to predict/estimate the critical states as a result of a disturbance in the system becomes crucial for maintaining the transient stability of the network or to propose any mitigation strategies. This inspired our work to develop and compare novel predictive models that can be applied to analyze the transient stability of the power network when subjected to a disturbance. A data-driven and a physics informed predictive model are adopted in this work for power system state predictions. The data-driven-based predictive model treats the power network as a graph, explicitly depends on the topology, and builds the spatio-temporal correlations using graph convolutional network (STGCN) whereas the physics-informed one models the power network as a linear system in an abstract space based on the Koopman operator theoretic approaches (deepDMD). We introduced a spatio-temporal data generation framework, GridSTAGE that emulates real-time data from a power network and establishes a benchmark for predictive analysis. The data is generated by strategically varying the loads chosen based on the physical properties of the power system such as loads near a larger generator, smaller generator, etc, and based on the graphical properties such as being a central node, peripheral node, or close to a generator bus or far away from a generator bus, etc. In this work, these load change strategies are considered to generate the time-series data for an IEEE 68 bus system. Rigorous comparison during training and testing these predictive models reveal that the physics informed predictive model, deepDMD outperforms the STGCN in two different aspects. (a) DeepDMD model took a few minutes to train whereas the STGCN took a few hours to train on the same training data. (b) Although both deepDMD and STGCN produce similar accuracy in the predictions, STGCN depends on a history of observations as opposed to a single observation for deepDMD which serves as an initial condition to the linear system on the abstract space. Moreover, the dynamical system properties such as an equilibrium point, stability are captured in the linear system representation using deepDMD. Therefore, due to these advantages of the physics informed predictive model, the deepDMD, it forms an immediate choice to serve as a predictive model during the transient stability analysis of the power network.

Thursday, February 4 2021

Invited Talk: Multifactor Computational Screening of Materials for Renewable Energy Conversion and Storage. Tim Kowalczyk (Associate Professor, Chemistry Department, WWU)

Multifactor Computational Screening of Materials for Renewable Energy Conversion and Storage. Tim Kowalczyk (Chemistry Department WWU)

Nearly a decade after the Obama administration's launch of the Materials Genome Initiative, the adoption of computational and data-driven approaches to materials discovery has accelerated across the global materials research community. These approaches require cooperation among domain experts in physical sciences and data science; junior scientists working in this field are developing domain knowledge in both areas simultaneously and becoming the first native "materials informaticians". In this talk, I will share some recent efforts of our team of budding materials informaticians to integrate a data-driven approach with our physics-based materials simulations of renewable energy materials. The overarching goal of these efforts is to enable high-throughput virtual screening of candidate materials for two specific applications: (1) photoactive covalent organic frameworks (COFs) for increasing solar energy conversion efficiency via singlet fission; (2) photoisomerizable solar thermal fuels (STFs) for integrated solar energy conversion and storage. From a physical sciences perspective, this talk will apply electronic structure models to elucidate strategies for optimizing two classes of photoactive materials for solar energy conversion and storage applications. From a data science perspective, this talk will highlight the challenges of data curation in the photoactive materials domain space (even data produced by physics-based simulations!) and outline our strategy for addressing these challenges.

Tim Kowalczyk is an Associate Professor at Western Washington University with joint appointments in Chemistry, Materials Science, and Energy Studies. His research focuses on excited-state electronic structure of soft materials for energy conversion and storage applications. Prof. Kowalczyk is a Cottrell Scholar of the Research Corporation for Science Advancement. He is a recipient of the 2018 ACS Division for Computers in Chemistry Outstanding Junior Faculty Award and a CAREER award from the National Science Foundation.

Thursday, January 28 2021

Invited Talk: Human-Centric Decision Intelligence during COVID-19 Crisis: From Descriptive to Prescriptive Analytics. Svitlana Volkova (Chief Scientist, Computing & Analytics, PNNL)

Human-Centric Decision Intelligence during COVID-19 Crisis: From Descriptive to Prescriptive Analytics. Svitlana Volkova (Chief Scientist, Computing & Analytics, PNNL)

What can explain human behavior? How human behavior and perception shift during crisis events? Can we find causal relationships between human behavior online and outcomes in the real world? AI-driven decision intelligence can assist with answering these questions, and this talk will cover two recent studies coming from our group that analyze human behavior across cognitive and information domains, targeting to answer the “what” and the “why” questions.

Our first study focuses on analyzing social media discourse related to non-pharmaceutical interventions (NPIs)—implemented to mitigate COVID-19 spread—and explicitly measuring audience reactions, perspectives across psycho-demographics and locations over time – implemented in our interactive decision intelligence tool WatchOwl. To go beyond descriptive analytics, we rely on causal ensemble models and mixed-method approaches to discover natural experiments in observational data to perform treatment-effect estimation of NPIs and other variables on COVID-19 spread dynamics.

The second study focuses on detecting misleading and falsified COVID-19 narrative spread online and characterizing their impact in terms of speed, scale, lifetime and audience engagement. To go beyond descriptive analytics, we apply causal discovery and perform treatment-effect estimation on how message and user properties effect the ways misleading and falsified content spreads online. Our analysis and findings from two example studies clearly demonstrate how the departure from descriptive to prescriptive analytics allows to go beyond correlations and sensemaking in the human domain, and instead discover causal explanations of human behavior with recommendations to intervene. Prescriptive analytics allows moving away from a traditional reactive to a more proactive posture when assisting humans with decision making in the human social domain and beyond.

Svitlana Volkova is a Chief Scientist at the National Security Directorate, Pacific Northwest National Laboratory. She is known for building and leading cross-functional research teams of scientists and engineers to develop cutting-edge descriptive, predictive and prescriptive human-centered AI-driven analytics to explain, model, and intervene into human social systems and behaviors across cognitive, information and physical domains.

Svitlana is a recognized leader in the field of social media analytics and computational linguistics. Her scientific contributions cover a range of topics on social media analytics, natural language processing (NLP), applied machine learning (ML), and deep learning (DL). More specifically, her research focuses on developing machine learning models for predicting and forecasting real-world events and human behavior from social data. Approaches developed by Svitlana and her team advance understanding, analysis, and effective reasoning about extreme volumes of dynamic, multilingual, and diverse real-world social data.

Since joining PNNL in October 2015, Svitlana led more than ten projects and authored more than 50 peer-reviewed publications. She has been recognized with the NSD Author of the Year award and the 2019 Ronald L. Brodzinski Early Career Exceptional Achievement Award. Svitlana is a senior board member for Women in Machine Learning (WiML) with the mission to enhance the experience of women in ML. Prior to joining PNNL, Svitlana received her PhD in Computer Science in 2015 from Johns Hopkins University. Svitlana interned at Microsoft Research during her education. She was also awarded the Google Anita Borg Memorial Scholarship and the Fulbright Scholarship.

Thursday, November 19th 2020

Who Started It? An Analysis of Bot-Initiated Content in the White Helmets Disinformation Campaign. Kayla Duskin (PNNL)

Tackling False Information during the Coronavirus Pandemic. Jiexun Li (WWU faculty)

Can we expect the unexpected? Susceptibility of Neural Linguistic Deception Detection to Adversarial Attacks. Ellyn Ayton (PNNL)

Who Started It? An Analysis of Bot-Initiated Content in the White Helmets Disinformation Campaign. Kayla Duskin (PNNL)

While social media platforms have made access to current events and breaking news easier than ever, they are also vulnerable to online disinformation campaigns. One instance of a known disinformation campaign is against the Syria Civil Defense – also called the White Helmets – a volunteer organization operating in parts of opposition-controlled Syria specializing in medical evacuations. While the organization aims to provide emergency humanitarian aid, the social media campaign against the White Helmets works to spread distrust with accusations of terrorist activities and conspiracy theories. We use this campaign as a case study and analyze the behavior and impact of bots (automated or semi-automated users) in the online discourse surrounding the White Helmets. Using 1.1 million anonymized tweets collected between April 2018 and June 2019, we measure four properties of spread between bot and human users: audience size, number of shares, lifetime of tweets, and tweet diffusion speed. Through this case study, we begin to understand how misinformation propagates through online networks as compared to substantiated news and analyze differentiating behavioral patterns between bots and human users. Our objective of this ongoing work is to understand and assess the impact of claims made by different types of users in online social media discourse.

Tackling False Information during the Coronavirus Pandemic. Jiexun Li (WWU faculty)

In recent years, a lot of false information has emerged and spread over the Internet. During the recent coronavirus outbreak in late 2019 and early 2020, the amount of misleading information and fabricated content has grown rapidly, and they pose a serious risk to public health and safety. In this study, we attempt to understand how various characteristics of COVID-related claims affect the public interests in the claims. More specifically, this research aims to answer the following three research questions: (Q1) What impact does the ruling of the claims have on its public interest? (Q2) How does the source of the claim influence its public interest? (Q3) What are the textural cues that may help to detect fraudulent information? Several text-mining techniques are employed to address these questions. We collect and analyze claims data from a fact-checking website, PolitiFact. Keywords from these claims are then extracted and submitted to Google Trends to evaluate their public interests on the Internet. Given the early stage of this research, we are in the process of building prediction models to detect important attributes for false claims. We are convinced that more advanced feature engineering and machine learning models would provide important insights in understanding what claims are fraudulent and how public interests towards these claims change over time.

Can we expect the unexpected? Susceptibility of Neural Linguistic Deception Detection to Adversarial Attacks. Ellyn Ayton (PNNL)

There is a high reliance on social media platforms not only as social networking communities or entertainment but also as primary sources of news and general information – a Pew Research Center survey in 2019 found that 55% of U.S. adults get their news “often” or “sometimes” from these social media platforms. At the same time, there is great concern about the prevalence of deceptive content and the impact of such mis- or disinformation. Due to the scale and speed of new information appearing on social media platforms, a human-approach intractable – there is a near constant deluge of credible and deceptive content alike. In response, many deception detection models have been developed that leverage machine learning and deep learning to tackle the problem of identifying deceptive content, preferably before it has spread through the network. However, current literature lacks details for why users should trust these models and a realistic evaluation of their performance for unexpected or manipulated inputs. We present a comprehensive analysis of the performance of neural linguistic deception detection models across a variety of natural language adversarial attacks on social media posts from two popular platforms – Twitter and Reddit. Evaluating model robustness in general and under adversarial attacks is critical to developing trustworthy models through a deeper understanding of model behavior for expected, unexpected, and manipulated inputs.

Thursday, November 12th 2020

Scene Summarization via Motion Normalization. Scott Wehrwein (WWU faculty)

Convolutional Neural Network for Finding Unusual Frames in Outdoor Webcam Stream. Kanghui Liu (Wyze Labs)

Semantic Pixel Distances for Image Editing, Josh Myers-Dean (WWU student)

Scene Summarization via Motion Normalization. Scott Wehrwein (WWU faculty)

When observing the visual world, temporal phenomena are ubiquitous: people walk, cars drive, rivers flow, clouds drift, and shadows elongate. Some of these, like water splashing and cloud motion, occur over time intervals that are either too short or too long for humans to easily observe. High-speed and timelapse videos provide a popular and compelling way to visualize these phenomena, but many real-world scenes exhibit motions occurring at a variety of rates. Once a framerate is chosen, phenomena at other rates are at best invisible, and at worst create distracting artifacts. In this paper, we propose to automatically normalize the pixel-space speed of different motions in an input video to produce a seamless output with spatiotemporally varying framerate. To achieve this, we propose to analyze scenes at different timescales to isolate and analyze motions that occur at vastly different rates. Our method optionally allows a user to specify additional constraints according to artistic preferences. The motion normalized output provides a novel way to compactly visualize the changes occurring in a scene over a broad range of timescales. This is joint work with Kavita Bala and Noah Snavely.

Convolutional Neural Network for Finding Unusual Frames in Outdoor Webcam Stream. Kanghui Liu (Wyze Labs)

Outdoor live streaming webcams produce vast quantities of video footage that reveal rich visual information about scenes around the world and how they change. The ability to identify unique or unusual frames in these streams has potential applications in surveillance, photography, and traffic monitoring, but the volume of data makes manual analysis impractical. Existing techniques tend to focus on detecting unusual or suspicious events in a surveillance context, requiring high-level object motion analysis and rely on hand-crafted features. In this paper, we propose a deep learning-based model that uses Convolutional Neural Networks(CNNs) to solve the novel proxy task of predicting the time interval between frames, using the model’s inaccurate predictions as an indicator that something unexpected has happened in the scene. This is joint work with Scott Wehrwein.

Semantic Pixel Distances for Image Editing, Josh Myers-Dean (WWU student)

Many image editing techniques make processing decisions based on measures of similarity between pairs of pixels. Traditionally, pixel similarity is measured using a simple L2 distance on RGB or luminance values. In this work,we explore a richer notion of similarity based on feature embeddings learned by convolutional neural networks. We propose to measure pixel similarity by combining distance in a semantically-meaningful feature embedding with traditional color difference. Using semantic features from the penultimate layer of an off-the-shelf semantic segmentation model, we evaluate our distance measure in two image editing applications. A user study shows that incorporating semantic distances into content-aware resizing via seam carving produces improved results. Off-the-shelf semantic features are found to have mixed effectiveness in content-based range masking, suggesting that training better general-purpose pixel embeddings presents a promising future direction for creating semantically-meaningful feature spaces that can be used in a variety of applications. This is joint work with Scott Wehrwein.

Thursday, November 5th 2020

Towards Generalizable and Effective Causal Discovery Algorithm Selection and Ensembling Strategies. Robin Cosbey, PNNL

A Continuum Model of HPC environments. Rick Barnard (WWU Faculty)

Towards Generalizable and Effective Causal Discovery Algorithm Selection and Ensembling Strategies. Robin Cosbey, PNNL

Scientists from a variety of fields share a common goal of finding the underlying causal relations in observational data. This is an important task because it allows for the formation of interventional predictions about the data which can be tested and confirmed; however, this is a challenging undertaking because causal relationships are difficult to distinguish from other sources of correlation. In response to this, many algorithms have been developed for the task of causal discovery, in which observational data is analyzed in order to identify the underlying causal relationships. These algorithms leverage a variety of techniques and make differing assumptions about the observational data; it is currently unclear how well they generalize to data generated by different types of causal graphs. The aim of this work is to identify how different individual and ensembled algorithms will perform given data generated by graphs of varying size, density and structure. We employ several strategies in order to increase the generalizability of causal discovery algorithms. First, we identify how related inter-algorithm agreeability is with algorithm performance. We also explore how ensembling over all or over a subset of the algorithms can improve performance, and identify the effect of various voting schemes that take into account the agreeability information over the ensembling methods. We evaluate the performance of these strategies on two dissimilar datasets of simulated graphs and from this analysis we identify which strategies are the most generalizable and best performing over a variety of graphs.

A Continuum Model of HPC environments. Rick Barnard (WWU Faculty)

We develop a discrete model of data flow in an high performance computing (HPC) environment which results in a differential model. This can then be used to develop a continuum fluid-like model as the scale of the system increases to the extreme scale. The result of this is a partial differential model which is amenable to simulation and analysis of heterogeneous computing tasks. We then obtain results on the existence of solutions for this model as well as present some results of numerical experiments of this model.

Thursday, October 29th 2020

Kymatio: Scattering Transforms in Python. Muawiz Chaudhary, Western Washington University.

Time-varying Autoregression with Low Rank Tensors. Dr. Kameron Harris, Department of Computer Science, Western Washington University. (Slides)

Kymatio: Scattering Transforms in Python. Muawiz Chaudhary, Western Washington University

Some of deep learning's success can be attributed to an ability to learn representations that are invariant to affine transformations, such as translations, rotations, or changes in scale. However, deep learning algorithms traditionally do not aim to produce affine-invariance in the representation. Wavelet scattering transforms compute an affine-invariant representation of an input signal which is stable to deformations and preserves high frequency information. Mathematically principled reasoning guides the design of the scattering transforms, which can be viewed as a convolutional neural network where the filters are fixed, defined as wavelet transform operators. More specifically, the scattering transform is a cascade of convolutions with wavelet filters followed by nonlinear modulus and averaging operations.

Kymatio is a Python package which computes the scattering transform representation of a signal using GPU accelerated computing. Implementations of the 1D, 2D Gabor scattering transforms and 3D harmonic scattering transforms in a variety of deep learning frameworks are provided in this package, with more scattering transforms and frameworks to come. The implemented scattering transforms are fully differentiable and can be used in any portion of a deep learning pipeline. In this presentation we give an overview of scattering transforms and the Kymatio software package.

Time-varying Autoregression with Low Rank Tensors. Dr. Kameron Harris, Department of Computer Science, Western Washington University

We present a windowed technique to learn parsimonious time-varying autoregressive models from multivariate timeseries. This unsupervised method uncovers spatiotemporal structure in data via non-smooth and non-convex optimization. In each time window, we assume the data follow a linear model parameterized by a potentially different system matrix, and we model this stack of system matrices as a low rank tensor. Because of its structure, the model is scalable to high-dimensional data and can easily incorporate priors such as smoothness over time. We find the components of the tensor using alternating minimization and prove that any stationary point of this algorithm is a local minimum. In a test case, our method identifies the true rank of a switching linear system in the presence of noise. We illustrate our model's utility and superior scalability over extant methods when applied to several synthetic and real examples, including a nonlinear dynamical system, worm behavior, sea surface temperature, and monkey brain recordings.

Thursday, October 22nd 2020

Invited Talk: Using Big Data to Learn from Small Data, Nathan Hodas, Pacific Northwest National Laboratory, Data Science and Analytics (Slides)

Abstract: Big science has often been accompanied by big data, but scientists have often been stymied by the best way to leverage their data-rich observations. By combining advanced scientific computing with cutting edge deep learning, we have been able to broadly apply deep learning through-out our scientific mission. From high energy physics to computational chemistry to cyber-security, we are enhancing the pace and impact of diverse scientific disciplines by bringing together domain scientists and deep learning researchers across our laboratory. We are seeing in field after field, deep learning is driving transformational innovation, opening the door to a future of data-driven scientific discovery. However, many labels are extremely expensive to obtain or even impossible to obtain more than one, such as a specific astronomical event or scientific experiment. Combining domain knowledge with data driven methods allows us to drive down required data substantially. In fact, by combining vast amounts of labeled surrogate data with advanced few-shot learning, we have demonstrated success in leveraging only one-to-five examples to produce effective deep learning models. In this talk, we will discuss these exciting results and explore the scientific innovations that made this possible.

Thursday, October 15th 2020

Invited Talk: Environmental Geonomics: using big datasets to answer questions about tiny ecosystems in the face of climate change, Professor Robin Kodner, Huxley College of the Environment Western Washington University

Abstract: Tiny microbes and their complex communities can have large impacts on their environment. Microbes are critical for global carbon and nitrogen cycling, can cause harmful blooms in the environment or can cause large-scale disease. Microbial communities can range from very diverse and dynamic, to relatively simple and stable. My lab at Western Washington has been studying two local ecosystems that fall on opposite ends of this complexity spectrum: Bellingham Bay, a dynamic estuary, and the North Cascades snowy alpine environments. I use a range of techniques from microscopy to data-intensive environmental genomics and bioinformatics to observe the diversity and dynamics of microbial eukaryote communities. This talk will discuss ways in which biologists use large environmental genomic datasets to study these ecosystems in ways that weren’t available a decade ago. I will also share examples for what we have learned from these kinds of large datasets in our local microbial ecosystems from Bellingham Bay and the North Cascade mountains.

Thursday, October 8th 2020

Keynote Talk: Machine Learning for Science: Data-Driven Discovery Methods for Governing equations, Coordinates and Sensors, Professor Nathan Kutz, Department of Applied Mathematics, UW (Slides | Website)

Abstract: Machine learning and artificial intelligence algorithms are now being used to automate the discovery of governing physical equations and coordinate systems from measurement data alone. However, positing a universal physical law from data is challenging: (i) An appropriate coordinate system must also be advocated and (ii) simultaneously proposing an accompanying discrepancy model to account for the inevitable mismatch between theory and measurements must be considered. Using a combination of deep learning and sparse regression, specifically the sparse identification of nonlinear dynamics (SINDy) algorithm, we show how a robust mathematical infrastructure can be formulated for simultaneously learning physics models and their coordinate systems. This can be done with limited data and sensors. We demonstrate the methods on a diverse number of examples, showing how data can maximally be exploited for scientific and engineering applications. The work also highlights the fact that the naive application of ML/AI will generally be insufficient to extract universal physical laws without further modification.

Page updated

Google Sites

Report abuse