Introduction and Overview Session Details

Tuesday, August 18, 9:30-12:30 CDT

YOUTUBE LINKS: Part 1 (unedited, start to break), Part 2 (unedited, break to session end)

Session Chairs: Vipin Kumar and Michael Steinbach

SPEAKERS:

10:55-11:10 BREAK

"Knowledge Guided Machine Learning: Challenges and Opportunities"

Abstract: This research intends to develop a framework that uses the unique capability of data science models to automatically learn patterns and models from data, without ignoring the treasure of accumulated scientific knowledge. The proposed effort builds the foundations of knowledge-guided machine learning by exploring several ways of bringing scientific knowledge and machine learning models together using pilot applications from four domains: aquatic ecodynamics, climate and weather, hydrology, and translational biology. These pilot applications were selected because they are at tipping points where knowledge-guided machine learning can have a transformative effect.

A major goal is to formally conceptualize the paradigm of “knowledge-guided machine learning (KGML)”, where scientific theories are systematically integrated with machine learning models in the process of knowledge discovery. This paradigm will be broadly applicable for improving the modeling of physical and biological systems where mechanistic (also known as process-based) models are used, and thus, KGML has the potential for accelerating discovery in a range of scientific and engineering disciplines.

Bio: Vipin Kumar is a Regents Professor at the University of Minnesota, where he holds the William Norris Endowed Chair in the Department of Computer Science and Engineering. He has authored over 300 research articles and has coedited or coauthored 10 books, including two textbooks, "Introduction to Parallel Computing" and "Introduction to Data Mining", which are used worldwide and have been translated into many languages. Kumar's current major research focus is on bringing the power of big data and machine learning to understand the impact of human-induced changes on the Earth and its environment.


"Science-guided Machine Learning: Advances in An Emerging Paradigm Combining Scientific Knowledge with Machine Learning"

Abstract: This talk will introduce science-guided machine learning, an emerging paradigm of research that aims to integrate the knowledge of scientific processes into machine learning frameworks in a principled way, producing generalizable and physically consistent solutions even with limited training data. The talk will describe several ways in which scientific knowledge can be combined with machine learning methods, using case studies of ongoing research in various disciplines including hydrology, fluid dynamics, quantum science, and biology. These case studies will illustrate multiple research themes in science-guided machine learning, ranging from physics-guided design and learning of neural networks to construction of hybrid-physics-data models. The talk will conclude with a discussion of future prospects in the emerging field of science-guided machine learning, which has the potential to impact several disciplines in science and engineering that have a rich wealth of scientific knowledge and some availability of data.

Bio: Anuj Karpatne is an Assistant Professor in the Department of Computer Science at Virginia Tech, where he develops data mining and machine learning methods to solve scientific and socially relevant problems. A key focus of Dr. Karpatne’s research is to advance the field of science-guided machine learning for applications in several domains ranging from climate science, hydrology, and ecology to cell cycle biology, mechano-biology, quantum science, and fluid dynamics. Dr. Karpatne co-organized the FEED 2018 workshop, served as the workshop co-chair for SIGKDD 2019, and has co-organized sessions at the AAAS Annual Meeting 2019 and the AGU Fall Meetings 2017 and 2018. He is currently serving as co-Editor-in-Chief of the SIGAI “AI Matters” and as a Review Editorial Board Member for the “Data-driven Climate Sciences” section of the journal Frontiers in Big Data. In recognition of his interdisciplinary research efforts in geosciences, Dr. Karpatne was named the Inaugural Research Fellow by the IS-GEO (Intelligent Systems for Geosciences) Research Coordination Network in 2018. Dr. Karpatne is also a co-author of the second edition of the textbook Introduction to Data Mining. He received his Ph.D. in Computer Science at the University of Minnesota in 2017 under the guidance of Prof. Vipin Kumar.

"Deep learning for a better understanding of the Earth System?"

Abstract: The Earth is a complex dynamic networked system. Machine learning, i.e., the derivation of computational models from data, has already made important contributions to predicting and understanding components of the Earth system, specifically in climate, remote sensing, and environmental sciences. For instance, classification of land cover types, prediction of land-atmosphere and ocean-atmosphere exchange, and detection of extreme events have greatly benefited from these approaches. Such data-driven information has already changed how Earth system models are evaluated and further developed. However, many studies have not yet sufficiently addressed and exploited dynamic aspects of systems, such as memory effects for prediction and effects of spatial context, e.g., for classification and change detection. In particular, new developments in deep learning offer great potential to overcome these limitations.

Yet a key challenge and opportunity is to integrate (physical-biological) system modeling approaches with machine learning into hybrid modeling approaches, which combine physical consistency and machine learning versatility. A couple of examples are given with a focus on the terrestrial biosphere, where the combination of system-based and machine-learning-based modeling helps our understanding of aspects of the Earth system.

Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., Prabhat, 2019. Deep learning and process understanding for data-driven Earth system science. Nature 566, 195-204.

Bio: Markus Reichstein is Director of the Biogeochemical Integration Department at the Max-Planck Institute for Biogeochemistry, Jena, Professor for Global Geoecology at the FSU Jena, and Director at the Michael-Stifel-Center Jena for Data-driven and Simulation Science. His main research interests include the effect of climate variability, extremes, and change on global ecosystems, in particular carbon and water cycles. He and his research group tackle these topics by combining experimental, ground- and satellite-based observations with machine-learning-based data-driven and process-oriented system models in a model-data integration approach. Prof. Reichstein has authored more than 185 publications, has been part of the iLEAPS/IGBP Scientific Steering Committee, and was a lead author of the IPCC special report on Climate Extremes (SREX). He is currently serving as a member of the Thuringian panel on climate change and chairing the Future Earth/IRDR/WCRP Knowledge Action Network “Extreme events and emergent Risks”. He was awarded the Max-Planck Research Award by the Alexander von Humboldt Foundation (2013), the Piers J Sellers mid-career award on Global Environmental Change by the American Geophysical Union (2018), and recently the Gottfried Wilhelm Leibniz Prize by the German Research Foundation (DFG).


"Physics-guided Machine Learning for Sub-seasonal Climate Forecasting"

Abstract: Sub-seasonal climate forecasting (SSF) focuses on predicting key climate variables such as temperature and precipitation on the 2-week to 2-month time scale. Skillful SSF would have immense societal value in areas such as agricultural productivity, water resource management, transportation and aviation systems, and emergency planning for extreme weather events. SSF is considered more challenging than weather prediction, and limited progress has been made on it. In fact, SSF is challenging for both purely physics-based and machine learning (ML) approaches.

In this talk, we will discuss how we are approaching SSF from the ML perspective and from the physics/dynamical systems perspective, and how we plan to combine the two to develop a physics-guided ML approach for SSF. We will discuss specific advances we have made with ML approaches for SSF, and how the results highlight the importance of identifying and using variables and processes with memory. We will also discuss how such advances are driving promising new directions for physics/dynamical-system-based approaches for SSF, which in turn will help improve the ML approaches.

Bio: I am a professor in the Department of Computer Science and Engineering at the University of Minnesota. My research interests are in Machine Learning, Data Mining, Information Theory, Convex Analysis and Optimization, and their applications to complex real-world learning problems, including problems in Text and Web Mining, Climate Sciences, Ecology, Finance, Social Networks, and Bioinformatics.

"Blending machine learning and physics for climate modeling"

Abstract: Numerical simulations used for weather and climate predictions solve approximations of the governing laws of fluid motion. The computational cost of these simulations limits the accuracy of the predictions. Uncertainties in the simulations and predictions ultimately originate from the poor or lacking representation of processes, such as turbulence, that are not resolved on the numerical grid of global climate models. I will show that machine learning (ML) algorithms with imposed physical constraints are good candidates to improve the representation of processes that occur below the scales resolved by global models. In this talk, I will propose new representations of ocean turbulence based on two different ML approaches using data from high-resolution simulations. Specifically, I will discuss how to use relevance vector machines to discover equations for the subgrid forcing, and convolutional neural networks to derive a stochastic representation of subgrid forcing. The new models of turbulent processes are interpretable and/or encapsulate physics, and lead to improved simulations of the ocean. Our results simultaneously open the door to the discovery of new physics from data and the improvement of numerical simulations of oceanic and atmospheric flows.

Bio: I am a Professor in Mathematics & Atmosphere/Ocean Science at the Courant Institute at New York University. My research focuses on the dynamics of the climate system. The main emphasis of my work is to study the influence of the ocean on local and global scales, through the analysis of observations and a hierarchy of numerical simulations. Recently, I have worked on a wide range of topics including ocean redistribution of heat and carbon under climate change, regional sea level rise, air-sea coupling and predictability in mid-latitudes, ocean turbulence in climate models, and uncertainty quantification.

"Deep Learning and Gaussian Processes: Some connections"

Abstract: Deep artificial neural networks have been successfully used in a number of applications, and over time, several deep learning algorithms have been developed. However, it remains a challenge to understand why and where deep artificial neural networks perform well, and to quantify the uncertainty in the inferences from such networks. An understanding of the properties of deep artificial neural networks is critical for their successful use in the process of scientific discovery and hypothesis generation. Gaussian processes have a long history, and their properties relating to inference and uncertainty quantification are well understood. We will present some connections between deep artificial neural networks and Gaussian processes, which may be useful in understanding the foundations of deep learning and in developing science-aware learning algorithms.

Bio: I am currently a Professor in the Department of Statistics at the University of Minnesota, and the Director of the Institute for Research in Statistics and its Applications (IRSA). The mission of IRSA is to foster all aspects of collaborative and inter-disciplinary research involving Statistics and Data Sciences. We promote research on Data Science theory, methods, algorithms, and software development, and their use in inter-disciplinary research in all fields of study. I am a fellow of the Institute on the Environment and a member of the Minnesota Population Center here at the University of Minnesota. My research interests include statistical foundations of data science and machine learning, high-dimensional data geometry, Bayesian statistics, resampling methods, and applications of statistics, artificial intelligence, and machine learning in multiple domains.