The Agenda

The Enigma Machine. Photo credit Clare Kendall.

The recordings of all the presentations are linked in the agenda below. You can also find them on the CSE DSI YouTube channel in the KGML2024 playlist.

The Schedule

7.30 am to 8.15 am

Light Breakfast

8.15 am to 8.25 am

Opening Remarks by Vipin Kumar
Video Recording

8.25 am to 9.10 am

Introductory Tutorial on KGML by Anuj Karpatne
Video Recording
Slides PDF

Bio: Anuj Karpatne is an Associate Professor of Computer Science at Virginia Tech. He develops machine learning (ML) methods to solve scientific and societally relevant problems by advancing the emerging field of scientific knowledge-guided machine learning (KGML).

9.10 am to 9.20 am

Knowledge-guided Foundation Models by Xiaowei Jia
Video Recording
Slides PDF

Bio: Xiaowei Jia is an Assistant Professor of Computer Science at the University of Pittsburgh. Dr. Jia's primary research interest is to advance machine learning and data science to solve real-world problems of great societal and scientific impacts. The bulk of his research has been focused on developing data mining and machine learning models that extract complex spatio-temporal data patterns while also leveraging accumulated scientific knowledge. A major highlight of his research is the physics-guided machine learning paradigm that is beginning to get attention in many scientific domains including hydrology, climate science, mechanical engineering, and agriculture. He is the recipient of the NSF CAREER Award and the NASA Early Career Investigator Award.

9.20 am to 9.40 am

Invited Talk by Jacob Zwart
Video Presentation
Slides PDF

Title: How much knowledge-guidance is needed? Insights from deep learning for water resources

Abstract: Scientific knowledge can be integrated into machine learning models at various points, such as through the utilization of process-relevant architectures or custom loss functions designed to uphold specific physical or biological principles. However, how do we know when additional scientific guidance is beneficial versus when the model should discover patterns independently? For the past seven years, this question has been at the core of my research on knowledge-guided machine learning (KGML) as applied to water resources and remains a frontier in the transdisciplinary KGML research community. In this presentation, I'll showcase several applications of KGML models in addressing water resource challenges, each with different levels of scientific input incorporated during model development. I'll discuss the modeling decisions made at the U.S. Geological Survey and share techniques for model interrogation aimed at informing our choices regarding the appropriate balance between providing more guidance and allowing the models to discover patterns on their own.

Bio: Jacob Zwart works within the Data Science Branch of the Water Resources Mission Area to develop aquatic ecosystem modeling techniques that provide timely information to stakeholders about important water resources across the nation. He uses his expertise in computational modeling, data assimilation, and limnology to help produce short-term forecasts of water quality at regional scales to aid in water resources decision making. Jacob’s research themes are: 1) improving understanding of aquatic biogeochemical processes and predicting how these processes may respond to future global change, 2) developing techniques to inject scientific knowledge into machine learning models to make accurate predictions of environmental variables (also known as “knowledge-guided machine learning”), and 3) advancing methods for assimilating real-time observations into knowledge-guided machine learning models to improve near-term forecasts of water quality. Jacob also serves as a Peer Support Worker at USGS, promoting awareness and education on topics and USGS policies related to anti-harassment, discrimination, bias, and scientific integrity, and providing peer-to-peer support for USGS employees.

9.40 am to 10.00 am

Invited Talk by Zac McEachran and Rahul Ghosh
Video Recording
Slides PDF

Title: Knowledge-guided Machine Learning for Modeling Multi-scale Processes and Data Assimilation: Streamflow Forecasting in Hydrology

Abstract: We present a knowledge-guided machine learning (KGML) framework for modeling modes in multi-scale processes for streamflow forecasting in hydrology. Specifically, we propose a novel hierarchical recurrent neural architecture that factorizes the system dynamics at multiple levels of temporal granularity. Based on inverse modeling, this framework can empirically resolve the system's temporal modes from data (physical model simulations, observed data, or a combination of them from the past) and use them to improve the accuracy of the forecast. By incorporating multiple levels of temporal granularity and physical interpretation of the modes, the hierarchical model gains a comprehensive understanding through a nuanced representation of the system's dynamics. In a hydrological system, these modes represent different processes, evolving at different temporal scales (e.g., slow: groundwater recharge and baseflow vs. fast: surface runoff due to extreme rainfall). This approach enables the model to account for both rapid fluctuations and longer-term trends and respond effectively to changing conditions while providing explainability and interpretability in how they affect the model forecasts. Once trained, this framework makes it possible to assimilate observations without requiring loss function optimization (e.g., Kalman Filtering) to improve forecast accuracy. Experiments with several river catchments from the National Weather Service (NWS) North Central River Forecast Center region show the efficacy of this framework compared to standard baselines and archived forecaster-issued NWS forecasts. Specifically, when combined with simulation data from NWS, even a small number of historical observations lead to significantly better quality forecasts than existing methods. Our experiments show that the ability to model interacting temporal modes improves forecasting over using uncoupled temporal modes. 
Associating the empirical modes with physical processes is an important area of future research. Although we show our proposed framework's effectiveness in streamflow modeling in hydrology, multi-scale processes are essential in many engineering and scientific applications, making our framework generalizable to such use cases.

Bios: Dr. Zac McEachran is a hydrologist and catchment scientist at the NOAA National Weather Service. His research focuses on using advanced physics-based and machine learning modeling tools to help understand the fundamental physical processes of how streamflow is generated at the catchment scale. He is particularly interested in creating feedbacks between developing better operational environmental forecasts and better understanding of catchment processes.

Rahul Ghosh is an Applied Scientist at Amazon AWS GenAI Innovation Center, where he focuses on advancing foundational machine learning methodologies and their applications. He earned his PhD in Computer Science from the University of Minnesota, where his research encompassed sequence modeling, computer vision, and multi-modal modeling. His professional experience includes developing foundational models and innovative solutions in both academic and industrial settings, with a particular emphasis on remote sensing and dynamical systems. Rahul's work has been instrumental in various projects, including high-resolution mapping using Remote Sensing imagery, knowledge-guided machine learning for environmental systems, and causal modeling and personalization for Amazon Music.

10.00 am to 10.20 am

Invited Talk by Charu Varadharajan
Video Recording
Slides PDF

Title: Improving predictability and reducing model complexity with knowledge-guided machine learning

Abstract: Predicting river flows and other important variables in unmonitored regions has been a longstanding area of research in hydrological modeling. Large-scale deep learning models that incorporate data and physical characteristics from thousands of monitored locations have been demonstrated to outperform traditional numerical models for out-of-sample spatial predictions. However, the optimal selection of inputs for the deep learning models remains a challenge, particularly when there are hundreds of site characteristics that are potentially relevant. In this talk, I will describe our approaches to reduce the complexity of deep learning models and improve computational efficiency by reducing redundancies in model inputs for predictions of stream temperatures in unmonitored regions. I will also present our analysis of model errors to identify regions where the deep learning models will perform best. Our overall goal is to use scientific knowledge to optimize the trade-off between model complexity and accuracy, and to develop models that can perform well with the least amount of data.

Bio: Charuleka Varadharajan is a computational Earth scientist at Berkeley Lab, and has dedicated her career to water sustainability and resilience. Her research focuses on developing innovative data-centric solutions for water-energy applications. She has experience working on a broad range of topics including surface and groundwater quality, water resource resilience to human and natural disturbances, methane cycling, environmental impacts of carbon sequestration and fossil fuel production, and bioremediation. She also leads the Earth AI and data program at LBNL, and her research group develops data science capabilities for environmental applications, from machine learning and statistical algorithms to data management and integration services. She was awarded a DOE Early Career Research Award in 2019 to build a data-driven framework to predict the impacts of floods and droughts on water quality in the United States. She is also a PI at the International Computer Science Institute and a Research Affiliate with the Berkeley Institute of Data Sciences.

10.20 am to 10.40 am

Invited Talk by Samantha Oliver
Video Recording
Slides PDF

Title: The role of knowledge-guided machine learning in the federal government: use cases from the U.S. Geological Survey

Abstract: The U.S. Geological Survey’s Water Mission Area provides water information that supports the protection of life and property, economic well-being, and the effective management of the Nation’s water resources. To achieve this mission, the USGS operates thousands of stream gages across the country, serving out real-time data on the quantity and quality of our Nation’s waterways. Additionally, the USGS has decades of experience using that data to build collective understanding of hydrologic processes that is often represented in mechanistic and statistical models. Recently, scientists from the Survey have collaborated with computer scientists from universities to build knowledge-guided machine learning (KGML) models to solve water prediction problems. These approaches allow the USGS to leverage both our fundamental understanding of our water systems and the power and flexibility of machine learning approaches. Here, we’ll discuss use cases from this collaboration and how the USGS can inform, benefit from, and amplify the impact of KGML collaborations.

Bio: Samantha Oliver is a hydrologist at the U.S. Geological Survey and contributes to a variety of data-intensive projects at the Upper Midwest Water Science Center, including analyzing long-term trends in stream water quality, assessing the biological relevance of contaminants in Great Lakes tributaries, producing national extent stream temperature predictions, and forecasting water quality for various end users. Samantha uses data science approaches in her work and supports R programming training across the USGS. She also serves as the center’s Science Coordinator, where she supports internal and external science communication and connects with USGS partners. Samantha received her PhD in Limnology and Freshwater Sciences from the University of Wisconsin-Madison and her Master’s degree in Integrated Biosciences from the University of Minnesota Duluth.

10.40 am to 10.55 am

Break

10.55 am to 11.05 am

Invited Talk by Runlong Yu
Video Recording
Slides PDF

Title: Adaptive Process-Guided Learning: An Application in Predicting Lake DO Concentrations

Abstract: This talk presents an innovative integration of physical and machine learning (ML) models to improve predictions of dissolved oxygen (DO) concentrations in lakes, focusing on our novel frameworks: Process-Guided Learning (Pril) and Adaptive Process-Guided Learning (April). Pril integrates DO mass conservation into its training objectives using a forward Euler scheme with daily timesteps. However, this method encounters numerical instabilities, particularly during stratified conditions where exogenous fluxes cause significant within-day changes in DO concentrations. To address these challenges, April enhances the model by dynamically adjusting timesteps from daily to sub-daily intervals, effectively addressing numerical instabilities and ensuring compliance with the law of mass conservation. We have tested our methods on a wide range of lakes in the Midwestern USA and demonstrated robust capability in predicting DO concentrations even with limited training data. This approach holds broad applicability across various scientific and engineering disciplines that utilize process-based models, including power engineering, climate science, and biomedicine.

Bio: Runlong Yu is a Postdoctoral Researcher in the Department of Computer Science at the University of Pittsburgh. He earned his Ph.D. in Computer Science and Technology from the University of Science and Technology of China (USTC) in Hefei, China, in 2023, after receiving his BEng from the same institution in 2017. His research interests span data mining, machine learning, evolutionary computation, and AI for science. Runlong is committed to devising efficient, simple, and explainable solutions for important scientific problems. His work notably includes the development of advanced data mining and machine learning models that adeptly handle complex spatio-temporal data patterns while integrating established scientific knowledge.

11.05 am to 11.15 am

Invited Talk by Kshitij Tayal
Video Recording
Slides PDF

Title: Large Language Models for Time Series Forecasting

Abstract: In this talk, we will explore the application of large language models (LLMs) to time series forecasting. We discuss how LLMs, traditionally used for natural language tasks, can be adapted to predict future values in sequential numerical data, including recent innovations like PatchTST and inverted transformers. The talk will introduce ExoTST, a novel approach that effectively incorporates exogenous variables for improved forecasting in scientific applications. By examining various strategies for developing time series foundation models, the talk will highlight the potential of LLM-inspired approaches to advance the time series analysis and forecasting field.

Bio: Kshitij Tayal's research interests lie in advancing machine learning and data science to tackle real-world problems with significant societal and scientific impacts. He has published his research at major computer science conferences, including COLING, KDD, SDM, ICDM, IUI, and was awarded the best paper award for his work at the 2022 SIAM International Conference on Data Mining. He is working to build a novel reprogramming framework that can repurpose LLMs for scientific use cases by aligning natural language modalities to scientific data.

11.15 am to 12.00 pm

Panel 1: Navigating the KGML Landscape: Challenges and Opportunities
Video Recording

Moderator: Paul Hanson

Panelists: Yan Liu, Jennifer Dy, Chris Duffy, Wei Wang

Abstract: KGML is a rapidly growing field with a wide range of concepts and methodologies for integrating knowledge in ML that are being explored by a number of research communities in a variety of applications. Given the breadth and diversity of research topics in KGML, it is important to provide a structure to past and on-going research efforts in the field and expose commonalities in research being conducted in different disciplines to cross-pollinate new ideas. The goal of the panel is to guide the research audience on how to navigate the KGML landscape and hear from experts on how to identify the right framing of a scientific problem to apply KGML methodologies. We also intend to discuss open challenges and opportunities in the field to take the community forward.

12.00 pm to 1.00 pm

Lunch

1.00 pm to 1.20 pm

Invited Talk by Zhenong Jin
Video Recording
Slides PDF

Title: Knowledge-guided machine learning for the next generation of agroecosystem prediction

Abstract: Accurate and rapid quantification of carbon, nutrient and water cycles throughout the agroecosystem is critical to ensure the co-sustainability of food production and environmental protection. Cropping system models are widely used to simulate these processes, although with well-known limitations such as insufficient representations of the physical and biogeochemical processes, and uncertainties in many model parameters. These limitations can be serious when applying such models across heterogeneous landscapes with very limited observations. Knowledge-guided machine learning (KGML) is a novel hybrid modeling framework that involves deeply embedding process-guided models inside machine learning models to significantly lower the data demand for learning and improve out-of-sample prediction accuracy, and it has achieved great success in modeling thermal and hydrological processes that have relatively well-known physics. However, developing an effective KGML model for many biogeochemical processes (e.g., GHG emissions and nutrient cycling) in the agroecosystem is extremely challenging, mainly because they are highly variable over space and time and often characterized by hot-moment and hot-spot patterns that are inherently hard to model using neural networks. In this talk, we present our latest progress in KGML modeling for a range of processes that are critical to agroecosystem prediction, such as crop photosynthesis, carbon allocation, crop phenology, soil decomposition, and carbon dioxide (CO2), nitrous oxide (N2O) and methane (CH4) emissions, as well as novel approaches to assimilating remote and in-situ sensing data and rapid model calibration. Overall, our findings demonstrate the high potential of KGML application in complex agroecosystem modeling and provide food for thought in developing the next generation of AI-empowered agroecosystem prediction frameworks.

Bio: Zhenong Jin is an Associate Professor of digital agriculture at the University of Minnesota, with a broad focus on agricultural remote sensing, computational modeling and machine learning. His research is well funded by the NSF CAREER award, along with many other NSF, NASA, DOE and USDA grants. He has published extensively, including in top journals such as Science, Nature Climate Change, Nature Reviews Earth & Environment, Nature Food, and Nature Communications. Since 2023, Jin has been co-leading the development of solutions for the measurement, monitoring, reporting, and verification (MMRV) of soil carbon change and greenhouse gas emissions for the National AI Institute for agriculture and forestry (AI-CLIMATE).

1.20 pm to 1.40 pm

Invited Talk by Yiqun Xie
Video Recording
Slides PDF

Title: Improving Theory-based Simulation Models with Knowledge-Guided Learning

Abstract: Theory-based simulation models have been continuously developed for Earth Science problems, serving as the building blocks for climate projection, carbon monitoring, and many more. These mechanistic models possess various favorable properties such as interpretability and stability. However, they are also largely constrained by the high computational cost, environment-dependent assumptions, etc. This talk presents new results on knowledge-guided machine learning (KGML) to bridge the gaps. First, we demonstrate the significant computational advantage offered by KGML for simulation models. In particular, we present the first ML emulator for high-resolution carbon forecasting in forest ecosystems. The KGML model, Deep-ED, provides high-fidelity approximation of the Ecosystem Demography (ED) model, which is a key component for land carbon in the NASA Carbon Monitoring System and Global Carbon Budget. By addressing the long-term error accumulation (e.g., over 40 years) and heterogeneous behaviors among carbon variables, Deep-ED significantly outperformed baseline deep learning models in the Northeastern US and reduced the computational time by orders of magnitude compared to ED. Second, for problems with multiple simulation models developed under distinct assumptions, we present a meta-learning framework to optimize the selection of simulation models in the KGML context. Using the baseflow prediction problem as an example, the proposed meta-learner demonstrated significant improvements in stability, especially in scenarios with anomalous patterns and sparse observations.

Bio: Yiqun Xie is an Assistant Professor in Geospatial Information Science at the University of Maryland. He received his PhD in Computer Science at the University of Minnesota, and his research addresses challenges facing machine learning for spatio-temporal data and related scientific problems. His current work focuses on: (1) variability-aware learning in space and time, (2) knowledge-guided learning for data-sparse applications, and (3) fairness-aware learning to reduce mapping bias. His research is supported by NSF, NASA, and Google, and has received recognitions including the Best Paper Award from IEEE ICDM 2021, the Best Application Paper Award from SIAM Data Mining 2023, the Best Vision Paper Award from ACM SIGSPATIAL 2019, and highlights from the Great Innovative Ideas by CCC at CRA.

1.40 pm to 2.00 pm

Invited Talk by Gengchen Mai
Video Recording
Slides PDF

Title: Towards a Foundation Model for GeoAI

Abstract: Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have yet to see an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We investigate the potential of many existing FMs by testing their performance on seven tasks across multiple geospatial subdomains including Geospatial Semantics, Health Geography, Urban Geography, and Remote Sensing. Our results indicate that on several geospatial tasks that only involve text modality, these task-agnostic LLMs can outperform task-specific fully-supervised models in a zero-shot or few-shot learning setting. However, on other geospatial tasks, especially tasks that involve multiple data modalities, existing foundation models still underperform task-specific models. Based on these observations, we proposed various frameworks that leverage geographic knowledge to improve the performance of foundation models on geospatial tasks including location description recognition, sustainability index prediction, image geolocalization, etc.

Bio: Dr. Gengchen Mai is currently a Tenure-Track Assistant Professor at the Department of Geography and the Environment, University of Texas at Austin. He received his Ph.D. in Geographic Information Science from the Department of Geography, University of California, Santa Barbara. Before becoming a faculty member, he was a Postdoctoral Scholar at the Stanford Artificial Intelligence Laboratory, Department of Computer Science, Stanford University. Before joining UT, he was an Assistant Professor at the University of Georgia. Dr. Mai's research focuses on Spatially Explicit Artificial Intelligence, Geo-Foundation Models, Geographic Knowledge Graphs, etc. Dr. Mai is the recipient of many prestigious awards including AAG 2021 Dissertation Research Grants, the AAG 2022 William L. Garrison Award for Best Dissertation in Computational Geography, the AAG 2023 J. Warren Nystrom Dissertation Award, the Top 10 WGDC 2022 Global Young Scientist Award, and the Jack and Laura Dangermond Graduate Fellowship. According to the historical records of AAG award recipients, he is now the sole recipient in history to have received three AAG doctoral dissertation awards since 2000.

2.00 pm to 2.10 pm

Invited Talk by Praveen Ravirathinam
Video Recording
Slides PDF

Title: Towards a Knowledge guided Multimodal Foundation Model for Spatio-Temporal Remote Sensing Applications

Abstract: In recent years, there has been increased interest in foundation models for geoscience due to the vast amount of earth-observing satellite imagery. Existing remote sensing foundation models make use of the various sources of spectral imagery to create large models pretrained on a masked reconstruction task. The embeddings from these foundation models are then used for various downstream remote sensing applications. In this paper we propose a foundational modeling framework for remote sensing geoscience applications that goes beyond this traditional single-modality masked autoencoder family of foundation models. This framework leverages the knowledge-guided principles that the spectral imagery captures the impact of the physical drivers on the environmental system, and that the relationship between them is governed by the characteristics of the system. Specifically, our method, called MultiModal Variable Step Forecasting (MM-VSF), uses multimodal data (spectral imagery and weather) as its input and a variable step forecasting task as its pretraining objective. In our evaluation we show that forecasting of satellite imagery using weather can be used as an effective pretraining task for foundation models. We further show the effectiveness of the embeddings from MM-VSF on the downstream task of pixel-wise crop mapping, when compared with a model trained in the traditional setting of single-modality input and masked-reconstruction-based pretraining.

Bio: Praveen Ravirathinam is a PhD candidate in Computer Science at the University of Minnesota, Twin Cities, advised by Prof. Vipin Kumar. Praveen's research focuses on applications of deep learning in remote sensing, mainly land cover and land use mapping, crop type mapping and lake classification. More recently, his focus has been on incorporating knowledge-guided principles and parameters into land cover mapping related tasks, and he has been using these ideas to create a foundation model for geoscience.

2.10 pm to 2.20 pm

Invited Talk by Arvind Renganathan
Video Recording
Slides PDF

Title: Knowledge Guided Machine Learning for Task Aware Modeling: An Approach for Few Shot Learning in Heterogeneous Systems (KGML-TAM)

Abstract: In this talk, we present KGML-TAM, a novel approach for few-shot learning in heterogeneous environmental systems that enables effective extrapolation in both space and time. Our method combines inverse and forward modeling techniques with task-aware modulation and representation learning to efficiently transfer knowledge from well-observed entities to those with sparse observations. KGML-TAM utilizes a modulation network as an inverse model to generate task-specific parameters, which are then used to adapt a base network serving as a forward model for personalized predictions. This approach allows for more efficient leveraging of information across multiple tasks and transfer of knowledge to new, unseen locations and/or time periods. We demonstrate KGML-TAM's effectiveness on two key environmental applications: predicting Gross Primary Product (GPP) for flux towers and streamflow for river basins, showcasing its ability to extrapolate in both space (across different locations) and time (to future periods). Our results show significant improvements over existing meta-learning methods like MAML and MMAML, with KGML-TAM outperforming baselines in both accuracy and computational efficiency. This research contributes to bridging data gaps in critical environmental monitoring and prediction tasks, particularly in regions with limited observations, and opens up possibilities for efficient learning and extrapolation in data-limited scenarios across various domains.

Bio: Arvind Renganathan's research lies at the intersection of machine learning and complex environmental modeling, focusing on developing knowledge-guided machine learning frameworks. His work focuses on addressing the challenges of heterogeneous systems and sparse data in environmental science, advancing techniques such as meta-learning and foundation models. These approaches improve predictions in data-limited scenarios, enabling better extrapolation in both space and time for critical environmental monitoring tasks. Arvind's research tackles real-world problems with significant societal and scientific impacts. His work has been published in major computer science conferences, including ICDM, SDM, and KDD.

2.20 pm to 2.40 pm

Break

2.40 pm to 3.00 pm

Invited Talk by Josef Uyeda
Video Recording
Slides PDF

Title: Accelerating discovery in biodiversity science and systematics with knowledge-guided machine learning

Abstract: In biodiversity science, a major bottleneck to discovery is that the basic units of data used, phenotypic traits, remain largely defined, measured, and categorized by experts. This process of "character construction" is a vitally important part of the process but has remained highly resistant to automation. Nevertheless, automated trait construction has the potential to greatly accelerate biodiversity science. Recent advances in artificial intelligence provide potential paths forward for automating or accelerating this process by incorporating expert knowledge directly into the models. However, using these approaches requires careful assessment and validation, which is difficult to do given the "black-box" nature of traditional neural networks. If we do not know how neural networks are making decisions, how can we establish meaning for any characters they construct, or even if this goal is possible at all? This talk will explore methods for integrating structured expert knowledge, particularly phylogenies, into models for character construction and trait discovery from images, and discuss how approaches to validation can be used to assess when these models do or do not construct characters in a biologically meaningful way.

Bio: Josef Uyeda is an evolutionary biologist and Associate Professor of Biological Sciences at Virginia Tech. His work focuses on the study of trait evolution across the tree of life, seeking ways of understanding the causes and consequences of trait change using the sparse comparative data available from species traits and their evolutionary relationships, and integrating computable knowledge of biological processes, definitions, and data into our models.

3.00 pm to 3.20 pm

Invited Talk by Krishna Garikipati
Video Recording
Slides PDF

Title: Fokker-Planck-Inverse Reinforcement Learning: A physics-constrained approach to Markov Decision Process models of cell dynamics

Abstract: Inverse Reinforcement Learning (IRL) is a compelling technique for revealing the rationale underlying the behavior of autonomous agents. IRL seeks to estimate the unknown reward function of a Markov decision process (MDP) from observed agent trajectories. However, IRL needs a transition function, and most algorithms assume it is known or can be estimated in advance from data. IRL therefore becomes even more challenging when the transition dynamics are not known a priori, since they enter the estimation of the policy in addition to determining the system's evolution. When the dynamics of these agents in the state-action space are described by stochastic differential equations (SDEs) in Itô calculus, these transitions can be inferred from the mean-field theory described by the Fokker-Planck (FP) equation. We conjecture there exists an isomorphism between the time-discrete FP and MDP that extends beyond the minimization of free energy (in FP) and maximization of the reward (in MDP). We identify specific manifestations of this isomorphism and use them to create a novel physics-aware IRL algorithm, FP-IRL, which can simultaneously infer the transition and reward functions using only observed trajectories. We employ variational system identification to infer the potential function in FP, which consequently allows the evaluation of reward, transition, and policy by leveraging the conjecture. We demonstrate the effectiveness of FP-IRL by applying it to a synthetic benchmark and a biological problem of cancer cell dynamics, where the transition function is inaccessible.

This is joint work with Changyang Huang, Siddhartha Srivastava, Kenneth Ho, Kathryn Luker, Gary Luker and Xun Huan.

Bio: Krishna Garikipati obtained his PhD at Stanford University in 1996, and after a few years of post-doctoral work, he joined the University of Michigan in 2000, rising to Professor in the Departments of Mechanical Engineering and Mathematics. Between 2016 and 2022, he served as the Director of the Michigan Institute for Computational Discovery & Engineering (MICDE). In January 2024 he moved to the Department of Aerospace and Mechanical Engineering at the University of Southern California. His research is in computational science, with applications drawn from biophysics, materials physics, mechanics and mathematical biology. Of recent interest are data-driven approaches to computational science. He has been awarded the DOE Early Career Award for Scientists and Engineers, the Presidential Early Career Award for Scientists and Engineers (PECASE), and a Humboldt Research Fellowship. He is a fellow of the US Association for Computational Mechanics and the International Association for Computational Mechanics, a Life Member of Clare Hall at the University of Cambridge, and a visiting scholar in Computational Biology at the Flatiron Institute of the Simons Foundation.

3.20 pm to 4.00 pm

Panel 2: KGML in the Age of Generative AI
Video Recording

Moderator: Ananth Grama

Panelists: Tanya Berger-Wolf, Wei Ding, Shuiwang Ji, Gengchen Mai

Abstract: Generative AI models, including large language models (LLMs) and vision-language models, have emerged as indispensable tools capable of effectively integrating and addressing a wide range of tasks within a single framework. The prevalence of these large-scale models has offered tremendous opportunities for the development of new KGML models for addressing diverse and complex scientific tasks. Although initial works in this direction have begun to demonstrate promise in scientific problems, it remains challenging to synergistically combine generative AI techniques with scientific knowledge to construct reliable generative AI models for scientific applications. This panel will focus on exploring new opportunities and challenges of leveraging generative AI techniques to advance KGML.

4.00 pm

Concluding Remarks

4.00 pm to 6.00 pm

Reception
