AAAI 2024 Bridge on Knowledge-guided ML

Bridging Scientific Knowledge and AI

(KGML-Bridge-AAAI-24)


Held as part of the Bridge Program at AAAI 2024

February 20 to 21, 2024

Room 205, Vancouver Convention Centre – West Building | Vancouver, BC, Canada

Overview

Scientific knowledge-guided machine learning (KGML) is an emerging field of research in which scientific knowledge is deeply integrated into ML frameworks to produce solutions that are scientifically grounded, explainable, and likely to generalize to out-of-distribution samples even with limited training data. By using scientific knowledge and data as complementary sources of information in the design, training, and evaluation of ML models, KGML marks a distinct departure from black-box, data-only methods and holds great potential for accelerating scientific discovery in a number of disciplines.
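
As a rough illustration of this idea (a minimal sketch, not a method presented at the bridge), one common KGML pattern is to add a knowledge-based consistency term to the usual data-fit loss, so that the model is penalized for violating a known scientific relationship. In the PyTorch sketch below, the names model, physics_residual, and lambda_phys are hypothetical placeholders.

import torch

def physics_residual(x, y_pred):
    # Hypothetical placeholder: measures how strongly a prediction violates a
    # known scientific relationship (e.g., a conservation law); zero means the
    # prediction is fully consistent with the domain knowledge.
    return y_pred.sum(dim=-1) - 1.0  # e.g., components that must sum to one

def kgml_loss(model, x, y_true, x_unlabeled, lambda_phys=0.1):
    # Standard supervised data-fit term on labeled samples.
    data_loss = torch.nn.functional.mse_loss(model(x), y_true)
    # Knowledge-guided term: penalize constraint violations, which can also be
    # evaluated on unlabeled inputs where no ground-truth labels exist.
    phys_loss = physics_residual(x_unlabeled, model(x_unlabeled)).pow(2).mean()
    return data_loss + lambda_phys * phys_loss

Here lambda_phys trades off fidelity to the training data against consistency with the encoded knowledge; the knowledge term acts as a regularizer that can improve generalization when labeled data are scarce.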

The goal of this bridge is to nurture the community of researchers working at the intersection of ML and the sciences and to shape the vision of the rapidly growing field of KGML. This bridge builds upon the success of three previous symposia organized on this topic at the AAAI Fall Symposium Series in 2020, 2021, and 2022. See our book on KGML for coverage of research topics in this field.


New: The slides from our Introductory Tutorial on KGML are now available. Here is the link to download the slides.


Schedule

Day 1: Feb 20

9:00 am to 9:15 am

Opening Remarks

9:15 am to 10:00 am

Introductory Tutorial on KGML: Part 1 (Overview)
Speaker: Anuj Karpatne

10:00 am to 10:30 am

Coffee Break

10:30 am to 12:30 pm

Introductory Tutorial on KGML: Part 2 (Case Studies)
Speakers: Nikhil Muralidhar, Ramakrishnan Kannan, Anuj Karpatne

12:30 pm to 2:00 pm

Lunch Break

2:00 pm to 2:30 pm

Invited Talk by Noah Benson
Title: Automated segmentation of the human visual cortex by convolutional neural networks

Abstract: Segmenting brain areas from functional MRI data is an important but difficult task for many neuroscience studies of human vision because the traditional methods of segmentation require extensive time in the scanner conducting experiments, extensive computation time processing the experimental results, and extensive human time interpreting these results and delineating the regions of interest. Automated methods based on the brain's gray-matter anatomy or a combination of anatomy and data from functional MRI experiments can reduce these requirements but are less accurate than experts. Convolutional Neural Networks (CNNs) are powerful tools for automated medical image segmentation. We hypothesize that CNNs can delineate visual area boundaries with high accuracy. We trained U-Net CNNs with ResNet backbones to segment the first three cortical visual areas (V1, V2, and V3) using a dataset of human-labeled maps. Separate CNNs were trained to predict these regions using different combinations of the following input data: (1) anatomical data regarding the brain's gray-matter only, (2) anatomical data about the brain's gray-matter combined with anatomical data about the brain's white-matter from diffusion-weighted imaging, and (3) anatomical data combined with functional data from visual experiments. All CNNs using functional data had cross-validated accuracies that were statistically indistinguishable from the inter-rater reliability of the training dataset (dice coefficient of 92%) while the CNNs lacking functional data had lower but similar accuracies (~75%). Existing methods of segmenting the visual cortex that do not use CNNs had accuracies substantially lower than those of any of the CNNs. These results demonstrate that with current methods and data quality, CNNs can segment brain areas approximately as well as humans. However, segmentations made using the brain's anatomical structure alone are substantially worse than those informed by functional measurements, suggesting that brain structure and brain function are partially independent.

Bio: Dr. Noah C. Benson is a senior data scientist at the University of Washington's eScience Institute, where he performs research on the relationship between brain structure and brain function in the human visual cortex. Dr. Benson obtained his Ph.D. from the University of Washington in 2010 before completing a post-doctoral position with Profs. David Brainard and Geoff Aguirre at the University of Pennsylvania (2010–14) and then working as a research scientist with Prof. Jonathan Winawer at New York University (2014–20). During this time, Dr. Benson published numerous papers on methods for predicting brain function from brain anatomy and on the relationship between brain anatomy and human vision. His current research agenda focuses on the application of contemporary artificial intelligence tools to these topics. Dr. Benson is also the author of the software library neuropythy, a neuroscience MRI toolkit, and is a co-organizer of the NeuroHackademy, an annual two-week workshop at the University of Washington designed to bridge data science and neuroscience education.

2:30 pm to 3:00 pm

Invited Talk by Yexiang Xue
Title: Vertical Reasoning Enhanced Learning, Generation and Scientific Discovery

Abstract: Automated reasoning and machine learning are two fundamental pillars of artificial intelligence. Despite much recent progress, building autonomous agents that fully integrate reasoning and learning is still beyond reach. This talk presents two cases where integrated vertical reasoning significantly enhances learning. Our first application is in neural generation, where state-of-the-art models struggle to generate pleasing images while satisfying complex specifications. We introduce the Spatial Reasoning INtegrated Generator (SPRING). SPRING embeds a spatial reasoning module, which decides the locations of objects to be generated, inside the deep generative network. Embedding symbolic reasoning into neural generation guarantees constraint satisfaction, offers interpretability, and facilitates zero-shot transfer learning. Our second application is in AI-driven scientific discovery, where we embed vertical reasoning to expedite symbolic regression. Inspired by how human scientists work, vertical reasoning builds from reduced models that involve a subset of variables (or processes) up to full models. Demonstrated in computational materials science, vertical discovery outperforms horizontal approaches at discovering equations involving many variables and complex processes.

Bio: Dr. Yexiang Xue is an assistant professor in the Department of Computer Science at Purdue University. The goal of Dr. Xue's research is to bridge large-scale constraint-based reasoning with state-of-the-art machine learning techniques in order to enable intelligent agents to make optimal decisions in high-dimensional and uncertain real-world applications. More specifically, Dr. Xue's research focuses on scalable and accurate probabilistic reasoning techniques, statistical modeling of data, and robust decision-making under uncertainty. His work is motivated by key problems across multiple scientific domains, spanning artificial intelligence, machine learning, renewable energy, materials science, crowdsourcing, citizen science, urban computing, ecology, and behavioral econometrics. Recently, Dr. Xue has been focusing on developing cross-cutting computational methods, with an emphasis on computational sustainability and AI-driven scientific discovery.

3:00 pm to 3:30 pm

Lightning Talks

3:30 pm to 4:00 pm

Coffee Break

4:00 pm to 5:00 pm

Poster Session

Day 2: Feb 21

9:00 am to 9:30 am

Invited Talk by Jacob Zwart
Title: How much knowledge-guidance is needed? Insights from deep learning for water resources

Abstract: Scientific knowledge can be integrated into deep learning models at various stages of model development, such as during input variable preparation, pre-training technique selection, utilization of process-relevant architectures, or custom loss function design to uphold specific physical or biological principles. However, determining the extent of guidance required by these models and identifying instances where scientific input might impede model performance are crucial questions. How do we know when additional scientific guidance is beneficial versus when the model should discover patterns for itself? In this presentation, I'll showcase several applications of knowledge-guided deep learning models in addressing water resource challenges, each with different levels of scientific input incorporated during model development. I'll discuss modeling decisions made at the U.S. Geological Survey, along with techniques for model interrogation aimed at informing our choices regarding the appropriate balance between providing more guidance and allowing the models to discover for themselves.

Bio: Jacob Zwart works within the Data Science Branch of the Water Resources Mission Area to develop aquatic ecosystem modeling techniques that provide timely information to stakeholders about important water resources across the nation. He uses his expertise in computational modeling, data assimilation, and limnology to help produce short-term forecasts of water quality at regional scales to aid in water resources decision making. Jacob's research themes are to: 1) improve understanding of aquatic biogeochemical processes and predict how these processes may respond to future global change, 2) develop techniques to inject scientific knowledge into machine learning models to make accurate predictions of environmental variables (also known as "knowledge-guided machine learning"), and 3) advance methods for assimilating real-time observations into knowledge-guided machine learning models to improve near-term forecasts of water quality. Jacob also serves as a Peer Support Worker at USGS, promoting awareness and education on topics and USGS policies related to anti-harassment, discrimination, bias, and scientific integrity, as well as providing peer-to-peer support for USGS employees.

9:30 am to 10:00 am

Invited Talk by Kai-Hendrik Cohrs
Title: Towards Inference in Hybrid Earth System Models

Abstract: Earth System Models (ESMs) play a significant role in understanding and projecting the human impact on the Earth's climate. To the surprise of the modelers, recent advancements in the field have led to an increase, rather than a decrease, in the uncertainty around global temperature projections of ESMs. This is attributed to key processes that remain poorly understood. Integration with machine learning (ML), made possible by abundant Earth observation data, aims to mitigate these deficiencies. One important knowledge-guided machine learning approach for integrating physical systems and ML is hybrid modeling. In this talk, we will examine persistent challenges and present two alternatives to the end-to-end deep learning approach to hybrid modeling, applied to problems around carbon fluxes. The first is based on double machine learning, a technique for causal effect estimation that allows robust inference in the presence of regularization bias. The second is based on the Bayesian machine scientist, a Bayesian symbolic regression algorithm that we deploy in a hybrid modeling fashion. Finally, we discuss challenges and the next steps toward inference in more complex coupled systems.

Bio: Kai-Hendrik Cohrs is an ELLIS Ph.D. student in the Image and Signal Processing (ISP) group at the University of Valencia, specializing in machine learning for Earth and climate sciences. He holds a Bachelor's and Master's degree in Mathematics from the University of Göttingen. His research interests lie in Bayesian inference, causality, deep learning, and the integration of prior knowledge into machine learning models. For his doctoral thesis, he focuses on inference in hybrid Earth system models.

10:00 am to 10:30 am

Invited Talk by Aryan Deshwal
Title: AI to Accelerate Scientific Discovery and Engineering Design

Abstract: Searching for a new scientific discovery or an engineering design can frequently be cast as an instantiation of the following general challenge: adaptive optimization of complex design spaces guided by expensive experiments. An example is searching the space of materials for a desired property while minimizing the total resource cost of the physical lab experiments needed for their evaluation. In this talk, I will present my work on developing novel adaptive experimental design algorithms to address this challenge and on applying them to solve high-impact science and engineering applications in nano-porous materials design, electronic design automation, and additive manufacturing.

Bio: Aryan Deshwal is a final-year PhD candidate at Washington State University. His research agenda, AI to Accelerate Scientific Discovery and Engineering Design, focuses on advancing the foundations of AI/ML to solve challenging real-world problems with high societal impact. He was selected for Rising Stars in AI by the KAUST AI Initiative (2023) and for the Heidelberg Laureate Forum (2022). He won the College of Engineering Outstanding Dissertation Award (2023), the Outstanding Research Assistant Award (2022), and the Outstanding Teaching Assistant in CS Award (2020) from WSU. He also won outstanding reviewer awards from the ICML (2020), ICLR (2021), and ICML (2021) conferences.

10:30 am to 11:00 am

Coffee Break

11:00 am to 12:30 pm

Panel Discussion
Panelists (Tentative): Khemraj Shukla, Aryan Deshwal, Jacob Zwart, Kai-Hendrik Cohrs

Moderator: Nikhil Muralidhar

12:30 pm to 2:00 pm

Lunch Break

2:00 pm to 5:30 pm

Tutorial on Physics-Informed Neural Networks by Khemraj Shukla and George Karniadakis (EVENT CANCELLED. MATERIALS WILL BE UPLOADED AFTER THE PROGRAM ENDS.)
Title: A Primer on Physics-Informed Neural Networks and Neural Operators: Theory, Algorithms, and Implementation

Abstract: During this 3-hour tutorial, we will allocate 90 minutes to lectures, with the remaining half dedicated to hands-on demonstrations of implementing Physics-Informed Neural Networks (PINNs) and DeepONet. The tutorial will begin by presenting the fundamental framework of PINNs, including insights on enhancing PINN training through various adaptive strategies. We will delve into different variations of PINNs, such as gPINN, cPINN, and XPINN, tailored to a variety of Partial Differential Equations (PDEs). Subsequently, we will explore the methodology for learning operators, elucidating the underlying mathematics behind the Deep Operator Network (DeepONet) and the Fourier Neural Operator (FNO). In the latter part of the tutorial, we will guide participants through the implementation of these concepts using TensorFlow, PyTorch, and JAX. There will be a 30-minute coffee break from 10:30 am to 11:00 am.
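
For readers new to PINNs, here is a minimal PyTorch sketch of the core idea (illustrative only, not the tutorial's code): the network is trained on a loss that combines the residual of a governing equation, computed via automatic differentiation, with its initial or boundary conditions. The toy problem assumed here is the ODE du/dx = -u with u(0) = 1.

import torch

# Small fully connected network approximating the solution u(x).
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(5000):
    x = torch.rand(128, 1, requires_grad=True)          # collocation points in [0, 1]
    u = net(x)
    du_dx = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u),
                                create_graph=True)[0]   # du/dx via autograd
    residual = du_dx + u                                 # residual of du/dx = -u
    x0 = torch.zeros(1, 1)
    loss = residual.pow(2).mean() + (net(x0) - 1.0).pow(2).mean()  # equation + initial-condition terms
    opt.zero_grad()
    loss.backward()
    opt.step()

The same pattern extends to PDEs by adding spatial and temporal derivatives to the residual and boundary terms to the loss; the PINN variants and adaptive strategies mentioned above build on this basic loss in different ways.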

Bio: George Em Karniadakis (h-index 129) is the lead PI and the contact person. He is a member of the National Academy of Engineering and a Vannevar Bush faculty fellow. He is a Professor of Applied Mathematics and Engineering at Brown University. He has been a DoD grantee since the early 1990s, developing spectral/hp element methods for compressible flows, discontinuous Galerkin methods, and generalized polynomial chaos methods for uncertainty quantification. He is the author of four books on spectral elements for CFD, microfluidics, parallel scientific computing, and stochastic PDEs, and of more than 400 research articles. In the past, he has led several large-scale projects for DoD, DOE, and NIH; for example, he has been the lead PI on three different MURIs in the last 15 years. He has received the SIAM/ACM CSE award (2021), the SIAM Ralph Kleinman award (2015), the inaugural J.T. Oden medal (2013), and the CFD award (2007) from USACM.

Khemraj Shukla is an Associate Professor of Applied Mathematics at Brown University. He has held appointments as a Research Scientist at Hewlett-Packard (HP) Labs, CA; BP America, TX; and Halliburton, CO. His research focuses on the development of scalable codes for heterogeneous computing architectures. He has also worked as a Lecturer/Computational Scientist at the University of Chicago. As a doctoral student, he developed a weight-adjusted discontinuous Galerkin method with penalty fluxes, targeted specifically at GPU architectures, for wave propagation in a fluid-saturated porous medium, under the guidance of Prof. Maarten V. de Hoop and Prof. Jesse Chan of Rice University.

Organizing Committee

Nikhil Muralidhar
Stevens Institute of Technology
nmurali1@stevens.edu

Arka Daw
Oak Ridge National Laboratory
dawa@ornl.gov

Ramakrishnan Kannan
Oak Ridge National Laboratory
kannanr@ornl.gov

Vipin Kumar
University of Minnesota
kumar001@umn.edu