AAAI 2024 Bridge on
Knowledge-guided ML
Bridging Scientific Knowledge and AI
(KGML-Bridge-AAAI-24)
Held as part of the Bridge Program at AAAI 2024
February 20 to 21, 2024
Room 205, Vancouver Convention Centre – West Building | Vancouver, BC, Canada
Overview
Scientific knowledge-guided machine learning (KGML) is an emerging field of research in which scientific knowledge is deeply integrated into ML frameworks to produce solutions that are scientifically grounded, explainable, and likely to generalize to out-of-distribution samples even with limited training data. By using scientific knowledge and data as complementary sources of information in the design, training, and evaluation of ML models, KGML represents a distinct departure from black-box, data-only methods and holds great potential for accelerating scientific discovery in a number of disciplines.
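As a schematic illustration of this idea (ours, not a method from the program), a KGML objective can combine a standard data-fit loss with a penalty for violating a known scientific constraint; here the "knowledge" is a hypothetical monotonicity constraint on the predictions:

```python
import numpy as np

# Schematic KGML-style objective (illustrative sketch):
# total loss = data-fit loss + lambda * penalty for violating a scientific constraint.
# The assumed constraint here is that predictions must not decrease as the input increases.
def kgml_loss(y_pred, y_true, lam=1.0):
    data_loss = np.mean((y_pred - y_true) ** 2)
    # penalize any drop between consecutive predictions (monotonicity violation)
    violations = np.maximum(0.0, y_pred[:-1] - y_pred[1:])
    physics_loss = np.mean(violations ** 2)
    return data_loss + lam * physics_loss

y_true = np.array([0.0, 1.0, 2.0, 3.0])
good = kgml_loss(np.array([0.1, 0.9, 2.1, 2.9]), y_true)  # fits data, monotone: small
bad = kgml_loss(np.array([0.1, 2.5, 1.0, 2.9]), y_true)   # violates monotonicity: larger
print(good, bad)
```

In a real KGML model the penalty would encode a domain principle (e.g., conservation of mass or energy) and be minimized jointly with the data loss during training.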
The goal of this bridge is to nurture the community of researchers working at the intersection of ML and scientific areas and to shape the vision of the rapidly growing field of KGML. This bridge builds upon the success of three previous symposia organized on this topic at the AAAI Fall Symposium Series in 2020, 2021, and 2022. See our book on KGML for coverage of research topics in this field.
New: The slides from our Introductory Tutorial on KGML are now available for download.
Schedule
Day 1: Feb 20
9:00 am to 9:15 am
Opening Remarks
9:15 am to 10:00 am
Introductory Tutorial on KGML: Part 1 (Overview)
Speaker: Anuj Karpatne
10:00 am to 10:30 am
Coffee Break
10:30 am to 12:30 pm
Introductory Tutorial on KGML: Part 2 (Case Studies)
Speakers: Nikhil Muralidhar, Ramkrishnan Kannan, Anuj Karpatne
12:30 pm to 2:00 pm
Lunch Break
2:00 pm to 2:30 pm
Invited Talk by Noah Benson
Title: Automated segmentation of the human visual cortex by convolutional neural networks
Abstract: Segmenting brain areas from functional MRI data is an important but difficult task for many neuroscience studies of human vision because the traditional methods of segmentation require extensive time in the scanner conducting experiments, extensive computation time processing the experimental results, and extensive human time interpreting these results and delineating the regions of interest. Automated methods based on the brain's gray-matter anatomy or a combination of anatomy and data from functional MRI experiments can reduce these requirements but are less accurate than experts. Convolutional Neural Networks (CNNs) are powerful tools for automated medical image segmentation. We hypothesize that CNNs can delineate visual area boundaries with high accuracy. We trained U-Net CNNs with ResNet backbones to segment the first three cortical visual areas (V1, V2, and V3) using a dataset of human-labeled maps. Separate CNNs were trained to predict these regions using different combinations of the following input data: (1) anatomical data regarding the brain's gray-matter only, (2) anatomical data about the brain's gray-matter combined with anatomical data about the brain's white-matter from diffusion-weighted imaging, and (3) anatomical data combined with functional data from visual experiments. All CNNs using functional data had cross-validated accuracies that were statistically indistinguishable from the inter-rater reliability of the training dataset (dice coefficient of 92%) while the CNNs lacking functional data had lower but similar accuracies (~75%). Existing methods of segmenting the visual cortex that do not use CNNs had accuracies substantially lower than those of any of the CNNs. These results demonstrate that with current methods and data quality, CNNs can segment brain areas approximately as well as humans. 
However, segmentations made using the brain's anatomical structure alone are substantially worse than those informed by functional measurements, suggesting that brain structure and brain function are partially independent.
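For concreteness, the Dice coefficient used above to report segmentation accuracy measures the overlap between two binary masks; a minimal sketch (ours, not the speaker's code):

```python
import numpy as np

# Dice coefficient between two binary segmentation masks:
# 2 * |A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks).
def dice(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum())

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(dice(a, b))  # 2*2 / (3+3) = 0.666...
```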
Bio: Dr. Noah C. Benson is a senior data scientist at the University of Washington's eScience Institute, where he performs research on the relationship between brain structure and brain function in the human visual cortex. Dr. Benson obtained his Ph.D. from the University of Washington in 2010 before completing a post-doctoral position with Profs. David Brainard and Geoff Aguirre at the University of Pennsylvania (2010–14) and then working as a research scientist with Prof. Jonathan Winawer at New York University (2014–20). During this time, Dr. Benson published numerous papers focused on methods for predicting brain function from brain anatomy and on the relationship between brain anatomy and human vision. His current research agenda focuses on the application of contemporary artificial intelligence tools to these topics. Dr. Benson is also the author of the software library neuropythy, a neuroscience MRI toolkit, and is a co-organizer of NeuroHackademy, an annual 2-week workshop at the University of Washington designed to bridge data science and neuroscience education.
2:30 pm to 3:00 pm
Invited Talk by Yexiang Xue
Title: Vertical Reasoning Enhanced Learning, Generation and Scientific Discovery
Abstract: Automated reasoning and machine learning are two fundamental pillars of artificial intelligence. Despite much recent progress, building autonomous agents that fully integrate reasoning and learning is still beyond reach. This talk presents two cases where integrated vertical reasoning significantly enhances learning. Our first application is neural generation, where state-of-the-art models struggle to generate pleasing images while satisfying complex specifications. We introduce the Spatial Reasoning INtegrated Generator (SPRING). SPRING embeds a spatial reasoning module inside the deep generative network, which decides the locations of the objects to be generated. Embedding symbolic reasoning into neural generation guarantees constraint satisfaction, offers interpretability, and facilitates zero-shot transfer learning. Our second application is AI-driven scientific discovery, where we embed vertical reasoning to expedite symbolic regression. Vertical reasoning builds from reduced models that involve a subset of variables (or processes) up to full models, inspired by how human scientists approach discovery. Demonstrated in computational materials science, vertical discovery outperforms horizontal approaches at discovering equations involving many variables and complex processes.
Bio: Dr. Yexiang Xue is an assistant professor in the Department of Computer Science at Purdue University. The goal of Dr. Xue's research is to bridge large-scale constraint-based reasoning with state-of-the-art machine learning techniques in order to enable intelligent agents to make optimal decisions in high-dimensional and uncertain real-world applications. More specifically, Dr. Xue's research focuses on scalable and accurate probabilistic reasoning techniques, statistical modeling of data, and robust decision-making under uncertainty. His work is motivated by key problems across multiple scientific domains, ranging from artificial intelligence, machine learning, renewable energy, materials science, crowdsourcing, citizen science, urban computing, and ecology to behavioral econometrics. Recently, Dr. Xue has been focusing on developing cross-cutting computational methods, with an emphasis on computational sustainability and AI-driven scientific discovery.
3:00 pm to 3:30 pm
Lightning Talks
Hen Emuna, Nadav Borenstein, Xin Qian, Hyeonsu Kang, Joel Chan, Aniket Kittur, Dafna Shahaf, "Imitation of Life: A Search Engine for Biologically Inspired Design"
Kai-Hendrik Cohrs, Gherardo Varando, Nuno Carvalhais, Roger Guimera, Markus Reichstein, Gustau Camps-Valls, "Towards Inference in Hybrid Earth System Models"
Nan Jiang, Yexiang Xue, "Racing Control Variable Genetic Programming for Symbolic Regression"
Andreas Grivas, Antonio Vergari, Adam Lopez, "Taming the Sigmoid Bottleneck: Provably Argmaxable Sparse Multi-Label Classification"
Taniya Kapoor, Abhishek Chandra, Daniel M. Tartakovsky, Hongrui Wang, Alfredo Nunez, Rolf Dollevoet, "Neural oscillators for generalization of physics-informed machine learning"
Joseph Giovanelli, Alexander Tornede, Tanja Tornede, Marius Lindauer, "Interactive Hyperparameter Optimization in Multi-Objective Problems via Preference Learning"
Bharat Srikishan, Anika Tabassum, Srikanth Allu, Ramakrishnan Kannan, Nikhil Muralidhar, "Reinforcement Learning as a Parsimonious Alternative to Prediction Cascades: A Case Study on Image Segmentation"
Margot Herin, Patrice Perny, Nataliya Sokolovska, "GAI-Decomposable Utility Models for Multiattribute Decision Making"
YongKyung Oh, Seung Su Kam, Dongyoung Lim, Sungil Kim, "Enhancing Astronomical time series Classification with Neural Stochastic Differential Equations under Irregular Observations"
Athresh Karanam, Saurabh Mathur, Sahil Sidheekh, Sriraam Natarajan, "A Unified Framework for Human-Allied Learning of Probabilistic Circuits"
Sheng Jie Lui, Cheng Xiang, Shonali Krishnaswamy, "KAMEL: Knowledge Aware Medical Entity Linkage to Automate Health Insurance Claims Processing"
3:30 pm to 4:00 pm
Coffee Break
4:00 pm to 5:00 pm
Poster Session
Day 2: Feb 21
9:00 am to 9:30 am
Invited Talk by Jacob Zwart
Title: How much knowledge-guidance is needed? Insights from deep learning for water resources
Abstract: Scientific knowledge can be integrated into deep learning models at various stages of model development, such as during input variable preparation, pre-training technique selection, utilization of process-relevant architectures, or custom loss function design to uphold specific physical or biological principles. However, determining the extent of guidance required by these models and identifying instances where scientific input might impede model performance are crucial questions. How do we know when additional scientific guidance is beneficial versus when the model should discover patterns for itself? In this presentation, I'll showcase several applications of knowledge-guided deep learning models in addressing water resource challenges, each with different levels of scientific input incorporated during model development. I'll discuss modeling decisions made at the U.S. Geological Survey, along with techniques for model interrogation aimed at informing our choices regarding the appropriate balance between providing more guidance and allowing the models to discover for themselves.
Bio: Jacob Zwart works within the Data Science Branch of the Water Resources Mission Area to develop aquatic ecosystem modeling techniques that provide timely information to stakeholders about important water resources across the nation. He uses his expertise in computational modeling, data assimilation, and limnology to help produce short-term forecasts of water quality at regional scales to aid water resources decision making. Jacob's research themes are: 1) improving understanding of aquatic biogeochemical processes and predicting how these processes may respond to future global change, 2) developing techniques to inject scientific knowledge into machine learning models to make accurate predictions of environmental variables (also known as "knowledge-guided machine learning"), and 3) advancing methods for assimilating real-time observations into knowledge-guided machine learning models to improve near-term forecasts of water quality. Jacob also serves as a Peer Support Worker at USGS, promoting awareness and education on USGS policies for anti-harassment, discrimination, bias, and scientific integrity, as well as providing peer-to-peer support for USGS employees.
9:30 am to 10:00 am
Invited Talk by Kai-Hendrik Cohrs
Abstract: Earth System Models (ESMs) play a significant role in understanding and projecting the human impact on the Earth's climate. To the surprise of the modelers, recent advancements in the field have led to an increase rather than a decrease in uncertainty around ESM projections of global temperature. This is attributed to key processes that remain poorly understood. Integration with machine learning (ML), made possible by abundant Earth observation data, aims to mitigate these deficiencies. One important knowledge-guided machine learning approach for integrating physical systems and ML is hybrid modeling. In this talk, we will examine persisting challenges and see two alternatives to the end-to-end deep learning approach in hybrid modeling, applied to problems around carbon fluxes. The first is based on double machine learning, a technique for causal effect estimation that allows robust inference in the presence of regularization bias. The second is based on the Bayesian machine scientist, a Bayesian symbolic regression algorithm we deploy in the hybrid modeling fashion. Finally, we discuss challenges and the next steps toward inference in more complex coupled systems.
Bio: Kai-Hendrik Cohrs is an ELLIS Ph.D. student in the Image and Signal Processing (ISP) group at the University of Valencia, specializing in machine learning for Earth and climate sciences. He holds a Bachelor's and Master's degree in Mathematics from the University of Göttingen. His research interests lie in Bayesian inference, causality, deep learning, and the integration of prior knowledge into machine learning models. For his doctoral thesis, he focuses on inference in hybrid Earth system models.
10:00 am to 10:30 am
Invited Talk by Aryan Deshwal
Abstract: Searching for a new scientific discovery or an engineering design can frequently be cast as an instantiation of the following general challenge: adaptive optimization of complex design spaces guided by expensive experiments. For example, searching the space of materials for a desired property while minimizing the total resource cost of the physical lab experiments needed to evaluate them. In this talk, I will present my work on developing novel adaptive experimental design algorithms to address this challenge and applying them to high-impact science and engineering applications in nanoporous materials design, electronic design automation, and additive manufacturing.
Bio: Aryan Deshwal is a final-year PhD candidate at Washington State University. His research agenda is AI to accelerate scientific discovery and engineering design, with a focus on advancing the foundations of AI/ML to solve challenging real-world problems with high societal impact. He was selected for Rising Stars in AI by the KAUST AI Initiative (2023) and the Heidelberg Laureate Forum (2022). He won the College of Engineering Outstanding Dissertation Award (2023), the Outstanding Research Assistant Award (2022), and the Outstanding Teaching Assistant in CS Award (2020) from WSU, as well as outstanding reviewer awards from the ICML (2020), ICLR (2021), and ICML (2021) conferences.
10:30 am to 11:00 am
Coffee Break
11:00 am to 12:30 pm
Panel Discussion
Panelists (Tentative): Khemraj Shukla, Aryan Deshwal, Jacob Zwart, Kai-Hendrik Cohrs
Moderator: Nikhil Muralidhar
12:30 pm to 2:00 pm
Lunch Break
2:00 pm to 5:30 pm
Tutorial on Physics-Informed Neural Networks by Khemraj Shukla and George Karniadakis (Event cancelled; materials will be uploaded after the program ends.)
Title: A Primer on Physics-Informed Neural Networks and Neural Operators: Theory, Algorithms, and Implementation
Abstract: During this 3-hour tutorial, we will allocate 90 minutes to lectures, with the remaining half dedicated to hands-on demonstrations of implementing Physics-Informed Neural Networks (PINNs) and DeepONet. The tutorial will begin with the fundamental framework of PINNs, including insights on enhancing PINN training through various adaptive strategies. We will delve into variations of PINNs, such as gPINN, cPINN, and XPINN, tailored to a variety of Partial Differential Equations (PDEs). Subsequently, we will explore the methodology for learning operators, elucidating the mathematics behind the Deep Operator Network (DeepONet) and the Fourier Neural Operator (FNO). In the latter part of the tutorial, we will guide participants through the implementation of these concepts using TensorFlow, PyTorch, and JAX. There will be a 30-minute coffee break.
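As a taste of the tutorial's subject matter, a minimal PINN in PyTorch (our illustrative sketch, not the presenters' material) trains a small network so that its output satisfies the ODE u'(t) = -u(t) with u(0) = 1 on [0, 1], whose exact solution is exp(-t):

```python
import torch

# Minimal PINN sketch: learn u(t) satisfying u'(t) = -u(t), u(0) = 1.
torch.manual_seed(0)

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
t = torch.linspace(0, 1, 50).reshape(-1, 1).requires_grad_(True)

for step in range(2000):
    opt.zero_grad()
    u = net(t)
    # du/dt via automatic differentiation of the network output w.r.t. its input
    du = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    residual = du + u                      # physics loss: enforce u' + u = 0
    ic = net(torch.zeros(1, 1)) - 1.0      # initial-condition loss: enforce u(0) = 1
    loss = (residual ** 2).mean() + (ic ** 2).mean()
    loss.backward()
    opt.step()

print(float(net(torch.ones(1, 1))))  # should approach exp(-1) ≈ 0.368
```

The same recipe generalizes to PDEs by adding the PDE residual at collocation points and boundary terms to the loss; the adaptive strategies and variants named in the abstract refine this basic setup.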
Bio: George Em Karniadakis (h-index 129) is the lead PI and the contact person. He is a member of the National Academy of Engineering and a Vannevar Bush Faculty Fellow. He is Professor of Applied Mathematics and Engineering at Brown University. He has been a DoD grantee since the early 1990s, developing spectral/hp element methods for compressible flows, discontinuous Galerkin methods, and generalized polynomial chaos methods for uncertainty quantification. He is the author of four books on spectral elements for CFD, microfluidics, parallel scientific computing, and stochastic PDEs, and of more than 400 research articles. In the past he has led several large-scale DoD, DOE, and NIH projects; for example, he has been the lead PI on three different MURIs in the last 15 years. He has received the SIAM/ACM CSE Award (2021), the SIAM Ralph Kleinman Award (2015), the (inaugural) J.T. Oden Medal (2013), and the CFD Award (2007) from USACM.
Khemraj Shukla is Associate Professor of Applied Mathematics at Brown University. He has held appointments as a Research Scientist at Hewlett-Packard (HP) Labs, CA; BP America, TX; and Halliburton, CO. His research focuses on the development of scalable codes for heterogeneous computing architectures. He has also worked as a Lecturer/Computational Scientist at the University of Chicago. As a doctoral student, he developed a weight-adjusted discontinuous Galerkin method with penalty fluxes, targeted specifically at GPU architectures, for wave propagation in a fluid-saturated porous medium under the guidance of Prof. Maarten V. de Hoop and Prof. Jesse Chan of Rice University.
Organizing Committee
Anuj Karpatne
Virginia Tech
karpatne@vt.edu
Nikhil Muralidhar
Stevens Institute of Technology
nmurali1@stevens.edu
Arka Daw
Oak Ridge National Laboratory
dawa@ornl.gov
Ramakrishnan Kannan
Oak Ridge National Laboratory
kannanr@ornl.gov
Vipin Kumar
University of Minnesota
kumar001@umn.edu