AAAI 2024 Bridge on Knowledge-guided ML
Bridging Scientific Knowledge and AI
(KGML-Bridge-AAAI-24)
Held as part of the Bridge Program at AAAI 2024
February 20 to 21, 2024
Room 205, Vancouver Convention Centre – West Building | Vancouver, BC, Canada
Scientific knowledge-guided machine learning (KGML) is an emerging field of research where scientific knowledge is deeply integrated into ML frameworks to produce solutions that are scientifically grounded, explainable, and able to generalize to out-of-distribution samples even with limited training data. By using scientific knowledge and data as complementary sources of information in the design, training, and evaluation of ML models, KGML marks a distinct departure from black-box, data-only methods and holds great potential for accelerating scientific discovery across a number of disciplines.
The goal of this bridge is to nurture the community of researchers working at the intersection of ML and scientific areas and to shape the vision of the rapidly growing field of KGML. This bridge builds upon the success of three previous symposia organized on this topic at the AAAI Fall Symposium Series in 2020, 2021, and 2022. See our book on KGML for coverage of research topics in this field.
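To give a concrete flavor of the knowledge-plus-data integration described above, one widely used pattern is a training objective that combines a standard data-fit term with a penalty for violating a known scientific relationship. The following is a minimal, hypothetical PyTorch sketch that assumes a simple damped-oscillator ODE as the available physics; the network, data, and loss weighting are illustrative only and not tied to any particular talk or paper at the bridge.

```python
import torch
import torch.nn as nn

# Assumed physics (illustrative): damped harmonic oscillator  u'' + c*u' + k*u = 0.
c, k = 0.1, 1.0

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))

def physics_residual(t):
    """Mean squared violation of the assumed ODE at collocation points t."""
    t = t.clone().requires_grad_(True)
    u = net(t)
    du = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), t, create_graph=True)[0]
    return ((d2u + c * du + k * u) ** 2).mean()

def kgml_loss(t_obs, u_obs, t_colloc, lam=1.0):
    """Data-fit term plus physics-consistency term (lam is an illustrative weight)."""
    data_loss = ((net(t_obs) - u_obs) ** 2).mean()
    return data_loss + lam * physics_residual(t_colloc)

# Toy usage: a few noisy observations plus many unlabeled collocation points,
# so the physics term supplies supervision where data are scarce.
t_obs = torch.linspace(0, 2, 10).unsqueeze(-1)
u_obs = torch.cos(t_obs) + 0.05 * torch.randn_like(t_obs)
t_colloc = torch.linspace(0, 6, 100).unsqueeze(-1)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = kgml_loss(t_obs, u_obs, t_colloc)
    loss.backward()
    opt.step()
```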
New: The slides from our Introductory Tutorial on KGML are now available. Here is the link to download the slides.
Day 1: Feb 20
9:00 am to 9:15 am
9:15 am to 10:00 am
Introductory Tutorial on KGML: Part 1 (Overview)
Speaker: Anuj Karpatne
10:00 am to 10:30 am
10:30 am to 12:30 pm
Introductory Tutorial on KGML: Part 2 (Case Studies)
Speakers: Nikhil Muralidhar, Ramkrishnan Kannan, Anuj Karpatne
12:30 pm to 2:00 pm
2:00 pm to 2:30 pm
Invited Talk by Noah Benson
Title: Automated segmentation of the human visual cortex by convolutional neural networks
Abstract: Segmenting brain areas from functional MRI data is an important but difficult task for many neuroscience studies of human vision because the traditional methods of segmentation require extensive time in the scanner conducting experiments, extensive computation time processing the experimental results, and extensive human time interpreting these results and delineating the regions of interest. Automated methods based on the brain's gray-matter anatomy or a combination of anatomy and data from functional MRI experiments can reduce these requirements but are less accurate than experts. Convolutional Neural Networks (CNNs) are powerful tools for automated medical image segmentation. We hypothesize that CNNs can delineate visual area boundaries with high accuracy. We trained U-Net CNNs with ResNet backbones to segment the first three cortical visual areas (V1, V2, and V3) using a dataset of human-labeled maps. Separate CNNs were trained to predict these regions using different combinations of the following input data: (1) anatomical data regarding the brain's gray-matter only, (2) anatomical data about the brain's gray-matter combined with anatomical data about the brain's white-matter from diffusion-weighted imaging, and (3) anatomical data combined with functional data from visual experiments. All CNNs using functional data had cross-validated accuracies that were statistically indistinguishable from the inter-rater reliability of the training dataset (dice coefficient of 92%) while the CNNs lacking functional data had lower but similar accuracies (~75%). Existing methods of segmenting the visual cortex that do not use CNNs had accuracies substantially lower than those of any of the CNNs. These results demonstrate that with current methods and data quality, CNNs can segment brain areas approximately as well as humans. However, segmentations made using the brain's anatomical structure alone are substantially worse than those informed by functional measurements, suggesting that brain structure and brain function are partially independent.
Bio: Dr. Noah C. Benson is a senior data scientist at the University of Washington's eScience Institute where he performs research on the relationship between brain structure and brain function in the human visual cortex. Dr. Benson obtained his Ph.D. from the University of Washington in 2010 before completing a post-doctoral position with Profs. David Brainard and Geoff Aguirre at the University of Pennsylvania (2010–14) then working as a research scientist with Prof. Jonathan Winawer at New York University (2014–20). During this time, Dr. Benson has published numerous papers focused on methods for predicting brain function from brain anatomy and on the relationship between brain anatomy and human vision. His current research agenda focuses on the application of contemporary artificial intelligence tools to these topics. Dr. Benson is also the author of the software library neuropythy, a neuroscience MRI toolkit, and is a co-organizer of the NeuroHackademy, an annual 2-week workshop at the University of Washington designed to bridge data science and neuroscience education.
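For readers unfamiliar with the accuracy metric quoted in the abstract above, the Dice coefficient measures the overlap between a predicted segmentation and a reference labeling. Below is a minimal, self-contained sketch of the metric in PyTorch; it is purely illustrative and not drawn from the speaker's pipeline.

```python
import torch

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Dice overlap between two binary masks (1.0 = perfect agreement, 0.0 = none)."""
    pred, target = pred.bool(), target.bool()
    intersection = (pred & target).sum().float()
    return (2.0 * intersection + eps) / (pred.sum().float() + target.sum().float() + eps)

# Example: two toy "in V1 / not in V1" label maps that mostly agree.
a = torch.tensor([[1, 1, 0, 0], [1, 1, 0, 0]])
b = torch.tensor([[1, 1, 1, 0], [1, 0, 0, 0]])
print(dice_coefficient(a, b))  # tensor(0.7500)
```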
2:30 pm to 3:00 pm
Invited Talk by Yexiang Xue
Title: Vertical Reasoning Enhanced Learning, Generation and Scientific Discovery
Abstract: Automated reasoning and machine learning are two fundamental pillars of artificial intelligence. Despite much recent progress, building autonomous agents that fully integrate reasoning and learning is still beyond reach. This talk presents two cases where integrated vertical reasoning significantly enhances learning. Our first application is in neural generation, where state-of-the-art models struggle to generate pleasing images while satisfying complex specifications. We introduce the Spatial Reasoning INtegrated Generator (SPRING). SPRING embeds a spatial reasoning module inside the deep generative network, which decides the locations of the objects to be generated. Embedding symbolic reasoning into neural generation guarantees constraint satisfaction, offers interpretability, and facilitates zero-shot transfer learning. Our second application is in AI-driven scientific discovery, where we embed vertical reasoning to expedite symbolic regression. Vertical reasoning builds from reduced models that involve a subset of variables (or processes) up to full models, inspired by how human scientists approach discovery. Demonstrated in computational materials science, vertical discovery outperforms horizontal approaches at discovering equations involving many variables and complex processes.
Bio: Dr. Yexiang Xue is an assistant professor in the Department of Computer Science at Purdue University. The goal of Dr. Xue's research is to bridge large-scale constraint-based reasoning with state-of-the-art machine learning techniques in order to enable intelligent agents to make optimal decisions in high-dimensional and uncertain real-world applications. More specifically, Dr. Xue's research focuses on scalable and accurate probabilistic reasoning techniques, statistical modeling of data, and robust decision-making under uncertainty. His work is motivated by key problems across multiple scientific domains, spanning artificial intelligence, machine learning, renewable energy, materials science, crowdsourcing, citizen science, urban computing, ecology, and behavioral econometrics. Recently, Dr. Xue has been focusing on developing cross-cutting computational methods, with an emphasis on computational sustainability and AI-driven scientific discovery.
3:00 pm to 3:30 pm
Hen Emuna, Nadav Borenstein, Xin Qian, Hyeonsu Kang, Joel Chan, Aniket Kittur, Dafna Shahaf, "Imitation of Life: A Search Engine for Biologically Inspired Design"
Kai-Hendrik Cohrs, Gherardo Varando, Nuno Carvalhais, Roger Guimera, Markus Reichstein, Gustau Camps-Valls, "Towards Inference in Hybrid Earth System Models"
Nan Jiang, Yexiang Xue, "Racing Control Variable Genetic Programming for Symbolic Regression"
Andreas Grivas, Antonio Vergari, Adam Lopez, "Taming the Sigmoid Bottleneck: Provably Argmaxable Sparse Multi-Label Classification"
Taniya Kapoor, Abhishek Chandra, Daniel M. Tartakovsky, Hongrui Wang, Alfredo Nunez, Rolf Dollevoet, "Neural oscillators for generalization of physics-informed machine learning"
Joseph Giovanelli, Alexander Tornede, Tanja Tornede, Marius Lindauer, "Interactive Hyperparameter Optimization in Multi-Objective Problems via Preference Learning"
Bharat Srikishan, Anika Tabassum, Srikanth Allu, Ramakrishnan Kannan, Nikhil Muralidhar, "Reinforcement Learning as a Parsimonious Alternative to Prediction Cascades: A Case Study on Image Segmentation"
Margot Herin, Patrice Perny, Nataliya Sokolovska, "GAI-Decomposable Utility Models for Multiattribute Decision Making"
YongKyung Oh, Seung Su Kam, Dongyoung Lim, Sungil Kim, "Enhancing Astronomical time series Classification with Neural Stochastic Differential Equations under Irregular Observations"
Athresh Karanam, Saurabh Mathur, Sahil Sidheekh, Sriraam Natarajan, "A Unified Framework for Human-Allied Learning of Probabilistic Circuits"
Sheng Jie Lui, Cheng Xiang, Shonali Krishnaswamy, "KAMEL: Knowledge Aware Medical Entity Linkage to Automate Health Insurance Claims Processing"
3:30 pm to 4:00 pm
4:00 pm to 5:00 pm
Day 2: Feb 21
9:00 am to 9:30 am
Invited Talk by Jacob Zwart
Abstract: Scientific knowledge can be integrated into deep learning models at various stages of model development, such as during input variable preparation, pre-training technique selection, utilization of process-relevant architectures, or custom loss function design to uphold specific physical or biological principles. However, determining the extent of guidance required by these models and identifying instances where scientific input might impede model performance are crucial questions. How do we know when additional scientific guidance is beneficial versus when the model should discover patterns for itself? In this presentation, I'll showcase several applications of knowledge-guided deep learning models in addressing water resource challenges, each with different levels of scientific input incorporated during model development. I'll discuss modeling decisions made at the U.S. Geological Survey, along with techniques for model interrogation aimed at informing our choices regarding the appropriate balance between providing more guidance and allowing the models to discover for themselves.
Bio: Jacob Zwart works within the Data Science Branch of the Water Resources Mission Area to develop aquatic ecosystem modeling techniques that provide timely information to stakeholders about important water resources across the nation. He uses his expertise in computational modeling, data assimilation, and limnology to help produce short-term forecasts of water quality at regional scales to aid in water resources decision making. Jacob's research themes are: 1) improving understanding of aquatic biogeochemical processes and predicting how these processes may respond to future global change, 2) developing techniques to inject scientific knowledge into machine learning models to make accurate predictions of environmental variables (also known as "knowledge-guided machine learning"), and 3) advancing methods for assimilating real-time observations into knowledge-guided machine learning models to improve near-term forecasts of water quality. Jacob also serves as a Peer Support Worker at USGS, promoting awareness and education on USGS policies for anti-harassment, discrimination, bias, and scientific integrity, as well as providing peer-to-peer support for USGS employees.
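As one concrete example of the "custom loss function" style of guidance mentioned in the abstract above, a model can be penalized whenever its predictions violate a known physical relationship, for instance that water density in a lake should not decrease with depth. The sketch below is a hypothetical PyTorch illustration of such a penalty; the network, inputs, and weighting are assumptions for demonstration, not a USGS model.

```python
import torch
import torch.nn as nn

# Hypothetical model mapping (depth, day-of-year) -> predicted water density.
net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))

def monotonicity_penalty(depths, day):
    """Penalize predicted density decreasing with depth (depths sorted shallow -> deep)."""
    inputs = torch.stack([depths, day.expand_as(depths)], dim=-1)
    density = net(inputs).squeeze(-1)
    violations = torch.relu(density[:-1] - density[1:])  # positive where density drops with depth
    return violations.mean()

def guided_loss(x_obs, y_obs, depths, day, lam=0.5):
    """Standard data-fit loss plus the physical-consistency penalty."""
    data_loss = ((net(x_obs) - y_obs) ** 2).mean()
    return data_loss + lam * monotonicity_penalty(depths, day)

# Toy usage: sparse observations plus a dense grid of depths for the constraint.
x_obs = torch.tensor([[1.0, 150.0], [5.0, 150.0]])
y_obs = torch.tensor([[998.5], [999.2]])
loss = guided_loss(x_obs, y_obs, torch.linspace(0, 20, 50), torch.tensor(150.0))
```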
9:30 am to 10:00 am
Invited Talk by Kai-Hendrik Cohrs
Abstract: Earth System Models (ESMs) play a significant role in understanding and projecting the human impact on the Earth's climate. To the surprise of modelers, recent advancements in the field have led to an increase, rather than a decrease, in the uncertainty around ESM projections of global temperature. This is attributed to key processes that remain poorly understood. Integration with machine learning (ML), made possible by abundant Earth observation data, aims to mitigate these deficiencies. One important knowledge-guided machine learning approach for integrating physical systems and ML is hybrid modeling. In this talk, we will examine persisting challenges and present two alternatives to the end-to-end deep learning approach to hybrid modeling, applied to problems around carbon fluxes. The first is based on double machine learning, a technique for causal effect estimation that allows robust inference in the presence of regularization bias. The second is based on the Bayesian machine scientist, a Bayesian symbolic regression algorithm that we deploy in a hybrid modeling fashion. Finally, we discuss challenges and next steps toward inference in more complex coupled systems.
Bio: Kai-Hendrik Cohrs is an ELLIS Ph.D. student in the Image and Signal Processing (ISP) group at the University of Valencia, specializing in machine learning for Earth and climate sciences. He holds a Bachelor's and Master's degree in Mathematics from the University of Göttingen. His research interests lie in Bayesian inference, causality, deep learning, and the integration of prior knowledge into machine learning models. For his doctoral thesis, he focuses on inference in hybrid Earth system models.
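The abstract above mentions double machine learning, a two-stage "partialling-out" procedure that estimates an effect robustly even when flexible, regularized models are used for the nuisance components. Below is a minimal, generic sketch with scikit-learn on made-up synthetic data, not the carbon-flux setting of the talk; the names and the data-generating process are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Synthetic data: outcome y depends linearly on "treatment" t (true effect = 2.0)
# plus a nonlinear function of confounders X that also drives t.
n = 2000
X = rng.normal(size=(n, 5))
t = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
y = 2.0 * t + np.cos(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.5, size=n)

# Stage 1 (cross-fitting): residualize y and t against X on held-out folds.
res_y, res_t = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    res_y[test] = y[test] - RandomForestRegressor().fit(X[train], y[train]).predict(X[test])
    res_t[test] = t[test] - RandomForestRegressor().fit(X[train], t[train]).predict(X[test])

# Stage 2: residual-on-residual regression recovers the effect of t on y.
effect = np.sum(res_t * res_y) / np.sum(res_t ** 2)
print(f"estimated effect ~ {effect:.2f}")  # close to the true value of 2.0
```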
10:00 am to 10:30 am
Invited Talk by Aryan Deshwal
Abstract: The search for a new scientific discovery or engineering design can frequently be cast as an instance of a general challenge: adaptive optimization of complex design spaces guided by expensive experiments. For example, searching the space of materials for a desired property while minimizing the total resource cost of the physical lab experiments needed to evaluate candidates. In this talk, I will present my work on developing novel adaptive experimental design algorithms to address this challenge and applying them to solve high-impact science and engineering applications in nano-porous materials design, electronic design automation, and additive manufacturing.
Bio: Aryan Deshwal is a final-year PhD candidate at Washington State University. His research agenda is AI to accelerate scientific discovery and engineering design, where he focuses on advancing the foundations of AI/ML to solve challenging real-world problems with high societal impact. He was selected for Rising Stars in AI by the KAUST AI Initiative (2023) and for the Heidelberg Laureate Forum (2022). He won the College of Engineering Outstanding Dissertation Award (2023), the Outstanding Research Assistant Award (2022), and the Outstanding Teaching Assistant in CS Award (2020) from WSU. He received outstanding reviewer awards from the ICML (2020), ICLR (2021), and ICML (2021) conferences.
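A common template for the adaptive experimental design loop described in the abstract above is Bayesian optimization: fit a probabilistic surrogate to the experiments run so far, choose the next experiment with an acquisition function, and repeat until the budget is exhausted. The sketch below uses scikit-learn and SciPy on a toy one-dimensional objective; it is a generic illustration of the loop, not the speaker's algorithms.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expensive_experiment(x):
    """Stand-in for a costly lab measurement of a design's quality."""
    return -np.sin(3 * x) - x ** 2 + 0.7 * x

candidates = np.linspace(-2, 2, 400).reshape(-1, 1)   # discretized design space
X = np.array([[-1.5], [0.0], [1.5]])                  # initial experiments
y = expensive_experiment(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(10):                                   # fixed experiment budget
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.max()
    # Expected improvement: favor candidates likely to beat the best result so far.
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = candidates[np.argmax(ei)].reshape(1, -1)
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_experiment(x_next).ravel())

print("best design found:", X[np.argmax(y)].item(), "with value", y.max())
```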
10:30 am to 11:00 am
11:00 am to 12:30 pm
12:30 pm to 2:00 pm
Anuj Karpatne
Virginia Tech
karpatne@vt.edu
Nikhil Muralidhar
Stevens Institute of Technology
nmurali1@stevens.edu
Arka Daw
Oak Ridge National Laboratory
dawa@ornl.gov
Ramakrishnan Kannan
Oak Ridge National Laboratory
kannanr@ornl.gov
Vipin Kumar
University of Minnesota
kumar001@umn.edu