SECOND Workshop on
Imageomics
(Imageomics-AAAI-25)
Discovering Biological Knowledge from Images using AI
Held as part of AAAI 2025
March 3, 2025, 9:00 am to 5:20 pm
Room 113A, Pennsylvania Convention Center | Philadelphia, PA, United States
Imageomics is an emerging field of science that uses images, ranging from microscopic images of single-cell species to videos of charismatic megafauna such as giraffes and zebras, as the source of automatically extracted biological information, specifically traits, in order to gain insights into the evolution and function of living organisms. A central goal of Imageomics is to make traits computable from images, including the morphology, physiology, behavior, and genetic make-up of organisms, by grounding AI models in existing scientific knowledge so that they produce generalizable and biologically meaningful explanations of their predictions. Spearheaded by our prior research and community-building efforts in this area as part of the NSF HDR Imageomics Institute, Imageomics is ripe with opportunities to form foundational bridges between AI and the biological sciences by enabling computers to see what biologists cannot see while addressing key challenges in AI, creating a virtuous cycle between the two fields.
The goal of this workshop is to nurture the community of researchers working at the intersection of AI and biology and shape the vision of the nascent yet rapidly growing field of Imageomics.
We welcome participation from anyone interested in learning about the field of Imageomics, including (a) biologists working on problems involving image data and available biological knowledge, such as phylogenies, taxonomic groupings, ontologies, or evolutionary models, and (b) AI researchers working on topics such as explainability, generalizability, inductive bias, open-world and fine-grained recognition, foundation models, and novelty detection, who are looking for novel interdisciplinary research problems.
9:00 - 9:10 am
Opening Remarks
9:10 - 9:50 am
Keynote Talk by Ben Koger
Title: Challenges, opportunities, and solutions to studying animals in the contexts of their environments
Abstract: There is growing interest in the ability to extract biological information from imagery. In particular, much of this work has focused on developing methods to efficiently extract information about animals in imagery so that features like physical and behavioral traits can be quantified and metrics like population size and range can be estimated. While these are important steps, some of this work overlooks the fact that the costs and benefits of animals’ traits and the drivers of population change can only be understood in the context of the dynamic environments in which those animals and populations exist. Luckily, since cameras generally record animals and their local environments simultaneously, imagery is uniquely suited to investigate these dynamics. In this presentation, I will discuss the need for methods that integrate animal and environmental data into a unified environmental space, some of the challenges this presents, and successful solutions to these challenges. I will use case studies from my own work ranging from African ungulates to Pacific salmon to pronghorn in the American West.
Bio: Ben Koger is an assistant professor in the School of Computing and the Department of Zoology and Physiology at the University of Wyoming. His work focuses on creating systems that allow for the efficient and automated study of ecological systems; specifically, he combines imaging and computer vision to study the relationship between individuals and their social and physical landscapes. His current research focus is building novel methods to monitor landscapes and animals in the American West and Pacific salmon migration and behavior in Alaska. Previously, he was a Washington Research Foundation Postdoctoral Scholar in the School of Aquatic and Fishery Sciences at the University of Washington, working with Professor Andrew Berdahl. During his Ph.D., he worked with Iain Couzin at the Max Planck Institute of Animal Behavior in the Department of Collective Behaviour in Konstanz, Germany. He completed his bachelor's degree in electrical engineering at Princeton University, where he focused on image processing and machine learning.
9:50 - 10:30 am
Keynote Talk by Cynthia Rudin
Abstract: My lab does a lot of work in scientific discovery, and I will discuss two tools that have repeatedly proven extremely valuable for us – dimension reduction for data visualization and interpretable neural networks. Algorithms for dimension reduction (DR) are only effective if they preserve both local and global structure of the data. I will discuss how it is possible to preserve both simultaneously and present the PaCMAP algorithm for DR. PaCMAP and its extension LocalMAP (Wang et al., AAAI 2025) can separate clusters and illuminate manifolds when other methods cannot, making them extremely useful in practice for exploratory data analysis. PaCMAP has won two best software awards from the American Statistical Association. On interpretable neural networks, I will discuss the powerful and popular ProtoPNet algorithm (Chen et al., NeurIPS 2019) for inherently interpretable computer vision. ProtoPNet classifies a test image by comparing parts of that image to parts of prototypical images from each class. (I like to think of it as k-Nearest Neighbors on steroids.) ProtoPNet and its sister algorithms maintain their accuracy at the level of their black-box counterparts.
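For readers who want to try PaCMAP themselves, a minimal usage sketch follows. It assumes the open-source `pacmap` Python package and scikit-learn; it is illustrative only and is not code from the talk.

```python
# Minimal PaCMAP sketch (assumes `pip install pacmap scikit-learn`).
import pacmap
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 features

# PaCMAP balances local and global structure via neighbor, mid-near,
# and further point pairs; the defaults below are a reasonable start.
reducer = pacmap.PaCMAP(n_components=2, n_neighbors=10, MN_ratio=0.5, FP_ratio=2.0)
embedding = reducer.fit_transform(X, init="pca")  # (1797, 2) array for plotting

print(embedding.shape)
```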
Bio: Cynthia Rudin is the Gilbert, Louis, and Edward Lehrman Distinguished Professor in Computer Science at Duke University. She directs the Interpretable Machine Learning Lab, whose goal is to design predictive models that people can understand. She is the recipient of the $1M 2022 Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity from the Association for the Advancement of Artificial Intelligence as well as the 2024 INFORMS Society on Data Mining Prize. She is also a three-time winner of the INFORMS Innovative Applications in Analytics Award and a 2022 Guggenheim Fellow. She is a member of the US National AI Advisory Committee Subcommittee on AI and Law Enforcement.
10:30 - 11:35 am
Poster Session + Coffee Break
11:35 - 11:50 am
Invited Talk by Hangzhou He
Title: Vision-to-Concept and Language Tokenizers: Learning from Large-Scale Unlabeled Images
Abstract: Deep learning methods have made remarkable progress in both visual understanding and generation tasks. However, the black-box nature of deep neural networks makes them difficult to interpret. In contrast, language serves as an inherently effective interface for explanations. In this presentation, we will share our recently developed approach to constructing a discrete encoder for visual features from large-scale unlabeled images. Specifically, we will demonstrate how to build concept bottleneck models for explainable image classification using our Vision-to-Concept (V2C) tokenizer. Furthermore, by leveraging our Vision-to-Language (V2L) tokenizer, we enable frozen large language models to perform visual tasks such as visual question answering and image restoration. We hope this work will inspire further exploration into bridging vision and language, ultimately advancing interpretable and versatile AI systems across a wider range of domains.
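To make the concept-bottleneck idea concrete, here is a generic sketch in PyTorch. It is not the V2C tokenizer from the talk, and all layer sizes are hypothetical; it only illustrates how class predictions can be routed through interpretable concept scores.

```python
# Generic concept-bottleneck classifier sketch (illustrative only).
import torch
import torch.nn as nn

class ConceptBottleneckClassifier(nn.Module):
    def __init__(self, feature_dim: int, num_concepts: int, num_classes: int):
        super().__init__()
        self.to_concepts = nn.Linear(feature_dim, num_concepts)  # concept predictor
        self.to_labels = nn.Linear(num_concepts, num_classes)    # interpretable head

    def forward(self, features: torch.Tensor):
        concepts = torch.sigmoid(self.to_concepts(features))  # per-concept scores in [0, 1]
        logits = self.to_labels(concepts)                     # class scores explained by concepts
        return logits, concepts

# Hypothetical sizes: 512-d image features, 64 concepts, 10 classes.
model = ConceptBottleneckClassifier(512, 64, 10)
logits, concepts = model(torch.randn(8, 512))
print(logits.shape, concepts.shape)  # torch.Size([8, 10]) torch.Size([8, 64])
```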
Bio: Hangzhou He is a first-year Ph.D. student at Peking University, China. His research focuses on the intersection of artificial intelligence and biomedical applications, with a particular emphasis on the trustworthiness of deep learning models, including explainability, generalization, and robustness.
11:50 - 12:30 pm
Keynote Talk by Daniel Rubenstein
Title: Animal Behaviour Recognition from Drone Videos: A Methodological Comparison
Abstract: Drones and other remote sensors are accelerating data acquisition in behavioral and ecological studies of animals. These tools offer the promise of increasing the scale and scope of field studies, but they will require partnerships between people and machines to efficiently and accurately distill data in useful ways for answering challenging questions. Field studies of animal behaviour traditionally rely on focal or scan sampling, each with inherent limitations. If it were possible to perform simultaneous focal animal sampling of a group of individuals, these limitations would be eliminated since the behaviour of all individuals would be gathered while they were in similar physiological states and experiencing similar ecological and social conditions. Drones make this possibility a reality. From annotated drone footage of plains zebras, Grevy’s zebras, and reticulated giraffes, we developed a machine learning model to detect behavioral states and compute time budgets automatically. This model accurately replicated human-observed time budgets with significantly reduced effort. Applying this system to analyze how Grevy’s zebras adjust to the ‘Landscape of Fear’, we found that in larger groups, they reduce per capita vigilance and increase grazing time. When occupying bush-dense habitats, Grevy’s zebras increase herd size but, unlike plains zebras, maintain consistent vigilance levels, suggesting that their fluid social structure enables rapid group size adjustments in response to predation risk. Our methodology and code for automated behavioral analysis from drone footage are publicly available.
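As a toy illustration of the time-budget computation described above (hypothetical behavior labels; this is not the authors' released pipeline), per-frame behavior predictions for one individual can be summarized as the fraction of observed time spent in each state:

```python
# Toy time-budget sketch from per-frame behavior labels (illustrative only).
from collections import Counter

frame_labels = ["graze", "graze", "vigilant", "graze", "walk", "graze", "vigilant"]
counts = Counter(frame_labels)
total = len(frame_labels)

time_budget = {behavior: n / total for behavior, n in counts.items()}
print(time_budget)  # e.g. {'graze': 0.57, 'vigilant': 0.29, 'walk': 0.14}
```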
Bio: Dan Rubenstein is a behavioral ecologist who studies how environmental variation and individual differences shape social behavior, social structure, sex roles and the dynamics of populations. He has special interests in all species of wild horses, zebras, and asses, and has done field work on them throughout the world identifying rules governing decision-making, the emergence of complex behavioral patterns and how these understandings influence their management and conservation. By working with computer scientists, he is advancing Computational Field Ecology to expand the scale and scope of conservation biology.
Rubenstein is the Class of 1877 Professor of Zoology, Emeritus at Princeton University. He is former Chair of Princeton University's Department of Ecology and Evolutionary Biology and has served as Director of Princeton’s Programs in African Studies and in Environmental Studies. He received his bachelor’s degree from the University of Michigan in 1972 and his Ph.D. from Duke University in 1977 before receiving NSF-NATO and King's College Junior Research Fellowships for post-doctoral studies at Cambridge University. As the Eastman Professor, he spent a year in Oxford as a Fellow of Balliol College. He is a member of the American Academy of Arts and Sciences as well as a Fellow of the Animal Behavior Society and the American Association for the Advancement of Science. He has received Princeton University's President's Award for Distinguished Teaching, and he has recently completed terms as president of the Animal Behavior Society, on the councils of AAAS and the Ecological Society, and on the board of Sigma Xi, The Scientific Research Honor Society. He has also recently received the Animal Behavior Society’s Exemplar Award and the Sigma Xi Honor Society’s McGovern Science & Society Award. He has just been elected as Sigma Xi’s next president and as a council member of Pennington Borough, NJ.
12:30 - 2:00 pm
Lunch
2:10 - 2:50 pm
Keynote Talk by Dimitris Metaxas
Title: Explainability, Generation, Physics and Dynamics in ML for Biomedical and Computer Vision Applications
Abstract: We have been developing a computational learning and AI framework that combines principles of physics-based deformable models, domain knowledge (explainability), and generative methods to augment the performance of purely data-driven ML. This framework has been used to solve complex dynamic problems in computer vision and biomedical applications. In this presentation, we will focus primarily on biomedical applications, and we will present results in cardiac analytics, histopathology and gene identification for cancer, and novel AI/ML methods that use domain knowledge to offer explainability and provide further insights into learning-based decision making and diagnosis. Finally, we will highlight computer vision applications related to human shape, motion, and hair estimation, generation, and retargeting. We will conclude with future research directions.
Short Bio: Dimitris Metaxas is a Board of Governors and Distinguished Professor in the Computer and Information Sciences Department at Rutgers University. He directs the Center for Computational Biomedicine, Imaging and Modeling (CBIM) and the NSF University-Industry Collaboration Center CARTA, with emphasis on real-time and scalable data analytics and AI and machine learning methods with applications to computational biomedicine and computer vision. Dr. Metaxas has been conducting research towards the development of novel methods and technology upon which AI, machine learning, computer vision, medical image analysis, and generative methods can advance synergistically. In medical and biological image analysis, new AI, machine learning, and model-based methods have been developed for material modeling and shape estimation of internal body parts from MRI, SPAMM, and CT data, cancer diagnosis, cell segmentation from histopathology images, cell tracking, cell type analysis, and linking genetic mutations to cells. Dr. Metaxas has published over 800 research articles in these areas and has graduated over 69 PhD students, who occupy academic and industry positions. His research has been funded by NIH, NSF, AFOSR, ARO, DARPA, HSARPA, and the ONR. Dr. Metaxas's work has received many best paper awards, and he holds 9 patents. He was awarded a Fulbright Fellowship in 1986 and is a recipient of NSF Research Initiation and CAREER awards and an ONR YIP award. He is a Fellow of the American Institute of Medical and Biological Engineers, a Fellow of IEEE, and a Fellow of the MICCAI Society. He will be a General Chair of CVPR 2026, and he has previously been General Chair of IEEE CVPR 2014, Program Chair of ICCV 2007, General Chair of ICCV 2011, FIMH 2011, and MICCAI 2008, and Senior Program Chair of SCA 2007.
2:50 - 3:30 pm
Keynote Talk by Utkarsh K. Mall
Title: Learning Interpretable Programs for Visual Discovery in Science
Abstract: The interpretability of computer vision and machine learning models is a longstanding challenge. The problem becomes even more critical when such models are applied in scientific domains. Scientists not only seek accurate predictions but also need to understand when models fail. We propose a framework that learns such models as interpretable-by-design programs. Our approach integrates deep neural foundation models that extract intermediate features with symbolic programs. These neuro-symbolic programs combine the power of neural representations with the interpretability of symbolic reasoning. In this talk, I will present two instantiations of this framework. The first focuses on learning interpretable fine-grained visual classifiers with a fixed program structure (LLM-Mutate, ECCV’24). The second is for learning interpretable hypotheses for spatial data such as discovering hypotheses for population density or species distribution mapping (DiSciPLE, CVPR’25). The results show that these learned models maintain accuracy, enhance interpretability, and improve generalization to out-of-distribution data.
Bio: Utkarsh Mall is a postdoctoral research scientist in Computer Science at Columbia University. His research focuses on building interpretable, reliable, and data-efficient methods to make novel scientific discoveries from visual data. He has applied this research in areas such as agriculture, anthropology, archaeology, urban planning, public health, and climate science. Before joining Columbia, he earned his PhD from Cornell University, where he worked on building label-efficient foundation models for scientific domains and leveraging them for unsupervised discoveries. He co-organizes the CVPR workshop Computer Vision for Science (CV4Science).
3:30 - 4:00 pm
Coffee Break
4:00 - 4:15 pm
Invited Talk by Tao Hu
Abstract: Accurately describing images with text is a foundation of explainable AI. Vision-Language Models (VLMs) like CLIP have recently addressed this by aligning images and texts in a shared embedding space, expressing semantic similarities between vision and language embeddings. VLM classification can be improved with descriptions generated by Large Language Models (LLMs). However, it is difficult to determine the contribution of actual description semantics, as the performance gain may also stem from a semantic-agnostic ensembling effect, where multiple modified text prompts act as a noisy test-time augmentation for the original one. We propose an alternative evaluation scenario to decide whether a performance boost of LLM-generated descriptions is caused by such a noise-augmentation effect or rather by genuine description semantics. The proposed scenario avoids noisy test-time augmentation and ensures that genuine, distinctive descriptions cause the performance boost. Furthermore, we propose a training-free method for selecting discriminative descriptions that work independently of classname-ensembling effects. Our approach identifies descriptions that effectively differentiate classes within a local CLIP label neighborhood, improving classification accuracy across seven datasets. Additionally, we provide insights into the explainability of description-based image classification with VLMs.
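To make description-based classification with a VLM concrete, here is a minimal sketch assuming the Hugging Face transformers CLIP implementation and the openai/clip-vit-base-patch32 checkpoint; the classes, descriptions, and image path are made-up examples, not the evaluation protocol from the talk.

```python
# Description-based zero-shot classification sketch with CLIP (illustrative only).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical classes, each with several LLM-style descriptions.
descriptions = {
    "zebra":   ["a photo of a zebra, which has black and white stripes",
                "a photo of a zebra, which has a short erect mane"],
    "giraffe": ["a photo of a giraffe, which has a very long neck",
                "a photo of a giraffe, which has patchy brown markings"],
}

image = Image.open("example.jpg")  # hypothetical input image
texts = [d for descs in descriptions.values() for d in descs]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    sims = model(**inputs).logits_per_image[0]  # similarity to every description

# Average description scores per class to obtain class scores.
scores, start = {}, 0
for name, descs in descriptions.items():
    scores[name] = sims[start:start + len(descs)].mean().item()
    start += len(descs)
print(max(scores, key=scores.get), scores)
```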
Bio: Tao Hu is a postdoctoral research fellow at Ommer Lab, working with Björn Ommer. His research focuses on scalable, flexible, and efficient generative models. He received his Ph.D. in Computer Science from the University of Amsterdam in 2023 under the supervision of Cees Snoek. His doctoral research was selected for the CVPR 2023 Doctoral Consortium. He co-organizes the ECCV 2024 Workshop on Audio-Visual Generative Learning and CVPR 2025 eLVM workshop and serves as an Area Chair for the CVPR AI4CC Workshop. He has also been recognized as an Outstanding Reviewer for NeurIPS 2024.
4:15 - 4:30 pm
Invited Talk by Xinyu Geng
Title: ParseCaps: An Interpretable Parsing Capsule Network for Medical Image Diagnosis
Abstract: Deep learning has excelled in medical image classification, but its clinical application is limited by poor interpretability. Capsule networks, known for encoding hierarchical relationships and spatial features, show potential in addressing this issue. Nevertheless, traditional capsule networks often underperform due to their shallow structures, and deeper variants lack hierarchical architectures, thereby compromising interpretability. This paper introduces a novel capsule network, ParseCaps, which utilizes the sparse axial attention routing and parse convolutional capsule layer to form a parse-tree-like structure, enhancing both depth and interpretability. Firstly, sparse axial attention routing optimizes connections between child and parent capsules, as well as emphasizes the weight distribution across instantiation parameters of parent capsules. Secondly, the parse convolutional capsule layer generates capsule predictions aligning with the parse tree. Finally, based on the loss design that is effective whether concept ground truth exists or not, ParseCaps advances interpretability by associating each dimension of the global capsule with a comprehensible concept, thereby facilitating clinician trust and understanding of the model's classification results. Experimental results on CE-MRI, PH^2, and Derm7pt datasets show that ParseCaps not only outperforms other capsule network variants in classification accuracy, redundancy reduction and robustness, but also provides interpretable explanations, regardless of the availability of concept labels.
Bio: Xinyu Geng is currently pursuing an M.S. in Computer Science at Harbin Institute of Technology (Shenzhen). She will join the Hong Kong University of Science and Technology (HKUST) as a Ph.D. candidate under the supervision of Prof. Hao Chen, focusing on computer vision, capsule networks, and model interpretability. She is currently interning at Alibaba Tongyi Lab, researching multimodal large-scale models and Retrieval-Augmented Generation (RAG). As a first-author researcher, she has published papers in top-tier conferences including CVPR and AAAI. She also actively contributes to the academic community by serving as a reviewer for multiple conferences and journals in AI/ML.
4:30 - 5:10 pm
Panel
5:10 - 5:20 pm
Ending Remarks
Dynamic Multi-Modal VAE for Stem Cell Differentiation Forecasting [PDF]
Author: Saad Mohamad
Author: Esha Dasgupta, Boeun Kim, Sang-Hoon Yeo, Hyung Jin Chang
Author: Melane Navaratnarajah, Sophie Martin, David Kelly, Nathan Blake, Hana Chockler
Author: Ziheng Zhang, Jianyang Gu, Arpita Chowdhury, Zheda Mai, David Carlyn, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao
The Phantom of the Elytra - Phylogenetic Trait Extraction from Images of Rove Beetles Using Deep Learning - Is the Mask Enough? [PDF] [Poster]
Author: Roberta Hunt, Kim Pedersen
Prior Knowledge Injection into Deep Learning Models Predicting Gene Expression from Whole Slide Images [PDF] [Poster]
Author: Max Hallemeesch, Marija Pizurica, Paloma Rabaey, Olivier Gevaert, Thomas Demeester, Kathleen Marchal
AI-Validated Social Media Data Offers New Perspectives on Biodiversity: A Case Study of Brown Bears (Ursus arctos) in Yellowstone National Park
Author: Nathan Fox
Author: David Breen, Joel Pepper, Jane Greenberg
Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation [PDF] [Poster]
Author: Zhenyang Feng, Zihe Wang, Saul Ibaven Bueno, Tomasz Frelek, Advikaa Ramesh, Jingyan Bai, Lemeng Wang, Jianyang Gu, Zanming Huang, Tai-yu Pan, Jinsu Yoo, Arpita Chowdhury, Michelle Ramirez, Elizabeth Campolongo, Matthew Thompson, Christopher Lawrence, Sydne Record, Daniel Rubenstein, Neil Rosser, Anuj Karpatne, Hilmar Lapp, Charles Stewart, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao
Author: Arpita Chowdhury, Dipanjyoti Paul, Zheda Mai, Jianyang Gu, Ziheng Zhang, Kazi Sajeed Mehrab, Elizabeth G. Campolongo, Daniel Rubenstein, Charles V. Stewart, Anuj Karpatne, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao
We encourage participation on a broad range of topics that explore AI/ML techniques to understand characteristic patterns of organisms from image or video data. Examples of research questions include (but are not limited to):
What are the types and characteristics of knowledge and data in biology that can be integrated into AI methodologies, and what are the mechanisms for this integration?
How best can new knowledge exposed by ML be translated back into the knowledge corpus of biology?
How best can we inform and catalyze a community of practice to utilize and build upon Imageomics to address grand scientific and societal challenges?
How can foundation models in vision and language impact biology or benefit from biological knowledge?
Paper Submission Deadline: December 6, 2024, 11:59 PM AOE
Acceptance/Rejection Decision: December 16, 2024
Early Registration Deadline: December 19, 2024, 11:59 PM AOE
Camera-Ready Deadline: February 15, 2025, 11:59 PM AOE
We offer an extended submission deadline with the same camera-ready deadline. After the camera-ready deadline, the accepted papers will appear on this website to facilitate communication between authors and readers.
Paper Submission Deadline: January 31, 2025, 11:59 PM AOE
Acceptance/Rejection Decision: February 7, 2025
We are accepting short paper submissions presenting position statements, reviews, or research results (2-4 pages, excluding references).
Shorter versions (6 pages, excluding references) of articles in submission or recently accepted at other venues (or presented after Oct. 1, 2024) are acceptable, provided they do not violate those venues' dual-submission policies.
An appendix (without a page limit) may be included after the references.
All submissions should be anonymous. All submissions will undergo peer review.
Submissions should be formatted according to the AAAI template (two-column; see Author Kit) and submitted via CMT.
Accepted papers will NOT be archived in the AAAI proceedings. This allows authors to extend their work afterward and submit it to a conference or journal.