Organizers: Utkarsh Mall (Columbia), Ye Zhu (Princeton), Jacob Berv (UMich), Siavash Golkar (NYU), Katie Bouman (Caltech), Subhransu Maji (UMass Amherst), David Fouhey (NYU)
Date: CVPR 2025, June 11th, All Day
Location: 205 C
Motivation
This workshop aims to bring together researchers working on computer vision and diverse scientific domains to discuss the latest advancements, challenges, and opportunities at their intersection. The goal is to foster interdisciplinary collaboration, build community, and highlight progress and researchers at the interface of computer vision and the sciences.
AI advancements have become a transformative force, extending beyond their original domain to drive breakthroughs in scientific discovery—an impact highlighted by the 2024 Nobel Prizes in Physics and Chemistry. Computer vision, as one of the core areas in AI research, offers powerful tools for analyzing data, with applications spanning a wide range of scientific fields, from accelerating discoveries in astrophysics and biology to enhancing environmental monitoring and materials science.
We aim to highlight work in this space and welcome any topic that spans both computer vision and the sciences:
Computer vision topics in this area often include (but are not limited to): reconstruction, recognition, segmentation and counting, human-in-the-loop efforts, low-shot learning, domain adaptation and sim2real, video analysis, and joint design of hardware and software.
Science topics include (but are not limited to): astrophysics via a variety of instrument types (radio, light, spectropolarimetry), chemistry, biology, neuroscience, and ecology.
Schedule (Tentative)
9:15 AM
Organizers: Welcome and Opening Remarks
9:30 AM
Abstract: This talk will cover concepts, tools, and methods that make up Vendi Scoring, a new research direction focused on the concept of diversity. I’ll begin by introducing Vendi Scores, a family of diversity metrics rooted in ecology and quantum mechanics, along with their extensions. Next, I’ll discuss algorithms for efficiently searching large materials databases and exploring complex energy landscapes, such as those found in molecular simulations, using Vendi Scores. Finally, I’ll introduce the new concept of 'algorithmic microscopy,' which stems from Vendi Scoring, and describe the Vendiscope, the first algorithmic microscope designed to help scientists zoom in on large data collections for data-driven discovery.
Bio: Dr. Dieng leads the lab Vertaix on research at the intersection of AI and the natural sciences at Princeton University. She is affiliated with the Chemical & Biological Engineering Department, Princeton Materials Institute, Princeton Quantum Initiative, Andlinger Center for Energy and the Environment, and High Meadows Environmental Institute (HMEI). She is a Research Scientist at Google AI and the founder/President of the nonprofit The Africa I Know. She has recently been named an Early-Career Distinguished Presenter at MRS Spring, one of 10 African Scholars to Watch in 2025 by The Africa Report, an Outstanding Recent Alumna by Columbia University's Graduate School of Arts and Sciences, an AI2050 Early Career Fellow by Schmidt Sciences, and the Annie T. Randall Innovator of 2022 by the American Statistical Association for her research and advocacy. She received her Ph.D. from Columbia University, earning recognition such as a Google Ph.D. Fellowship in Machine Learning, a Rising Star in Machine Learning nomination by the University of Maryland, and a Savage Award from the International Society for Bayesian Analysis. Dieng's research has been covered in media outlets such as New Scientist and TechXplore. She hails from Kaolack, Senegal.
10:00 AM
10:30 AM
11:00 AM
Abstract: While AI holds tremendous potential to accelerate discovery, a fundamental challenge remains in bridging the gap between general-purpose models and the bespoke scientific questions encountered in individual labs. "Off-the-shelf" solutions often require unique, manually-tuned workflows for new problems. This talk presents our work on building collaborative AI-scientist systems to tackle these challenges. First, we demonstrate how video foundation models, trained on internet-scale data, can be leveraged as powerful perceptual systems to solve domain-specific behavior analysis tasks with minimal fine-tuning. We then address the next bottleneck: how to best use these perceptual outputs. We show how AI agents, powered by large language models, can perform agentic superoptimization—automatically discovering scientific analysis workflows faster and more accurately than human-designed alternatives. Looking ahead, we envision AI agents collaborating with scientists throughout the scientific process to further our understanding of the natural world.
Bio: Jennifer is an assistant professor of computer science at Cornell University. Her group works on building AI systems that collaborate with scientists to accelerate discovery, and she works with experts across fields, including biology, neuroscience, and animal behavior in the lab and in the wild.
11:30 AM
Title: Multimodal Video Foundation Models for Biomedicine and Beyond
Abstract: In this talk, I will present our recent work on developing video foundation models and their applications in biomedicine and scientific discovery. I will begin by discussing how large multimodal models can be leveraged to enable temporal understanding, advanced reasoning capabilities, and agentic frameworks for video analysis. I will then discuss how video foundation models can be applied to surgical video analysis to automate skill assessment and personalized training. Finally, I will introduce a new benchmark designed to rigorously evaluate the scientific reasoning capabilities of multimodal video foundation models.
Bio: Xiaohan Wang is a postdoctoral researcher at Stanford AI Lab, specializing in video understanding, multimodal foundation models, and AI for healthcare. He earned his Ph.D. in Computer Science from the University of Technology, Sydney. Xiaohan has published over 20 top-tier papers in leading venues such as CVPR, ICCV, NeurIPS, and T-PAMI, and has secured four first-place awards in AI competitions. He also serves on the program committees of major AI conferences and has been recognized as an outstanding reviewer at CVPR.
12:00 - 1:30 PM
Lunch & Social
1:30 PM
Abstract: This talk presents an overview of the Merlin Sound ID project, including model architecture, training data, and performance. I’ll show how we use geographic priors to increase performance, and discuss future directions in long audio summarization and visualizing spatial detections along a user’s trajectory.
Bio: Grant is an Assistant Professor in the Manning College of Information and Computer Sciences at the University of Massachusetts Amherst and a visiting researcher at the Cornell Lab of Ornithology. His research focuses on applying machine learning to citizen science platforms such as iNaturalist and Merlin Bird ID, with an emphasis on helping people identify wildlife.
2:00 PM
Afternoon poster lightning talks (Poster ID #16-#33)
2:45 PM
Poster session & coffee break
4:00 PM
Abstract: Once we make a map of the world (e.g., tree cover, crop type), how can we use it to draw inferences? Maps are outputs of complex ML algorithms that take in a variety of remotely sensed variables and, as such, contain biases and non-classical errors. We show how a small amount of randomly sampled ground truth data can correct for bias in remote sensing map products. Applying our method across multiple remote sensing use cases in regression coefficient estimation, we find that it results in estimates that (1) are more reliable than using the map product as if it were 100% accurate and (2) have lower uncertainty than using only the ground truth and ignoring the map product.
Bio: Sherrie Wang is an Assistant Professor at MIT in the Department of Mechanical Engineering and Institute of Data, Systems, and Society. Her research uses novel data and computational algorithms to monitor our planet and enable sustainable development. Her focus is on improving agricultural management and mitigating climate change, especially in low- or middle-income regions of the world. To this end, she frequently uses satellite imagery, crowdsourced data, LiDAR, and other spatial data. Due to the scarcity of ground truth data in these regions and the noisiness of real-world data in general, her methodological work is geared toward developing machine learning methods that work well with these constraints.
4:30 PM
Title: Computer Vision for Biological Science: An Imageomics Perspective
Abstract: Imageomics is an emerging interdisciplinary field focused on understanding the biology of organisms, especially their traits, using visual data. Its central aim is to make traits computable from images, informed by existing scientific knowledge. A key challenge is the sparse, imperfect, and heterogeneous data associated with Earth's vast biodiversity. In this talk, I will share key milestones from my research in Imageomics, leveraging computer vision for biological science. I will highlight approaches that embed biological knowledge into vision models to introduce effective inductive biases and auxiliary supervision, thereby mitigating challenges posed by limited or imperfect data. Specifically, I will present our recent advances in fine-grained trait segmentation, interpretable and generative models for trait identification and discovery, and the development of a vision foundation model for the Tree of Life, BioCLIP. I will conclude with a discussion on the broader implications of Imageomics and future research opportunities in this exciting frontier.
Bio: Wei-Lun (Harry) Chao is an Associate Professor in Computer Science and Engineering at The Ohio State University (OSU). His research focuses on machine learning and computer vision, with applications spanning visual recognition, autonomous driving, biology, and healthcare. He aims to develop fundamental understandings and robust, widely applicable algorithms to tackle real-world challenges. He is particularly interested in learning from imperfect data, including limited, noisy, heterogeneous, distribution-shifting, and inaccessible data. His contributions have been recognized by several awards and honors, including the OSU College of Engineering (CoE) Lumley Research Award (2023) and Lumley Interdisciplinary Research Award (2025), OSU CoE Distinguished Assistant Professorship (2023), OSU Early Career Distinguished Scholar Award (2025), and CVPR Best Student Paper Award (2024). Before joining OSU in 2019, he was a Postdoctoral Associate at Cornell University (2018–2019), working with Kilian Weinberger and Mark Campbell. He earned his Ph.D. in Computer Science from the University of Southern California (2013–2018) under the supervision of Fei Sha.
5:00 PM
Closing Remarks
Confirmed Speakers
Poster List
Format
CV4Science will be a full-day workshop incorporating:
Talks from a set of senior researchers at the interface between computer vision and science, including both computer vision researchers and domain experts
Posters from junior researchers, selected on the basis of an extended abstract. Each selected presenter will give a short spotlight talk in addition to presenting their poster.
We will not have long-form workshop papers.
Poster Submission
Due date: April 25, 2025, Anywhere on Earth
Please submit information for your poster here: https://forms.gle/2CpLHMEnCBqGcVSG8
In addition to the author list, you will need: (a) a title, (b) a brief abstract, and (c) an example figure with a caption.