Vision-based detection and recognition studies have recently achieved highly accurate performance, bridging the gap between research and real-world applications. Beyond these well-explored detection and recognition capabilities of modern algorithms, vision-based forecasting will likely be one of the next big research topics in the field of computer vision. Vision-based prediction is one of the critical capabilities of humans, and success in automatic vision-based forecasting will unlock human-like capabilities in machines and robots.
One important application is in autonomous driving technologies, where vision-based understanding of a traffic scene and prediction of the movement of traffic actors are critical pieces of the autonomous puzzle. Various sensors, such as cameras and lidars, serve as the "eyes" of a vehicle, and advanced vision-based algorithms are required to enable safe and effective driving. Another area where vision-based prediction is used is the medical domain, enabling deeper understanding and prediction of patients' future medical conditions. However, despite its potential and relevance for real-world applications, visual forecasting, or precognition, has not received as much attention in theoretical studies and practical applications as detection and recognition problems.
Through the organization of this workshop, we aim to facilitate further discussion of and interest in this nascent topic within the research community. The workshop will cover recent approaches and research trends not only in anticipating human behavior from videos but also in precognition across multiple other visual applications, such as medical imaging and healthcare, human face aging prediction, early event prediction, and autonomous driving forecasting.
In this workshop, the topics of interest include but are not limited to:
Early event prediction
Activity and trajectory forecasting
Multi-agent forecasting
Human behavior and pose prediction
Human face aging prediction
Predicting future frames and features from video and other sensor data in autonomous driving
Traffic congestion anomaly prediction
Automated COVID-19 prediction in medical imaging
Visual DeepFake prediction
Short- and long-term prediction and diagnoses in medical imaging
Prediction of agricultural parameters from satellite, drone, and ground imagery
Databases, evaluation, and benchmarking in precognition
This is the seventh Precognition workshop organized at CVPR. It follows the highly successful workshops held every year since 2019, all of which featured talks from researchers across a range of industries, insightful presentations, and large audiences. For full programs, slides, posters, and other resources, please visit the 2019, 2020, 2021, 2022, 2023, and 2024 workshop websites.
Paper submission deadline: March 22nd, 2025
Notification to authors: March 30th, 2025
Camera-ready deadline: April 14th, 2025
Video presentation submission: June 1st, 2025
Workshop: June 12th, 2025 (in the afternoon time slot)
Main program location: Room 107A
Time: 1:30 PM – 5:30 PM, June 12th
Poster session location: Exhibit Hall D, poster locations: #216 - #225
Time: 12 PM – 3 PM, June 12th
Detailed program:
12PM to 1:30PM - Poster session (all accepted papers and extended abstracts)
1:30PM - Main program kick-off, Session 1 starts
1:35PM - Invited talk: Tal Hassner, "A Perfect Deepfake Detector: Why It Exists, Why It's Useless, What We Need Instead"
Abstract: The growing prevalence of AI-generated content introduces new risks to privacy and misinformation, while traditional detection methods are rapidly becoming insufficient or obsolete. In this talk, I will explain why conventional deepfake detection is not the answer many believe it to be and present two approaches that look beyond binary detection. The first is model attribution, a "model parsing" technique that can reverse-engineer even future generators from their outputs. The second is media provenance, proactive methods that embed protective markers into images, enabling detection of unauthorized future manipulations. These examples show why forensic approaches to deepfakes represent an important new research frontier for the coming synthetic media age.
2:10PM - Invited talk: Cornelia Caragea, "Improving Semi-Supervised Learning with Pseudo-Margins"
Abstract: In this talk, I will discuss a new semi-supervised learning approach called MarginMatch that combines consistency regularization and pseudo-labeling, with its main novelty arising from the use of unlabeled-data training dynamics to measure pseudo-label quality. Instead of relying only on the model's confidence on an unlabeled example at an arbitrary iteration to decide whether the example should be included in training, our approach also analyzes the behavior of the model on the pseudo-labeled examples as training progresses, ensuring that low-quality predictions are masked out. I will show that our approach brings substantial improvements on diverse vision benchmarks, emphasizing the importance of enforcing high-quality pseudo-labels.
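To make the pseudo-margin idea concrete, the sketch below is a minimal, hypothetical illustration of the mechanism described in the abstract, not the MarginMatch implementation: the tracker class, EMA decay, and fixed threshold are all assumptions introduced here for illustration.

```python
import torch

# Minimal sketch of a pseudo-margin tracker (illustrative only; not the
# authors' implementation). For each unlabeled example we keep an
# exponential moving average of its pseudo-margin: the logit assigned to
# the pseudo-label minus the largest competing logit. Examples whose
# averaged margin stays low are masked out of training.

class PseudoMarginTracker:
    def __init__(self, num_unlabeled, ema=0.9, margin_threshold=0.0):
        self.apm = torch.zeros(num_unlabeled)    # running average pseudo-margin
        self.ema = ema
        self.margin_threshold = margin_threshold  # hypothetical fixed threshold

    def update(self, indices, logits, pseudo_labels):
        # Margin = logit of the pseudo-label minus the max competing logit.
        assigned = logits.gather(1, pseudo_labels.view(-1, 1)).squeeze(1)
        competing = logits.clone()
        competing.scatter_(1, pseudo_labels.view(-1, 1), float("-inf"))
        margin = assigned - competing.max(dim=1).values
        self.apm[indices] = self.ema * self.apm[indices] + (1 - self.ema) * margin

    def mask(self, indices):
        # Keep only examples whose averaged margin exceeds the threshold.
        return self.apm[indices] > self.margin_threshold
```

In a FixMatch-style training loop, one would call update() every iteration with the indices of the current unlabeled batch and apply mask() on top of the usual confidence threshold, so that an example must look reliable across training, not just at one step, to contribute to the loss.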
2:45PM - Lightning talks
"SRVP: Strong Recollection Video Prediction Model Using Attention-Based Spatiotemporal Correlation Fusion", Yuseon Kim (KISTI), Kyongseok Park (KISTI) [open access] [video] [poster]
"PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario", Sriram Mandalika (SRM Institute of Science and Technology, Chennai), Lalitha V (SRM Institute of Science and Technology, Chennai), Athira Nambiar (SRM Institute of Science and Technology, Chennai) [open access] [video] [poster]
"BIMA: Bijective Maximum Likelihood Learning Approach to Hallucination Prediction and Mitigation in Large Vision-Language Models", Huu-Thien Tran (University of Arkansas), Thanh-Dat Truong (University of Arkansas), Khoa Luu (University of Arkansas) [open access] [video] [poster]
3:05PM - Invited talk: Jake Charland, "Robust Perception: Handling the Many Complexities of the World"
Abstract: Perceiving the world around you is a critical task in self-driving vehicles, and one that faces many challenges, including scene complexity, adverse weather conditions, and sensor misalignment or loss. Designing a system that is robust to these and many other situations is critical to operating a safe self-driving vehicle. In this talk, I will focus on some of these challenges and present ways to create a self-driving vehicle capable of handling the many complexities of the world. I'll explain the advantages of techniques such as multi-sensor fusion and multi-view fusion, and conclude with techniques for achieving robustness to sensor misalignment or loss.
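As a toy illustration of one such robustness pattern, the sketch below shows a generic masked feature-level fusion that degrades gracefully when a sensor drops out. It is an assumption under a common design pattern, not the speaker's system; the module and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

# Generic sketch of sensor-dropout-robust fusion (illustrative only).
# Per-sensor features are gated by an availability mask, so the fused
# representation degrades gracefully when a sensor is lost or stale.

class MaskedSensorFusion(nn.Module):
    def __init__(self, feat_dim=128, num_sensors=2):
        super().__init__()
        self.fuse = nn.Linear(feat_dim * num_sensors, feat_dim)

    def forward(self, sensor_feats, available):
        # sensor_feats: list of (B, feat_dim) tensors, one per sensor.
        # available:    (B, num_sensors) float mask, 1 = sensor healthy.
        gated = [f * available[:, i:i + 1] for i, f in enumerate(sensor_feats)]
        return self.fuse(torch.cat(gated, dim=-1))
```

During training, randomly zeroing entries of the availability mask (sensor dropout) discourages the network from over-relying on any single modality, which is one simple way to build in robustness to sensor loss.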
3:40PM - Coffee break before Session 2 starts
4:00PM - Invited talk: Antonino Furnari, "Precognition in Egocentric Vision: From Short-Term Interactions to Long-Term Procedural Understanding"
Abstract: Egocentric vision enables AI systems to perceive the physical world as humans do, unlocking the potential to proactively assist the user, enhance their safety, improve interactions with unfamiliar objects and environments, and ultimately support them in daily activities. However, for this vision to become a reality, a crucial component is the ability to anticipate human behaviour. In this talk, I’ll explore how egocentric vision serves as an ideal platform for studying, developing, and ultimately deploying such abilities to benefit tomorrow’s AI assistants. I’ll start by presenting research results from short-term human behaviour anticipation tasks, such as next-active-object prediction and action anticipation. Building on this foundation, I’ll transition to methods targeting long-term future predictions in procedural video understanding and mistake prediction.
4:35PM - Lightning talks
"IGL-DT: Iterative Global-Local Feature Learning with Dual-Teacher Semantic Segmentation Framework under Limited Annotation Scheme", Quan Tran (National Chung Cheng University), Hoang-Thien Nguyen (Posts and Telecommunications Institute of Technology, Ho Chi Minh), Thanh-Huy Nguyen (Université de Bourgogne Europe), Gia-Van To (Institut de Science Financière et d'Assurances), Tien-Huy Nguyen (University of Information Technology), Quan Nguyen (Posts and Telecommunications Institute of Technology, Ha Noi) [open access] [video] [poster]
"HDC: Hierarchical Distillation for Multi-level Noisy Consistency in Semi-Supervised Fetal Ultrasound Segmentation", Tran Quoc Khanh Le (University of Information Technology, Ho Chi Minh City), Nguyen Lan Vi Vu (Ho Chi Minh University of Technology), Ha-Hieu Pham (University of Science, VNU-HCM), Xuan-Loc Huynh (Boston University), Tien-Huy Nguyen (University of Information Technology, Ho Chi Minh City), Minh Huu Nhat Le (International Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei 110), Quan Nguyen (Posts and Telecommunications Institute of Technology, Hanoi), Hien D. Nguyen (New Mexico State University) [open access] [video] [poster]
"Robust sensor fusion against on-vehicle sensor staleness", Meng Fan (Zoox), Yifan Zuo (Zoox), Patrick Blaes (Zoox), Harley Montgomery (Zoox), Subhasis Das (Zoox) [extended abstract] [video] [poster]
4:50PM - Invited talk: Joshua Manela, "The Art of Debugging End-to-End Behavior Models for Heavy Industrial Machines"
Abstract: The use of learned behavior policies has grown rapidly across both academia and industry, fueled by advances in vision-language models, diffusion policies, and reinforcement learning. As these techniques mature, they are increasingly being applied to real-world systems, including heavy industrial machines. However, debugging and deploying such policies in safety-critical, high-latency environments remains a significant challenge.
In this talk, Joshua Manela will present how Bedrock Robotics tackles these challenges in the context of heavy machinery. From fundamental issues, like understanding how model inputs influence outputs, to practical hurdles, such as testing in the real world where failures are costly and simulation fidelity is limited, this talk will highlight Bedrock's approach to bridging theory and deployment. Topics will include the development of novel open-loop metrics, the role of sim-to-real workflows, and how techniques from adjacent fields like autonomous driving, robotic manipulation, and humanoid control can be adapted for industrial use cases.
5:25PM - Workshop wrap-up
5:30PM - End of workshop
All submitted work will be assessed on its novelty, technical quality, potential impact, insightfulness, depth, clarity, and reproducibility. For each accepted submission, at least one author must attend the workshop and present the work. Information about formatting and style files is available here. There are two ways to contribute submissions to the workshop:
Extended abstract submissions are single-blind peer-reviewed, and author names and affiliations should be listed. Extended abstract submissions are limited to a total of four pages (including references). Extended abstracts of already published works can also be submitted. Accepted abstracts will not be included in the printed proceedings of the workshop.
Full paper submissions are double-blind peer-reviewed. Submissions are limited to eight pages, including figures and tables, in the CVPR style. Additional pages containing only cited references are allowed. Accepted papers will be presented in an oral session. All accepted full papers will be published in the CVPR workshop proceedings.
Submission website: https://cmt3.research.microsoft.com/Precognition2025
For questions, please contact the organizers at precognition.organizers@gmail.com.
Fang-Chieh Chou (DoorDash Research)
Hoang-Quan Nguyen (Univ. of Arkansas)
Mohana Prasad Sathya Moorthy (Apple)
Naga VS Raviteja Chappa (Univ. of Arkansas)
Nicholas Rhinehart (Univ. of Toronto)
Pha Nguyen (Univ. of Arkansas)
Sebastian Lopez-Cot (Aurora Innovation)
Shreyash Pandey (Apple)
Slobodan Vucetic (Temple University)
Thanh-Dat Truong (Univ. of Arkansas)
Vladan Radosavljevic (Spotify)
Yan Xu (Waymo)
2024: An in-person and virtual workshop for the paper presentations, the posters, and the talks.
2023: An in-person and virtual workshop for the paper presentations, the posters, and the talks.
2022: A virtual workshop for the paper presentations, the posters, and the talks. Google generously sponsored an award for the authors of the best paper.
2021: A virtual workshop for the paper presentations, the posters, and the talks. Google generously sponsored an award for the authors of the best paper.
2020: A virtual workshop for the paper presentations, the posters, and the talks. Uber ATG generously sponsored awards for the authors of the best paper and the best student paper.
2019: About 300 attendees joined the paper presentations, the posters, and the talks. Uber ATG generously sponsored awards for the authors of the best paper and the best student paper.