The 6th IEEE/CVF CVPR Precognition Workshop
Seattle, WA
June 18th, 2024
Precognition: Seeing through the Future
in conjunction with CVPR 2024
Seattle, June 17th - 21st, 2024
Topics of the Workshop
Vision-based detection and recognition methods have recently achieved highly accurate performance, bridging the gap between research and real-world applications. Beyond these well-explored capabilities of modern algorithms, vision-based forecasting is likely to be one of the next big research topics in computer vision. Vision-based prediction is one of the critical capabilities of humans, and success in automatic vision-based forecasting would empower and unlock human-like capabilities in machines and robots.
One important application is autonomous driving, where vision-based understanding of a traffic scene and prediction of the movement of traffic actors is a critical piece of the autonomy puzzle. Sensors such as cameras and lidar serve as the "eyes" of a vehicle, and advanced vision-based algorithms are required to enable safe and effective driving. Another area where vision-based prediction is used is the medical domain, allowing deep understanding and prediction of patients' future medical conditions. However, despite its potential and relevance for real-world applications, visual forecasting, or precognition, has not been the focus of new theoretical studies and practical applications to the same extent as detection and recognition problems.
Through the organization of this workshop we aim to foster further discussion and interest within the research community around this nascent topic. The workshop will cover recent approaches and research trends not only in anticipating human behavior from videos but also in precognition across multiple other visual applications, such as medical imaging, healthcare, human face aging prediction, early event prediction, and forecasting for autonomous driving.
In this workshop, the topics of interest include, but are not limited to:
Early event prediction
Activity and trajectory forecasting
Multi-agent forecasting
Human behavior and pose prediction
Human face aging prediction
Predicting frames and features in videos and other sensors in autonomous driving
Traffic congestion anomaly prediction
Automated COVID-19 prediction in medical imaging
Visual DeepFake prediction
Short- and long-term prediction and diagnoses in medical imaging
Prediction of agricultural parameters from satellite, drone, and ground imagery
Databases, evaluation, and benchmarking in precognition
This is the sixth Precognition workshop organized at CVPR. It follows the highly successful workshops held annually since 2019, all of which featured talks from researchers across a number of industries, insightful presentations, and strong attendance. For full programs, slides, posters, and other resources, please visit the 2019, 2020, 2021, 2022, and 2023 workshop websites.
Important Dates (anywhere on Earth)
Paper submission deadline: March 24, 2024
Notification to authors: April 8, 2024
Camera-ready deadline: April 14, 2024 (EOD Pacific Time)
Video presentation submission: June 2nd, 2024
Workshop: June 18th, 2024 (in the afternoon)
Invited Speakers
Hongyang Li
Research Scientist, OpenDriveLab and Shanghai AILab
Louis Foucard
Long-Range Perception Lead, Aurora Innovation
Xinghui Zhao
Director, School of Engineering and Computer Science, Washington State University, Vancouver
James Zou
Associate Professor of Biomedical Data Science, Stanford University
John Hyatt
Physicist, US Army DEVCOM Army Research Laboratory
Monroe Kennedy III
Director, Assistive Robotics and Manipulation Lab,
Stanford University
Program (all times in Pacific Time)
Location: Seattle Convention Center, room: Summit Elliott Bay [map]
Time: 1:30 PM–5:30 PM, June 18th
Program:
12PM to 1:30PM - Poster session (Arch Building Exhibit Hall, poster locations: #433-#437)
“H^3Net: Irregular Posture Detection by Understanding Human Character and Core Structures”, Seungha Noh (Kyonggi University), Kangmin Bae (ETRI), Byoung-Dai Lee (Kyonggi University), Yuseok Bae (ETRI) [open access] [video] [poster]
“CONDA: Continual Unsupervised Domain Adaptation Learning in Visual Perception for Self-Driving Cars”, Thanh-Dat Truong (University of Arkansas), Pierce Helton (University of Arkansas), Ahmed Moustafa (CVIU Lab), Jackson Cothren (University of Arkansas), Khoa Luu (University of Arkansas) [open access]
“VT-Former: An Exploratory Study on Vehicle Trajectory Prediction for Highway Surveillance through Graph Isomorphism and Transformer”, Armin Danesh Pazho (University of North Carolina at Charlotte), Ghazal Alinezhad Noghre (University of North Carolina at Charlotte), Vinit Katariya (University of North Carolina at Charlotte), Hamed Tabkhi (University of North Carolina at Charlotte) [open access] [video] [poster]
“VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting”, Yujin Tang (Hong Kong University of Science and Technology), Peijie Dong (Data Science and Analytics, Hong Kong University of Science and Technology), Zhenheng Tang (Hong Kong Baptist University), Xiaowen Chu (Hong Kong University of Science and Technology), Junwei Liang (Hong Kong University of Science and Technology) [open access] [video] [poster]
“Exploration of Data Augmentation Techniques for Bush Detection in Blueberry Orchards”, Boris Culjak (BioSense Institute), Nina Pajevic (BioSense Institute), Vladan Filipovic (BioSense Institute), Dimitrije Stefanovic (BioSense Institute), Zeljana Grbovic (BioSense Institute), Nemanja Djuric (BioSense Institute), Marko Panic (BioSense Institute) [open access] [video] [poster]
1:30PM - Main program kick-off
1:35PM - Invited talk: John Hyatt, "Introduction to ARO + Brief History and Philosophy of Modeling"
Abstract: In this non-technical presentation, I will introduce the Army Research Office and describe potential funding opportunities for high-impact basic research under our 45 programs. I will also briefly discuss the history and philosophy of modeling in classical science as it relates to modern problems in machine learning using the well-known contributions of Tycho Brahe, Johannes Kepler, and Isaac Newton to astronomy. There is a surprising amount of correspondence between the beginning of modern science and the current boom in machine learning, and looking at modern problems through this lens can help put them in perspective.
2:10PM - Invited talk: Monroe Kennedy III, "Egocentric Scene-aware Human Trajectory Prediction"
Abstract: Wearable robots that are capable of forecasting the motion of a human wearer can be useful in alerting the human if the predicted motion may be unsafe. Additionally, a wearable that can forecast the walking motion of a human in a cluttered environment can inform lower-limb exoskeleton control when transitioning between terrain types. In this talk, we will discuss a first step toward predicting human motion in a cluttered environment using a torso-mounted vision-based sensor system. Assuming a static environment, we can use a front-facing camera view to build a representative local map of an environment, leverage visual semantics to understand scene context, and then leverage this scene and context in a diffusion model to forecast human walking motion given data-driven examples. In a cluttered environment, there are often multiple paths a person can take, and a model that can understand and represent the likelihood of taking a particular path compared to other options can make uncertainty-informed decisions based on the quality of the forecast.
2:45PM - Invited talk: Louis Foucard, "Seeing Further: The Challenge of Long-Range Perception"
Abstract: Ensuring accurate 3D detections over extended distances is pivotal for the safe operation of autonomous trucks. Considering the potential load of up to 80,000 lbs and challenging road conditions like wet or icy surfaces, trucks might require a stopping distance of nearly 230m. Furthermore, to ensure naturalistic driving, autonomous trucks need to be able to detect and react to objects far beyond these distances. In this talk, we go over the challenges of long-range perception: sensor sparsity, sensor calibration, computational cost, and label quality. We introduce SpotNet: a fast, single-stage, image-centric but LiDAR-anchored approach for long-range 3D object detection. We demonstrate that our approach to LiDAR/image sensor fusion, combined with the joint learning of 2D and 3D detection tasks, can lead to accurate 3D object detection with very sparse LiDAR support. We argue that such an architecture is ideally suited to leverage each sensor’s strength, i.e. semantic understanding from images and accurate range finding from LiDAR data, while keeping a computational cost independent of range. Finally, we show how learned sensor calibration can improve long-range detection performance at very little computational cost.
3:20PM - Coffee break
3:40PM - Invited talk: James Zou, "New approaches to multi-modal AI agents for science"
Abstract: I will present Dragonfly, our new architecture for a large visual-language model that leverages multi-resolution zoom to achieve state-of-the-art performance across several medical tasks. I will also discuss new works on how to design and optimize AI agents for science.
4:15PM - Invited talk: Xinghui Zhao, "Learning with Limitations - Supporting Big Data Analytics on Resource-Constrained Devices"
Abstract: Today, big data has become a key challenge in virtually every area of human endeavor. Cyber-physical systems (CPS) are closely tied to big data by nature. These systems couple their cyber and physical parts to provide mission-critical services, such as automated pervasive healthcare, smart civil infrastructures, and autonomous driving systems, among others. CPS applications interact with human beings or physical environments and continuously generate large amounts of data, which require data analytics and machine learning techniques to process. Due to the nature of these applications, in addition to prediction accuracy, there are often requirements on system scalability, security, and efficiency, which present challenges in resource-constrained environments such as mobile and edge devices. In this talk, I will introduce our research on using machine learning to address several key issues in various cyber-physical systems, such as power systems, transportation systems, and automated health monitoring systems. I will also present our recent work on developing QoS-aware deep learning frameworks for supporting big data analytics with bounded resources.
4:50PM - Invited talk: Hongyang Li, "Predicting the Future by World Models: The series work of ViDAR for Autonomous Driving"
Abstract: In this talk, we will take a deep dive into recent advances in visual autonomous driving: given camera input only, predicting the perception, prediction, and/or planning results, either separately or as a whole. One of the key challenges in building a robust and generalizable driving system is the ability to predict future environment frames given the current and past history. Previous attempts have introduced a spatial-temporal philosophy with many novel modifications. In this talk, we argue that such systems can be improved further with the aid of world models, a recent trending topic across a variety of applications (robotics, autonomous driving, general computer vision). Our recent work has shown that by utilizing world models to interact with the environment given predicted or imagined actions, one can obtain non-trivial representation learning for the network’s backbone, equipping the vision system with geometry-aware knowledge. This design has yielded consistent gains across multiple tasks experimentally. We will use recent work, e.g., ViDAR from OpenDriveLab, as an example and point out some future directions, hopefully shedding light on the ViDAR line of work for the community.
5:25PM - Workshop wrap-up
5:30PM - End of workshop
Submission Instructions
All submitted work will be assessed on novelty, technical quality, potential impact, insightfulness, depth, clarity, and reproducibility. For each accepted submission, at least one author must attend the workshop and present the work. Information about formatting and style files is available here. There are two ways to contribute submissions to the workshop:
Extended abstract submissions are single-blind peer-reviewed, and author names and affiliations should be listed. Extended abstract submissions are limited to a total of four pages (including references). Extended abstracts of already published works can also be submitted. Accepted abstracts will not be included in the printed proceedings of the workshop.
Full paper submissions are double-blind peer-reviewed. Submissions are limited to eight pages, including figures and tables, in the CVPR style. Additional pages containing only cited references are allowed. Accepted papers will be presented in an oral session, and all accepted full papers will be published in the CVPR workshop proceedings.
Submission website: https://cmt3.research.microsoft.com/Precognition2024
Organizers
For questions please contact the organizers at precognition.organizers@gmail.com.
Program Committee
Abhishek Mohta (Aurora Innovation)
Apoorv Singh (Motional)
Echo Hanzhang Hu (Aurora Innovation)
Fang-Chieh Chou (DoorDash Labs)
Meng Fan (Aurora Innovation)
Mohana Moorthy
Joshua Manela (Waymo)
Kha Gia Quach (PDActive Inc.)
Rowan McAllister (Waymo)
Sebastian Lopez-Cot (Aurora Innovation)
Shivam Gautam (Latitude AI)
Tanmay Agarwal (Cruise)
Vladan Radosavljevic (Spotify)
Yan Xu (CMU)
Zhaoen Su (Meta)
The paper presentations, posters, and talks were held both in person and virtually.
The paper presentations, posters, and talks were held virtually. Google generously sponsored the award for the authors of the best paper.
The paper presentations, posters, and talks were held virtually. Uber ATG generously sponsored the awards for the authors of the best paper and the best student paper.
About 300 attendees joined the paper presentations, posters, and talks. Uber ATG generously sponsored the awards for the authors of the best paper and the best student paper.