Scene reconstruction from unposed 2D images is a long-standing, multi-disciplinary research area that spans computer vision, graphics, and photogrammetry. Advances in this area carry significant benefits for society, e.g., in autonomous navigation, augmented reality, smart cities, disaster relief planning, etc. At a high level, the problem can be divided into two parts: camera calibration and dense 3D reconstruction. In recent years, both areas have seen significant interest and progress, particularly in dense reconstruction.
The goal of this full-day WACV 2025 workshop is to examine the progress of 3D scene reconstruction/rendering, its underlying assumptions, and the challenges that lie ahead in unconstrained and large-scale scenarios. To this end, ULTRRA will release a series of datasets with carefully measured ground truth. These datasets are crafted to emphasize the challenges in camera calibration and 3D reconstruction/rendering. A challenge will be held to quantitatively and qualitatively evaluate current and novel state-of-the-art methods.
A real-time visualization of some of the scenes can be accessed through the following QR code (password: wriva), courtesy of Accenture Federal Services and Deva Ramanan's group at CMU!
The ULTRRA Challenge dataset is out at https://ieee-dataport.org/competitions/ultrra-challenge-2025!
Multi Camera Pose Estimation and Calibration
Robust feature matching and Structure-from-Motion (SfM) methods
Sparse camera calibration
Relative pose estimation
Camera calibration with occlusion and/or little overlap
Photo-realistic 3D Reconstruction and Novel View Synthesis
Volumetric and surface 3D reconstruction and view synthesis
Sparse view reconstruction and view synthesis
Scene reconstruction and view synthesis from varying appearances, degraded images, transient occlusions, etc.
Large-scale reconstruction and view synthesis, e.g., from a large spatial area, multiple elevations, etc.
Digital surface reconstruction from satellite imagery
Photo-realistic and accurate novel view synthesis and walk-throughs
We accept either 4-page extended abstracts or 8-page full paper submissions, excluding references. The workshop papers are non-archival, and we welcome submissions that have already been submitted to or accepted at other venues or the WACV main conference. All submissions should follow the WACV 2025 author guidelines.
Submission Portal: CMT
Paper Submission Deadline: 12/16/2024
Notification to Authors: 12/23/2024
Camera-ready submission: 01/05/2025
Accepted papers will be invited for poster/oral presentation and will be displayed on the workshop website.
The ULTRRA challenge evaluates current and novel state-of-the-art view synthesis methods for unposed cameras. Challenge datasets will emphasize real-world considerations, such as image sparsity, variety of camera models, and unconstrained acquisition in real-world environments.
Tentative Schedule
Development dataset release: 11/8/2024 (Available Now @ https://ieee-dataport.org/competitions/ultrra-challenge-2025!)
Challenge dataset release: 1/10/2025 (Available Now @ https://ieee-dataport.org/competitions/ultrra-challenge-2025!)
Submission period: 11/1/2024 to 2/14/2025 (tentative)
Winners presentation: 2/28/2025
Dataset
Images collected for the IARPA WRIVA program will be publicly released and made available for use in this public challenge, and more broadly to encourage research in view synthesis methods for real-world environments and heterogeneous cameras. Datasets include images collected from mobile phones and other ground-level cameras, security cameras, and airborne cameras. Each camera is calibrated using structure from motion constrained by RTK-corrected GPS coordinates, with accuracies measured in centimeters for either camera locations or ground control points, depending on the camera. Cameras are geolocated to enable reliable evaluation. Images used for final evaluations will be sequestered.
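For readers unfamiliar with geo-registered calibration, the snippet below sketches a typical preprocessing step: converting RTK-corrected geodetic fixes into a local metric East-North-Up (ENU) frame that an SfM pipeline can use as position priors. This is an illustrative sketch, not the WRIVA processing code; the `pymap3d` dependency and the sample coordinates are our own assumptions.

```python
# Hypothetical sketch: expressing RTK-corrected camera positions in a local
# metric frame, as is commonly done before using them to constrain SfM.
# The coordinates below are made-up examples, not ULTRRA data.
import numpy as np
import pymap3d as pm

# RTK fixes for three cameras: (latitude deg, longitude deg, ellipsoid height m).
rtk_fixes = np.array([
    [39.3299, -76.6205, 45.2],
    [39.3301, -76.6207, 45.5],
    [39.3302, -76.6203, 44.9],
])

# Use the first fix as the local origin and convert to East-North-Up,
# giving camera centers in meters that an SfM pipeline can take as priors.
lat0, lon0, h0 = rtk_fixes[0]
enu = np.array([pm.geodetic2enu(lat, lon, h, lat0, lon0, h0)
                for lat, lon, h in rtk_fixes])
print(enu)  # each row is an (east, north, up) offset in meters
```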
The data has been collected and is in public release review. The development datasets have already been constructed. Challenge datasets are in progress and will mirror the development datasets, but with sequestered images used for evaluation. We anticipate posting the development datasets on IEEE DataPort on November 8th, 2024 and the challenge datasets on January 10th, 2025.
Submission Evaluation
The competition will be hosted on CodaBench (https://www.codabench.org/) or an equivalent platform. Challenge tracks will explore multi-camera pose estimation and photo-realistic novel view synthesis. Challenge datasets will include input images from a variety of ground-level, security, and airborne cameras. Camera poses will be evaluated by comparing relative camera locations, with contestant coordinate frames aligned to reference coordinates using Procrustes analysis. Sequestered images and contestant rendered images will be compared using the DreamSim image similarity metric (https://dreamsim-nights.github.io/) to establish view synthesis scores.
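As a rough illustration of the pose metric (not the official scoring code), the sketch below aligns estimated camera centers to reference centers with a similarity Procrustes fit (Umeyama's method) and reports per-camera position errors; the function name and toy data are our own assumptions.

```python
# Illustrative sketch of Procrustes-style pose evaluation: fit a similarity
# transform (scale, rotation, translation) from contestant camera centers to
# reference centers, then measure residual per-camera position error.
import numpy as np

def umeyama_align(src, dst):
    """Return scale s, rotation R, translation t minimizing ||s*R@src_i + t - dst_i||."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)          # cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[-1, -1] = -1                        # guard against reflections
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / src_c.var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# est: contestant camera centers; ref: surveyed reference centers (N x 3).
est = np.random.rand(10, 3)
ref = 2.0 * est + np.array([5.0, 0.0, -1.0])  # toy ground truth: scaled + shifted
s, R, t = umeyama_align(est, ref)
errors = np.linalg.norm((s * est @ R.T + t) - ref, axis=1)
print(errors.mean())  # near zero for this toy example
```

For the view synthesis side, DreamSim is distributed as a pip package (`dreamsim`); per its project page, it produces a perceptual distance between a rendered image and the sequestered reference image, with lower distances indicating closer matches.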
Organizers: Cheng Peng / Myron Brown
Ashwini Deshpande is a Program Manager at IARPA. She focuses on areas of scientific research that include computer vision, machine learning, and image processing. Ashwini is presently leading IARPA's efforts on the WRIVA program, which aims to develop algorithmic systems that create photorealistic, navigable site models from a highly limited corpus of imagery.
Prior to joining IARPA, Mrs. Deshpande worked at the National Geospatial-Intelligence Agency from 2014 to 2019, where she served as a Systems Integrator and Technical Advisor in the Office of Special Programs and Research Directorate and assisted with launching and managing new programs in Analytic Automation, Image and Video Processing, and Radar.
Zexiang Xu is the Vice President of Research and Development at Hillbot, where he mainly works on multimodal foundation models for robotics and embodied AI. Prior to joining Hillbot, he was a research scientist at Adobe Research, working on neural 3D, 3D large models, and GenAI foundation models. Before that, he obtained his Ph.D. at the University of California, San Diego, advised by Prof. Ravi Ramamoorthi.
His research lies at the intersection of computer vision, computer graphics, machine learning, and AI foundation models. His previous work has broadly covered 3D reconstruction, 3D generation, neural representations, view synthesis, relighting, appearance modeling, and appearance acquisition.
Noah Snavely is a Professor of Computer Science at Cornell Tech and a member of the Cornell Graphics and Vision Group; he also works at Google DeepMind in NYC. His research interests are in computer vision and graphics, in particular 3D understanding and depiction of scenes from images. Noah is the recipient of a PECASE, a Microsoft New Faculty Fellowship, an Alfred P. Sloan Fellowship, and a SIGGRAPH Significant New Researcher Award, and is a Fellow of the ACM.
Yohann Cabon has been a research scientist at Naver Labs Europe since 2017. Starting as a research engineer, he has gained considerable experience across a wide range of 3D-related topics, including synthetic dataset generation (Procedural Human Action Videos, Virtual KITTI) and visual localization (image retrieval, NAVI, the kapture toolbox). He now leads research on image-to-3D reconstruction within the CroCo/DUSt3R/MASt3R framework.
Deva Ramanan is a Professor in the Robotics Institute at Carnegie Mellon University and the former director of the CMU Center for Autonomous Vehicle Research. His research interests span computer vision and machine learning, with a focus on visual recognition. He was awarded the David Marr Prize in 2009, the PASCAL VOC Lifetime Achievement Prize in 2010, and the IEEE PAMI Young Researcher Award in 2012; was named one of Popular Science's Brilliant 10 researchers in 2012 and a National Academy of Sciences Kavli Fellow in 2013; won the Longuet-Higgins Prize in 2018 for fundamental contributions in computer vision; and received best paper finalist / honorable mention awards at CVPR 2019, ECCV 2020, and ICCV 2021.
Invited Talks:
1st place, camera calibration track: Khiem Vuong, Carnegie Mellon University
1st place, view synthesis track: Marc Bosch, Accenture Federal Services
2nd place, both tracks: Niluthpol Mithun and Supun Samarasekera, SRI International
3rd place, view synthesis track: Junyoung Hong, CIPLAB, Yonsei University
Talks may be downloaded here:
https://drive.google.com/drive/folders/1B41QBmxCXsY3cEE30xwpJsDqhdWZXMHy?usp=drive_link
Final leaderboard scores:
See the workshop paper for a description of the challenge tracks, test datasets, and metrics.
For inquiries, questions, or information about the workshop, please email cpeng26@jh.edu.
Johns Hopkins University
Johns Hopkins University, Applied Physics Lab
Rama Chellappa (Johns Hopkins University)
Vishal Patel (Johns Hopkins University)
Rongjun Qin (The Ohio State University)
Yajie Zhao (USC)
Rakesh Kumar (SRI)
Marc Bosch (Accenture)
Dan Crispell (VSI)
This work was supported by the Intelligence Advanced Research Projects Activity (IARPA) contract no. 2020-20081800401. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA or the U.S. Government.
This work was supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) contract number 140D0423C0076. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government.