In Augmented Reality (AR), Virtual Reality (VR), and spatial computing, computer vision connects digital and physical realities. Understanding, generating, and interacting with complex 3D environments pushes immersive technologies forward. Furthermore, integrating the temporal dimension to handle dynamic, evolving scenes (4D) is rapidly emerging as the crucial next frontier.
Past advances in multimodal foundation models improved image and video processing, creating a solid baseline. The current challenge is translating these successes into robust Multimedia Spatial Intelligence. This involves interpreting and generating rich spatial data (3D) while accounting for its evolution over time (4D). Integrating diverse inputs (text, audio, and video) allows us to seamlessly create, modify, and interact with these spatiotemporal environments.
The fifth edition of the MUSTCV workshop (formerly CV4Metaverse) explores the mechanics of spatial and dynamic computing, emphasizing 3D spatial intelligence, cross-modal multimedia generation, and temporal dynamicity.
Topics of interest include, but are not limited to, the following:
Spatial and Dynamic Scene Understanding:
Methods for continuous interaction in static 3D and dynamic 4D environments, including spatiotemporal modeling (e.g., 3D/4D reconstruction, depth estimation, tracking).
Cross-Modal 3D/4D Generation and Synthesis:
Using text, audio, and video to generate, edit, or manipulate spatial scenes (e.g., text-to-3D/4D, audio-driven motion). Bridging 2D foundation models with time-aware generation.
Immersive Applications and Datasets:
Novel ML applications for AR/VR, digital twins, and interactive multimedia across 3D/4D domains. New datasets and benchmarks for spatial intelligence and evolving scenes.
Giuseppe Serra
University of Udine, Italy
Gianluca Macrì
University of Naples Federico II, University of Udine, Italy
Alex Falcon
University of Udine, Italy
Beatrice Portelli
University of Udine, Italy
Vanessa Sklyarova
Max Planck ETH Center, Switzerland
Barbara Rössle
Technical University of Munich, Germany
Daniel Sungho Jung
Seoul National University, South Korea
Dan Wang
University of California San Diego, USA
Despoina Paschalidou
NVIDIA Toronto AI Lab, Canada
Korea University, South Korea