Spatial Computing has the potential to redefine the computing paradigm by enabling technology to seamlessly augment human perception and interaction within the natural 3D world. Breakthroughs in hardware—such as Magic Leap, Xreal, and Apple Vision Pro—combined with advances in computer vision, graphics, AI, and human-computer interaction, are creating immersive experiences that blur the boundaries between the physical and digital realms. These systems enable continuous feedback loops, unlocking new forms of engagement and utility.
Artificial Intelligence plays a critical role in elevating AR/VR experiences by powering believable non-player characters (NPCs), dynamic crowd behaviors, adaptive and responsive environments, and richly interactive ecosystems. AI also enables personalized virtual experiences through adaptive storytelling and customized training environments.
This workshop aims to provide a platform for researchers and practitioners to explore the unique challenges and opportunities at the intersection of AI and spatial computing. We invite novel research contributions, system demonstrations, and application-focused works. By bringing together diverse perspectives, the workshop seeks to accelerate progress toward spatially-aware foundation models and intelligent spatial systems capable of continuous perception, comprehension, and reasoning in complex 3D environments.
The topic of this workshop is inherently interdisciplinary, and we welcome submissions from diverse fields such as Computer Vision, Graphics, Artificial Intelligence, Natural Language Processing, Speech and Audio Processing, Multimedia Analysis, and related areas. We particularly encourage contributions that explore novel applications and demonstrate the potential of AI in advancing spatial computing. The following is a non-exhaustive list of topics of interest:
Egocentric AI
3D vision-language models
3D scene perception
Multimodal affordance learning
3D spatial grounding
Intuitive physics understanding
Physics-informed neural networks
Multi-view consistent 3D generation
3D diffusion models
Camera-controlled generation
4D simulations
NPC and animation generation
Gaussian splatting for high-fidelity scene reconstruction
SLAM and Semantic SLAM
Scalable 3D reconstruction
Real-time 3D/4D mapping from egocentric cameras
Digital humans and avatar reconstruction
Inverse rendering
Computational holograms
Haptics, touch, and olfaction
Collaborative VR
Spatial design for AR overlays
Cognitive guarantees
Text-to-speech
Surround sound generation
Modelling room impulse responses (RIRs)
Speech translation
Binaural audio generation
Ethical considerations in 3D generative AI
Alignment with human priors
Uncertainty estimation in 3D prediction
Accountability and governance for always-on AI for AR devices
Supported by TIH, iHub Drishti, IIT Jodhpur