Schedule

First session

9:00 - 9:05 Opening remarks

9:05 - 9:35 Keynote 1: Long Chen - The Interplay of Understanding and Generation in Multimodal AI

9:35 - 10:05 Keynote 2: Manling Li - Why is Spatial Understanding Hard for VLMs?

10:05 - 10:35 Keynote 3: Na Zhao - From Perception to Action: Foundation Models for 3D Spatial Intelligence

10:35 - 11:00 Morning tea break

Second session

11:00 - 11:20 Outstanding Submission Oral Presentation

11:20 - 11:50 Keynote 4: Liwei Wang - Learning from Videos to 3D Spatial Intelligence

11:50 - 12:20 Keynote 5: Yingwei Pan - Multimodal Content Generation: Unleashing Infinite Creative Possibilities for the Future

12:20 - 12:25 Closing remarks
