8:00 - 8:15 Opening remarks
8:15 - 8:50 Daniel Bolya: "Perception Encoder: State-of-the-Art Unified Image-Video CLIP Models with Surprisingly General Features"
8:50 - 9:25 Xinlong Wang: "Unifying multimodal learning at scale"
9:25 - 10:00 Andrea Vedaldi: "Scaling models of geometry"
10:00 - 10:45 Poster Presentation (ExHall D, boards #140-179) + Coffee Break
10:45 - 11:20 Tri Dao: "Designing Hardware-efficient Architectures for Sequence Modeling"
11:20 - 11:55 Chen Change Loy: "From Segment Anything Efficiently to Matting Anyone Precisely"
11:55 - 12:30 Ishan Misra: "Foundational models for video generation, editing and personalization"
12:30 - 12:35 Closing Remarks