Date: Sunday, Sep 29, 2024 (afternoon). Time zone is CET (local time in Milan, Italy).
Schedule: The workshop will start on Sep 29 at 1:40 PM local time.
2:00 - 2:05 PM: Opening
2:10 - 2:45 PM: Jason (Yao) Lu: VILA: a journey of building SOTA VLM and beyond
2:50 - 3:25 PM: Ishan Misra: What makes Generative video models tick?
3:30 - 4:05 PM: Ranjay Krishna: Outperforming Proprietary Multimodal Language Models
Break & Poster Session: This will be in the same room.
4:50 - 5:25 PM: Xinlei Chen: Attention is (Almost) All You Need from Pre-Trained Vision Transformers
5:30 - 5:55 PM: Ming-Hsuan Yang: Recent Advances in Multimodal Foundation Models
6:00 PM: Closing Remarks