Program

08:15-08:30am - Poster Setup

08:30-08:40am - Opening Remarks


[Vision + Language Foundation Models]

08:40-09:10am - Invited Talk: Trevor Darrell "Recent advances in LLMs for language and vision"

09:10-09:40am - Invited Talk: Mohamed Elhoseiny "Imaginative Vision Language Models" (pdf) (pptx)


09:40-10:40am - Poster Session

10:40-11:00am - Coffee Break


[Multimodal Video Foundation Models]

11:00-11:30am - Invited Talk: Kristen Grauman "Goals, Memories, and Summaries from Large-Scale Narrated Video" (pptx)


[3D Foundation Models]

11:30-12:00pm - Invited Talk: Vincent Sitzmann "Towards 3D Representation Learning at Scale" (pdf)

12:00-12:30pm - Invited Talk: Chuang Gan "Visual Commonsense Reasoning with Large Language Models"


12:30pm - Concluding Remarks



Accepted Papers