June 18, 2023 (PDT)
7:50 - 8:00
Gedas Bertasius
Opening Remarks
8:00 - 8:30
Jianfeng Gao (Microsoft)
From LLMs to Self-Improving AI
8:30 - 9:00
Carl Vondrick (Columbia University)
Connecting Vision and Language via Code
9:00 - 9:30
Saining Xie (NYU)
ConvNet vs Transformer ROUND 2: Self-Supervised Learning and Diffusion Models
9:30 - 10:00
Spotlight Talks 1
Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation
Dual PatchNorm
Point2Vec for Self-Supervised Representation Learning on Point Clouds
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
10:00 - 10:30
Coffee Break
10:30 - 11:00
Ishan Misra (Meta AI)
Supercharge your Transformers with Self-supervised Learning
11:00 - 11:30
Ruben Villegas (Google DeepMind)
Visual Storytelling with Generative Models of Video
11:30 - 12:00
Alex Kirillov (Meta AI)
Segment Anything
12:00 - 13:30
Lunch Break
13:30 - 14:30
Poster Session
Please put up the posters in Exhibit Hall, boards #235 - #264.
14:30 - 15:00
Poster Session + Coffee Break
15:00 - 15:30
Huiwen Chang (Google)
Masked Modeling for Vision
15:30 - 16:00
Cordelia Schmid (Google)
Multimodal Video Representations and Their Extension to Visual Language Navigation
16:00 - 16:30
Spotlight Talks 2
RePAST: Relative Pose Attention Scene Representation Transformer
OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios
Clicks as Queries: Interactive Transformer for Multi-instance Segmentation
Joint Adaptive Representations for Image-Language Learning
PaReprop: Fast Parallelized Reversible Backpropagation
16:30 - 17:30
Panel Discussion + Closing Remarks