June 19, 2022 (CDT)

7:50 - 8:00     Gedas Bertasius
8:00 - 8:30     Alexander Kolesnikov (Google Brain)
9:00 - 9:30     Mathilde Caron (Google AI)

9:30 - 10:00
  • Where are my Neighbors? Exploiting Patches Relations in Self-Supervised Vision Transformer
  • Is Large-scale Pre-training always Necessary for Vision Transformers?
  • Self-Supervised Pre-training of Vision Transformers for Dense Prediction Tasks
  • MC-SSL: Towards Multi-Concept Self-Supervised Learning
  • Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
  • VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

10:00 - 10:30   Coffee Break + Poster Session
10:30 - 11:00   Zhuowen Tu (UCSD)
                Transformers for Structural Extraction
11:00 - 11:30   Jean-Baptiste Alayrac (DeepMind)
11:30 - 12:00   Jitendra Malik (UC Berkeley / Meta)

12:00 - 12:30
  • Enabling Faster Vision Transformers via Soft Token Pruning
  • GroupViT: Semantic Segmentation Emerges from Text Supervision
  • Learned Queries for Efficient Local Attention
  • Adversarial Token Attacks on Vision Transformers
  • GradViT: Gradient Inversion of Vision Transformers
  • Visual Attention Emerges from Recurrent Sparse Reconstruction

12:30 - 13:30   Lunch Break
13:30 - 14:00   Arsha Nagrani (Google AI)
14:00 - 14:30   Andrew Jaegle (DeepMind)
14:30 - 15:00   Saining Xie (Meta)
                Everything is All You Need: Vision Architectures for the 2020s

15:00 - 15:30
  • NEAT: Neural Attention Fields for End-to-End Autonomous Driving
  • BoxeR: Box-Attention for 2D and 3D Transformers
  • Depth Estimation with Simplified Transformer
  • M2F3D: Mask2Former for 3D Instance Segmentation
  • Helix4D: Online Semantic Segmentation of LiDAR Sequences
16:30 - 17:00
17:00 - 18:00   Poster Session