Workshop location: TBD
Poster session location: TBD
8:50–17:30 PST
The Transformer architecture has catalyzed a paradigm shift, unifying computer vision, natural language processing, and beyond. Originally transformative in NLP, its principles now underpin the most powerful foundation models, including state-of-the-art systems across nearly all vision tasks: image classification, sophisticated image and video generation, and a new generation of Multimodal LLMs (MLLMs) that seamlessly integrate vision, language, and other sensory inputs. These models are redefining the state of the art in tasks ranging from visual question answering and embodied AI to generative content creation.

However, this success has brought new challenges to the forefront. The quadratic complexity of the attention mechanism remains a bottleneck for high-resolution or long-sequence data, leading to excessive computational costs. Furthermore, the field is actively debating the future of visual backbones: Will Transformers continue to scale effectively? Are emerging alternatives, such as State Space Models (SSMs, e.g., Mamba), more efficient successors? How do we best design architectures for unified, multimodal understanding? This workshop aims to bring together a diverse set of researchers to share cutting-edge insights, debate the limitations of current models, and explore the next generation of architectures for visual recognition.
MIT / Google DeepMind
Meta FAIR
University of Oxford
UPenn
Carnegie Mellon University
We accept abstract submissions to our workshop. Submissions may be at most 4 pages (excluding references) and must follow the CVPR 2026 author guidelines. Accepted papers will NOT appear in the IEEE/CVF CVPR workshop proceedings. Papers will be presented at the workshop during the poster and spotlight sessions, and links to previously published papers will be made available on the workshop's website.
OpenReview: TBD
Submission Deadline: April 15th, 2026
Notification of Acceptance: May 14th, 2026
Camera-Ready Submission Deadline: May 28th, 2026
Workshop Date: June 4th, 2026
Yan-Bo Lin (UNC)
Han Yi (UNC)
Yue Yang (UNC)
Fuxiao Liu (NVIDIA)
Jaehun Jung (NVIDIA)
Di Zhang (Fudan University)
Ce Zhang (UNC)
Seongsu Ha (UNC)
Ziyang Wang (UNC)
Fu-En (Fred) Yang (NVIDIA)
Guo Chen (Nanjing University)
Tianyi Xiong (University of Maryland, College Park)
Baiqi Li (UNC)
Yulu Pan (UNC)
Le An (NVIDIA)
Ryo Hachiuma (NVIDIA)
Shihao Wang (The Hong Kong Polytechnic University)