Accepted Papers
Spotlight Presentations
Semantic Vision Transformers. Young Kyung Kim (Duke University); J. Matias Di Martino (Duke University); Guillermo Sapiro (Duke University & Apple) [paper] [poster] [video] [supplementary]
Learning Visual Prompts for Guiding the Attention of Vision Transformers. Razieh Rezaei (UK-Bonn); Masoud Jalili Sabet (Ludwig Maximilian University of Munich); Jindong Gu (University of Oxford); Daniel Rueckert (Technische Universität München); Philip Torr (University of Oxford); Ashkan Khakzar (University of Oxford) [paper] [poster] [video] [supplementary]
GTA: Guided Transfer of Spatial Attention from Object-Centric Representations. SeokHyun Seo (LG CNS); Jinwoo Hong (LG CNS); JungWoo Chae (LG CNS); Kyungyul Kim (LG CNS); Sangheum Hwang (Seoul National University of Science and Technology) [paper] [poster] [video] [supplementary]
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens. Sunil Hwang (Korea Military Academy); Jaehong Yoon (UNC Chapel Hill); Youngwan Lee (ETRI); Sung Ju Hwang (KAIST, AITRICS) [paper] [poster] [video] [supplementary]
ReduceFormer: Attention with Tensor Reduction by Summation. John Yang (NVIDIA); Le An (NVIDIA); Su Inn Park (NVIDIA) [paper] [poster] [video] [supplementary]
State Space Models for Event Cameras. Nikola Zubic (University of Zurich / ETH Zurich); Mathias Gehrig (University of Zurich); Davide Scaramuzza (University of Zurich) [paper] [poster] [video] [supplementary]
PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation. Injoon Hwang (MODULABS); Hae Won Park (MODULABS); JooYoung Yang (MODULABS); SunJae Maeng (MODULABS); Youngwan Lee (ETRI) [paper] [poster] [video] [supplementary]
Parameter-efficient Active Learning for Foundational models. Athmanarayanan Lakshmi Narayanan (Intel Labs); Ranganath Krishnan (Intel Labs); Amrutha Machireddy (Intel Labs); Mahesh Subedar (Intel) [paper] [poster] [video] [supplementary]
Mask4Former: Mask Transformer for 4D Panoptic Segmentation. Kadir Yilmaz (RWTH Aachen University); Jonas Schult (RWTH Aachen University); Alexey Nekrasov (RWTH Aachen University); Bastian Leibe (RWTH Aachen University) [paper] [poster] [video] [supplementary]
Leveraging Camera Calibration Transformers Model using Line Mixed Queries. Sebastian Janampa (The University of New Mexico); Marios Pattichis (The University of New Mexico) [paper] [poster] [video] [supplementary]
Poster Presentations
EleViT: exploiting element-wise products for designing efficient and lightweight vision transformers. Uzair Shah (Hamad Bin Khalifa University); Jens Schneider (HBKU); Enrico Gobbetti (CRS4); Giovanni Pintore (CRS4); Mahmood Alzubaidi (HBKU); Mowafa Househ (HBKU); Marco Agus (HBKU) [paper] [poster] [supplementary]
Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Clouds. Ayumu Saito (Saint Mary's University); Jiju Poovvancheri (Saint Mary's University, Halifax) [paper] [poster] [supplementary]
ViT-FS-CAM: Effective and Efficient Visual Explanations of the Vision Transformer with Feature Selection for Image Classification. Luna Zhang (Stony Brook University) [paper] [poster] [supplementary]
ToSA: Token Selective Attention for Efficient Vision Transformers. Manish Kumar Singh (Qualcomm AI Research); Rajeev Yasarla (Qualcomm AI Research); Hong Cai (Qualcomm AI Research); Mingu Lee (Qualcomm AI Research); Fatih Porikli (Qualcomm AI Research) [paper] [poster] [supplementary]
Vision Backbone Enhancement via Multi-Stage Cross-Scale Attention. Liang Shang (University of Wisconsin-Madison); Yanli Liu (OPPO US Research Center); Zhengyang Lou (University of Wisconsin-Madison); Shuxue Quan (Mobile Image Vision LLC); William Sethares (University of Wisconsin-Madison); Nagesh Adluru (WISC); Bochen Guan (OPPO US Research Center) [paper] [poster] [supplementary]
Remembering Transformer for Continual Learning. Yuwei Sun (Araya / RIKEN AIP); Ippei Fujisawa (Araya); Jun Sakuma (Tokyo Institute of Technology / RIKEN AIP); Ryota Kanai (Araya) [paper] [poster] [supplementary]
Pushing The Limits of Vision Transformer for Sign Language Recognition with Data Processing and Pre-training. Ganzorig Batnasan (United Arab Emirates University); Hanan Aldarmaki (MBZUAI); Munkh-Erdene Otgonbold (United Arab Emirates University); Qurban Memon (UAE University); Munkhjargal Gochoo (United Arab Emirates University) [paper] [poster] [supplementary]
Multi-Aperture Fusion of Transformer-Convolutional Network (MFTC-Net) for 3D Medical Image Segmentation and Visualization. Siyavash Shabani (University of Nevada, Reno); Muhammad Sohaib (University of Nevada, Reno); Sahar Mohammed (University of Nevada, Reno); Bahram Parvin (University of Nevada, Reno) [paper] [poster] [supplementary]
Fusion of regional and sparse attention in Vision Transformers. Nabil Ibtehaz (Purdue University); Ning Yan (Futurewei Technologies Inc.); Masood Mortazavi; Daisuke Kihara (Purdue University) [paper] [poster] [supplementary]
Presentation/poster printing instructions
Please follow the poster printing instructions from the CVPR organizers here. Note that there are separate printing sites for main-conference and workshop papers; do not use the main-conference site for a workshop poster, as your printing request may be denied. The maximum poster size is 4x8 feet. The poster room will have tables but no power outlets.