Call for Papers and Submission Guidelines

The workshop will provide a common platform to discuss recent progress, challenges, and opportunities in developing transformer-based models for various computer vision applications. To this end, we welcome original research contributions on transformer-based models in topics including, but not limited to:

• Theoretical insights into transformer-based models

• Transformer models for spatial (image) and temporal (video) data modeling

• Efficient transformer architectures, including novel mechanisms for self-attention and non-local attention

• Visualizing and interpreting transformer networks

• Generative modeling with transformer networks

• Hybrid network designs combining the strengths of transformer models with convolutional and graph-based models

• Unsupervised, weakly supervised, and semi-supervised learning with transformer models

• Multi-modal learning combining visual data with text, speech, and knowledge graphs

• Prompt tuning and selection for large-scale multimodal models

• Leveraging multi-spectral data, such as satellite imagery and infrared images, in transformer models for improved semantic understanding of visual content

• Transformer-based designs for low-level vision problems such as image super-resolution, deblurring, de-raining, and denoising

• Novel transformer-based methods for high-level vision problems such as object detection, segmentation, activity recognition, and pose estimation

• Transformer models for volumetric, mesh, and point-cloud data processing in 3D and 4D data settings

Submission Guidelines

Call for Papers: pdf

Format: All submissions should follow the formatting instructions of ICCV 2023.

Page Limit: 8 pages

Submission Site: CMT