
Call for papers

In this workshop we welcome critical and diverse perspectives on the larger landscape of pre-training for vision and language models, including both computer vision and multimodal applications. Topics include, but are not limited to:

  • Evaluation of pre-training methods for vision and/or multimodal applications

  • Parallelization regimes that solve unique problems for vision and/or multimodal scenarios

  • Critical evaluation of pre-train and fine-tune regimes for common vision tasks such as image classification, object detection, semantic segmentation, etc.

  • Exploring the impact of pre-train / fine-tune regimes for autonomous vehicles, biomedical applications, e-commerce applications, media, large-scale online search, etc.

  • Detecting and mitigating bias in vision and multimodal models

  • Exploring, evaluating, and extending the pre-training regime for modalities beyond vision and text, such as vision and audio, vision and seismic data, vision and gaming data, and robotics

  • Efficiency enhancements for pre-training vision / multimodal models, such as compilation

  • Enhancing the application of pre-trained models beyond their original domain and across modalities, such as methods to align vision and text models trained separately

  • Datasets, statistics, and theory of pre-training and fine-tuning regimes and methods for computer vision and combined modalities

  • Research and development on topics above and related areas

Papers should be submitted by mid-October; a firmer deadline will be announced shortly.

Submission Guidelines

      • Authors are encouraged to submit high-quality, original research (i.e., work not previously published or accepted for publication in substantially similar form in any peer-reviewed venue, including journals, conferences, or workshops).

      • All submissions should follow the same template as the main WACV2023 conference. The author kit/paper template is provided in LaTeX format via this Overleaf template and this GitHub repository.

      • The main paper has an 8-page limit; references do not count toward it. There is no limit on the number of pages in the supplementary material. Only .pdf files are accepted.

      • Unlike the main conference, the review process for this workshop has only one round and is single-blind. Submissions do not need to be anonymized.

      • Authors of accepted papers are required to present their work live, either in person or remotely. We will not accept pre-recorded presentations.

Submission Deadlines

Industry poster track

  • If you are working on an application of large-scale modelling for vision and/or multimodal use cases and would simply like to submit an abstract, we welcome poster submissions!

Diversity statement

  • This workshop strongly values diverse points of view, organizations, backgrounds, perspectives, and walks of life. Toward that end, we have strong representation from five industry organizations and five academic organizations, including multiple participants from diverse backgrounds.