Tool-Augmented VIsion (TAVI) Workshop at CVPR 2024
Recent vision-language models (VLMs), such as CLIP, Flamingo, and PaLI, have shown a strong capacity to memorize large amounts of world knowledge when scaled to tens of billions of parameters and trained on web-scale data. Although these models have achieved remarkable results across various benchmarks, they tend to struggle on tasks that require 1) seeking answers from external sources, 2) long-tail knowledge, and 3) fine-grained understanding. Recently, there has been growing interest in retrieval-augmented and tool-augmented models that rely on non-parametric, external knowledge sources to address these limitations. In this inaugural edition of the TAVI workshop, we aim to bring together a diverse group of researchers who will share their recent work on this increasingly popular topic with our computer vision community.
Topics that will be covered in the workshop
We will cover a variety of topics, including but not limited to applying tool-use and retrieval-augmented models to the following problems:
Image and video classification
Dense prediction
Image and video generation
Explainability and reasoning
Data-efficient learning
Multimodal learning
Self-supervised learning
Prompt tuning and selection
Visual instruction tuning
Note: There will be no call for papers for this workshop.
Tentative Schedule - Monday 17th (morning)
Speakers
Cordelia Schmid
Google Research
Carl Vondrick
Columbia University
Aniruddha Kembhavi
Allen Institute for AI
Organizers
Ahmet Iscen
Google Research
Contact: iscen@google.com
Gul Varol
ENPC ParisTech
Pan Lu
UCLA
Ziniu Hu
Caltech / Google Research
Mathilde Caron
Google Research
Alireza Fathi
Google Research
Minsu Cho
POSTECH