Tool-Augmented VIsion (TAVI) Workshop at CVPR 2024

Recent vision-language models (VLMs), such as CLIP, Flamingo, and PaLI, have shown a strong capacity to memorize large amounts of world knowledge when scaled to tens of billions of parameters and trained on web-scale data. Although these models have achieved remarkable results across various benchmarks, they tend to struggle on tasks that require 1) seeking answers from external sources, 2) long-tail knowledge, and 3) fine-grained understanding. Recently, there has been growing interest in retrieval- and tool-augmented models that rely on non-parametric, external knowledge sources to address these limitations. In this inaugural edition of the TAVI workshop, we aim to bring together a diverse group of researchers who will share their recent work on this increasingly popular topic with the computer vision community.

Topics that will be covered in the workshop

We will cover a variety of topics, including but not limited to applying tool-use and retrieval-augmented models to the following problems:


Note: There will be no call for papers for this workshop.


Invited Posters (Arch Building Exhibit Hall):

Schedule - Monday 17th (morning), Room: Summit 321

Speakers

Cordelia Schmid
Google Research

Co-author of SceneCraft

Sachit Menon
Columbia University

Co-author of ViperGPT

Aniruddha Kembhavi
Allen Institute for AI

Co-author of VISPROG

Chunyuan Li
Microsoft Research

Co-author of LLaVA

Xuhui Jia
Google DeepMind

Co-author of Instruct-Imagen

Organizers

Ahmet Iscen
Google Research

Contact: iscen@google.com

Gul Varol
ENPC ParisTech

Pan Lu
UCLA

Ziniu Hu
Caltech / Google Research

Mathilde Caron
Google Research

Alireza Fathi
Google Research

Minsu Cho
POSTECH