Visual concept discovery aims to extract compact, structured representations of the visual world and recompose them to tackle novel and intricate problems. It has played a crucial role in many core problems in computer vision research, spanning both discriminative and generative tasks. An important research question is how to understand and design concept representations that facilitate learning from diverse datasets and support compositional reasoning. To address this question, this workshop brings together researchers in computer vision, multi-modal learning, machine learning, and cognitive science to discuss the following topics:
Representations for learning computational models of visual concepts;
Objectives and sources of learning signals to facilitate visual concept learning, incorporating insights from various scientific fields including machine learning, natural language processing, and cognitive science;
Applications of visual concept learning and reasoning, including but not limited to visual scene understanding, robotics, and controllable image generation;
Interpretability of visual learning systems, delving deeper into how these systems learn, represent, and make use of learned concepts in various application domains.
Invited speakers:
Rinon Gal (NVIDIA)
Yossi Gandelsman (UC Berkeley)
Manling Li (Northwestern University)
Daniel Ritchie (Brown University)
Chen Sun (Brown University)
Jun-Yan Zhu (Carnegie Mellon University)
9:00 am CDT — Opening Remarks
9:05 am CDT — Keynote: Daniel Ritchie; Title: Programmatic Generative Visual Concepts; Slides
9:45 am CDT — Keynote: Rinon Gal; Title: Representing Visual Concepts in Text-to-Image Diffusion Models
10:25 am CDT — Oral: Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era (Dan Oneata, Desmond Elliott, Stella Frank)
10:35 am CDT — Poster Session (ExHall D, poster boards #230 - #263)
11:35 am CDT — Keynote: Jun-Yan Zhu; Title: Creating Synthetic Data for Text-to-Image Customization
12:15 pm CDT — Lunch Break
1:45 pm CDT — Keynote: Chen Sun; Title: Visual Concepts from the Lens of Vision-Language Models; Slides
2:25 pm CDT — Keynote: Manling Li; Title: Why is Geometric Understanding Hard for VLMs?
3:05 pm CDT — Oral: ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features (Alec Helbling, Tuna Han Salih Meral, Benjamin Hoover, Pinar Yanardag, Duen Horng Chau)
3:15 pm CDT — Oral: Emergence and Evolution of Interpretable Concepts in Diffusion Models Through the Lens of Sparse Autoencoders (Berk Tinaz, Zalan Fabian, Mahdi Soltanolkotabi)
3:25 pm CDT — Coffee Break
3:55 pm CDT — Keynote: Yossi Gandelsman; Title: A Peek Inside Deep Vision Models
4:35 pm CDT — Closing Remarks
We welcome short paper submissions on the topics above. Papers should follow the CVPR format and be up to 4 pages, excluding references and supplementary material. Any supplementary material should be appended to the main PDF for submission. The review process will be double-blind.
Accepted papers will not be published in archival proceedings; instead, they will be made publicly available as non-archival reports, allowing future submission to archival venues. Accepted papers will be presented in poster sessions, and selected papers will be invited for spotlight presentations.
Submission deadline: 11:59 pm (Pacific Time), April 15th, 2025
Acceptance notification date: May 6th, 2025
Camera-ready deadline: May 24th, 2025
Submission site: OpenReview
Contact: yzzhang@cs.stanford.edu