Visual concept discovery aims to extract compact, structured representations of the visual world and recompose them to tackle novel and intricate problems. It has played a crucial role in many core problems in computer vision research, spanning both discriminative and generative tasks. An important research question is how to understand and design concept representations that facilitate learning from diverse datasets and support compositional reasoning. To address this question, this workshop brings together researchers in computer vision, multi-modal learning, machine learning, and cognitive science to discuss the following topics:
Representations for learning computational models of visual concepts;
Objectives and sources of learning signals to facilitate visual concept learning, incorporating insights from various scientific fields including machine learning, natural language processing, and cognitive science;
Applications of visual concept learning and reasoning, including but not limited to visual scene understanding, robotics, and controllable image generation;
Interpretability of visual learning systems, delving deeper into how these systems learn, represent, and make use of learned concepts in various application domains.
Invited speakers:
Rinon Gal (NVIDIA)
Yossi Gandelsman (UC Berkeley)
Manling Li (Northwestern University)
Daniel Ritchie (Brown University)
Chen Sun (Brown University)
Jun-Yan Zhu (Carnegie Mellon University)
9:00 am CDT — Opening Remarks
9:05 am CDT — Keynote: Daniel Ritchie; Title: Programmatic Generative Visual Concepts; Slides
9:45 am CDT — Keynote: Rinon Gal; Title: Representing Visual Concepts in Text-to-Image Diffusion Models
10:25 am CDT — Oral: Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era (Dan Oneata, Desmond Elliott, Stella Frank)
10:35 am CDT — Poster Session (ExHall D, poster boards #230 - #263)
11:35 am CDT — Keynote: Jun-Yan Zhu; Title: Creating Synthetic Data for Text-to-Image Customization
12:15 pm CDT — Lunch Break
1:45 pm CDT — Keynote: Chen Sun; Title: Visual Concepts from the Lens of Vision-Language Models; Slides
2:25 pm CDT — Keynote: Manling Li; Title: Why is Geometric Understanding Hard for VLMs?
3:05 pm CDT — Oral: ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features (Alec Helbling, Tuna Han Salih Meral, Benjamin Hoover, Pinar Yanardag, Duen Horng Chau)
3:15 pm CDT — Oral: Emergence and Evolution of Interpretable Concepts in Diffusion Models Through the Lens of Sparse Autoencoders (Berk Tinaz, Zalan Fabian, Mahdi Soltanolkotabi)
3:25 pm CDT — Coffee Break
3:55 pm CDT — Keynote: Yossi Gandelsman; Title: A Peek Inside Deep Vision Models
4:35 pm CDT — Closing Remarks
We welcome short paper submissions on the topics above. Papers should follow the CVPR format and be up to 4 pages, excluding references and supplementary material. Any supplementary material should be appended to the main PDF for submission. The review process will be double-blind.
Accepted papers will not be published in archival proceedings; instead, they will be made publicly available as non-archival reports, allowing future submission to archival venues. Accepted papers will be presented in poster sessions, and selected papers will be invited for spotlight presentations.
Submission deadline: 11:59 pm (Pacific Time), April 15th, 2025
Acceptance notification date: May 6th, 2025
Camera-ready deadline: May 24th, 2025
Submission site: OpenReview
Contact: yzzhang@cs.stanford.edu