We invite submissions to the MIV workshop focusing on the scientific understanding of vision models. Papers can be submitted to either a proceedings or a non-proceedings track.
Proceedings Track Instructions: We welcome original submissions for the proceedings track, which will be published in the CVPR Workshops Proceedings. All accepted papers will be presented as posters and a selected group of papers will be presented as 10-minute spotlight talks. Submitted papers must follow the CVPR 2026 submission format (template link). Submissions to the proceedings track are limited to 4 pages, excluding the appendix.
Use the Proceedings Track submission link to submit your manuscript.
Non-Proceedings Track Instructions: For the non-proceedings track, we welcome previously published work and papers being presented at CVPR. All accepted papers will be presented as posters and a selected group of papers will be presented as 10-minute spotlight talks. Submitted papers must follow the CVPR 2026 submission format (template link). Non-proceedings submissions are limited to 8 pages, excluding the appendix.
Use the Non-Proceedings Track submission link to submit your manuscript.
Submission Instructions: We will be using OpenReview to manage submissions, following a double-blind review process. All submissions must be anonymized. There will be no rebuttal phase. Spotlight presentations will be selected from submissions to both the proceedings and non-proceedings tracks, and at least one author must be physically present to give a spotlight talk.
Important Dates:
March 2nd AOE - Paper submission deadline on OpenReview:
Proceedings Track submission link,
Non-Proceedings Track submission link
March 15th AOE - Paper acceptance notification.
TBD - Camera-ready deadline.
Areas of interest include but are not limited to:
Visualizing and Understanding Internal Components of Vision and Multimodal Models:
This involves developing methods for visualizing the internal units of vision models, such as neurons and attention heads.
Scaling and Automating Interpretability Methods:
How can we scale interpretability methods to larger models and beyond toy datasets for practical applications? This includes developing toolkits and interfaces for practitioners.
Evaluating Interpretability Methods:
This involves developing benchmarks and comparing interpretability methods.
Model Editing and Debiasing:
After developing methods for visualizing and understanding the internals of vision models, how can we causally intervene to change model behavior and make it safer, less biased, and better suited to specific tasks?
Identifying Failure Modes and Correcting Them:
Can we visualize the internals of models to find shortcomings of algorithms or architectures? How can we use these findings to improve design choices?
Emergent Behavior in Vision and Multimodal Models:
Using interpretability techniques, what intriguing properties can we discover in large vision and multimodal models? Examples include the entanglement of visual and language concepts in CLIP or controllable linear subspaces in diffusion models.
Representation Similarity and Universality:
Several works have found that representations converge across models trained with different architectures, datasets, tasks, and modalities. How can we characterize the similarity of these different models?
Understanding Vision Models with Language:
How can we develop methods to use language representations to explain visual representations?
In-Context Learning:
Language models have shown impressive in-context learning and zero-shot capabilities. How can we elicit similar behavior from vision models?
Understanding the Role of Data and Model Behavior:
What role does data play in model behavior? What biases and properties can we extract from datasets?
Understanding Compositionality and Generalization:
Why do vision models not memorize their data? How do diffusion models generate novel images instead of reproducing training examples, and what mechanisms allow them to generalize compositionally?
What Vision Models Teach Us About the Human Brain:
Do vision models learn circuits that correspond to those in the human brain?