Investigating Mechanisms for In-Context Vision Language Binding (Oral)
Darshana Saravanan, Makarand Tapaswi, Vineet Gandhi
Analyzing Hierarchical Structure in Vision Models with Sparse Autoencoders (Oral)
Matthew Lyle Olson, Musashi Hinck, Neale Ratzlaff, Changbai Li, Phillip Howard, Vasudev Lal, Shao-Yen Tseng
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video (Oral)
Sonia Joseph, Praneet Suresh, Lorenz Hufe, Edward Stevinson, Robert Graham, Yash Vadi, Danilo Bzdok, Sebastian Lapuschkin, Lee Sharkey, Blake Aaron Richards
Disentangling Polysemantic Channels in Convolutional Neural Networks (Oral)
Robin Hesse, Jonas Fischer, Simone Schaub-Meyer, Stefan Roth
Language-Guided Trajectory Traversal in Disentangled Stable Diffusion Latent Space for Factorized Medical Image Generation
Zahra TehraniNasab, Amar Kumar, Tal Arbel
Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging
Amar Kumar, Anita Kriz, Tal Arbel
Wavelet-Based Mechanistic Interpretability of Vision Transformers in a Latent Diffusion Setting
Sophia Abraham, Jonathan Hauenstein, Walter Scheirer
Visualizing and Controlling Cortical Responses Using Voxel-Weighted Activation Maximization
Matthew W. Shinkle, Mark D. Lescroart
Decoding Vision Transformers: the Diffusion Steering Lens
Ryota Takatsuki, Sonia Joseph, Ippei Fujisawa, Ryota Kanai
Embedding Shift Dissection on CLIP: Effects of Augmentations on VLM’s Representation Learning
Ashim Dahal, Saydul Akbar Murad, Nick Rahimi
Uncovering Branch-specialization in InceptionV1 using k sparse autoencoders
Matthew Bozoukov
Naturally Computed Scale Invariance in the Residual Stream of ResNet18
André Longon
Too Late to Recall: The Two-Hop Problem in Multimodal Knowledge Retrieval (Oral)
Constantin Venhoff, Ashkan Khakzar, Sonia Joseph, Philip Torr, Neel Nanda
Line of Sight: On Linear Representations in VLLMs (Oral)
Achyuta Rajaram, Sarah Schwettmann, Jacob Andreas, Arthur Conmy
Revelio: Interpreting and leveraging semantic information in diffusion models (Oral)
Dahye Kim, Xavier Thomas, Deepti Ghadiyaram
Interpretable Generative Models through Post-hoc Concept Bottlenecks
Akshay R. Kulkarni, Ge Yan, Chung-En Sun, Tuomas Oikarinen, Tsui-Wei Weng
Steering CLIP's vision transformer with sparse autoencoders
Sonia Joseph, Praneet Suresh, Ethan Goldfarb, Lorenz Hufe, Yossi Gandelsman, Robert Graham, Danilo Bzdok, Wojciech Samek, Blake Aaron Richards
Interpreting large text-to-image diffusion models with dictionary learning
Stepan Shabalin, Dmitrii Kharlapenko, Ayush Panda, Sheikh Abdur Raheem Ali, Yixiong Hao, Arthur Conmy
Linear Explanations for Individual Neurons
Tuomas Oikarinen, Tsui-Wei Weng
Explaining Low Perception Model Competency with High-Competency Counterfactuals
Sara Michelle Pohland, Claire Tomlin
Patch Explorer: Interpreting Diffusion Models through Interaction
Imke Grabe, Jaden Fiotto-Kaufman, Rohit Gandikota, David Bau
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
Viacheslav Surkov, Chris Wendler, Mikhail Terekhov, Justin Deschenaux, Robert West, Caglar Gulcehre
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport
Quentin Bouniot, Ievgen Redko, Anton Mallasto, Charlotte Laclau, Oliver Struckmeier, Karol Arndt, Markus Heinonen, Ville Kyrki, Samuel Kaski
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
Bartosz Cywiński, Kamil Deja
SAE-V: Interpreting Multimodal Models for Enhanced Alignment
Hantao Lou, Changye Li, Jiaming Ji, Yaodong Yang