Investigating Mechanisms for In-Context Vision Language Binding (Oral)
Darshana Saravanan, Makarand Tapaswi, Vineet Gandhi
Analyzing Hierarchical Structure in Vision Models with Sparse Autoencoders (Oral)
Matthew Lyle Olson, Musashi Hinck, Neale Ratzlaff, Changbai Li, Phillip Howard, Vasudev Lal, Shao-Yen Tseng
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video (Oral)
Sonia Joseph, Praneet Suresh, Lorenz Hufe, Edward Stevinson, Robert Graham, Yash Vadi, Danilo Bzdok, Sebastian Lapuschkin, Lee Sharkey, Blake Aaron Richards
Disentangling Polysemantic Channels in Convolutional Neural Networks (Oral)
Robin Hesse, Jonas Fischer, Simone Schaub-Meyer, Stefan Roth
Language-Guided Trajectory Traversal in Disentangled Stable Diffusion Latent Space for Factorized Medical Image Generation
Zahra TehraniNasab, Amar Kumar, Tal Arbel
Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging
Amar Kumar, Anita Kriz, Tal Arbel
Wavelet-Based Mechanistic Interpretability of Vision Transformers in a Latent Diffusion Setting
Sophia Abraham, Jonathan Hauenstein, Walter Scheirer
Visualizing and Controlling Cortical Responses Using Voxel-Weighted Activation Maximization
Matthew W. Shinkle, Mark D. Lescroart
Decoding Vision Transformers: the Diffusion Steering Lens
Ryota Takatsuki, Sonia Joseph, Ippei Fujisawa, Ryota Kanai
Embedding Shift Dissection on CLIP: Effects of Augmentations on VLM’s Representation Learning
Ashim Dahal, Saydul Akbar Murad, Nick Rahimi
Uncovering Branch-specialization in InceptionV1 using k sparse autoencoders
Matthew Bozoukov
Naturally Computed Scale Invariance in the Residual Stream of ResNet18
André Longon
Too Late to Recall: The Two-Hop Problem in Multimodal Knowledge Retrieval (Oral)
Constantin Venhoff, Ashkan Khakzar, Sonia Joseph, Philip Torr, Neel Nanda
Line of Sight: On Linear Representations in VLLMs (Oral)
Achyuta Rajaram, Sarah Schwettmann, Jacob Andreas, Arthur Conmy
Revelio: Interpreting and leveraging semantic information in diffusion models (Oral)
Dahye Kim, Xavier Thomas, Deepti Ghadiyaram
Interpretable Generative Models through Post-hoc Concept Bottlenecks
Akshay R. Kulkarni, Ge Yan, Chung-En Sun, Tuomas Oikarinen, Tsui-Wei Weng
Steering CLIP's vision transformer with sparse autoencoders
Sonia Joseph, Praneet Suresh, Ethan Goldfarb, Lorenz Hufe, Yossi Gandelsman, Robert Graham, Danilo Bzdok, Wojciech Samek, Blake Aaron Richards
Interpreting large text-to-image diffusion models with dictionary learning
Stepan Shabalin, Dmitrii Kharlapenko, Ayush Panda, Sheikh Abdur Raheem Ali, Yixiong Hao, Arthur Conmy
Linear Explanations for Individual Neurons
Tuomas Oikarinen, Tsui-Wei Weng
Explaining Low Perception Model Competency with High-Competency Counterfactuals
Sara Michelle Pohland, Claire Tomlin
Patch Explorer: Interpreting Diffusion Models through Interaction
Imke Grabe, Jaden Fiotto-Kaufman, Rohit Gandikota, David Bau
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
Viacheslav Surkov, Chris Wendler, Mikhail Terekhov, Justin Deschenaux, Robert West, Caglar Gulcehre
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport
Quentin Bouniot, Ievgen Redko, Anton Mallasto, Charlotte Laclau, Oliver Struckmeier, Karol Arndt, Markus Heinonen, Ville Kyrki, Samuel Kaski
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
Bartosz Cywiński, Kamil Deja
SAE-V: Interpreting Multimodal Models for Enhanced Alignment
Hantao Lou, Changye Li, Jiaming Ji, Yaodong Yang