Seminars are organized as part of the PhD course; attendance is open to everyone. Guest speakers will join remotely.
🧑🏫 Classroom: B203, DIAG Department (Via Ariosto 25).
November 5, 2024, 11 AM–12 PM
Abstract: In this presentation, I will provide an overview of the interpretability research landscape and describe several promising methods for exploring and controlling the inner mechanisms of generative language models. I will focus specifically on post-hoc attribution techniques and their use in identifying relevant input and model components, showcasing them with our open-source Inseq toolkit. A practical application of attribution techniques will be presented through the PECoRe data-driven framework for context usage attribution and its adaptation to produce internals-based citations for model answers in retrieval-augmented generation settings (MIRAGE).
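For readers unfamiliar with post-hoc attribution, the minimal sketch below shows what an input attribution call might look like with the Inseq toolkit mentioned in the abstract; the model (`gpt2`), the attribution method (`integrated_gradients`), and the example prompt are illustrative choices, not necessarily those used in the talk, and the exact API may vary across Inseq versions.

```python
import inseq

# Load a Hugging Face causal language model together with an attribution method
# (model and method are illustrative; any supported pair should work)
model = inseq.load_model("gpt2", "integrated_gradients")

# Attribute the model's generated continuation to the input tokens,
# producing per-token importance scores
out = model.attribute("The capital of France is")

# Visualize which input tokens most influenced each generated token
out.show()
```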
November 7, 2024, 11 AM–12 PM
Abstract: Deep learning models are inherently opaque, making it difficult to understand their decision-making processes. Post-hoc explainable AI (XAI) methods aim to offer explanations for these models, but such explanations are often brittle and do not give experts reliable ways to intervene in or adjust the trained models. Interpretability by design seeks to address this issue by building models that maintain the same predictive performance as opaque models but are directly understandable without relying on post-hoc methods.