Probing Machine Learning Models in Angluin's Style
KR 2024 Tutorial
Abstract
A major concern when dealing with complex machine learning models, such as language models, is determining what influences their outcomes. This tutorial sheds light on Angluin’s exact learning framework and Valiant’s probably approximately correct (PAC) framework, and on whether and how they can be employed to systematically probe machine learning models, extracting high-level abstractions that can inform us about their knowledge, general behaviour, and potentially harmful biases.
Potential Target Audience:
This tutorial is of potential interest to members of the AI community working on systematic ways to probe machine learning models in order to investigate their behaviour, potentially harmful biases, and potential for knowledge extraction.
Prerequisites:
The tutorial is mostly self-contained. As a prerequisite, we expect a master's-level background in computer science.
Outline:
Introduction and Motivation (5 min)
Background: Exact Learning, PAC Learning (10 min)
Probing NNs: Extracting Automata (30 min)
Hands-on Activity (30 min)
Break
Probing LMs: Extracting Horn Expressions (30 min)
Probing LMs: Extracting Decision Trees (30 min)
Conclusion and Discussion (15 min)
Material:
Hands-on DFA Implementation by Mikel Alesha and Montserrat Hermo
Notes (peer-reviewed, to appear in RW 2023 proceedings)