Under Review | Celia Cintas*, Miriam Rateike*, Erik Miehling, Elizabeth Daly, Skyler Speakman
We present a study on how and where personas -- defined by distinct sets of human characteristics, values, and beliefs -- are encoded in the representation space of large language models (LLMs). Using a range of dimension reduction and pattern recognition methods, we first identify the model layers that show the greatest divergence in encoding these representations. We then analyze the activations within a selected layer to examine how specific personas are encoded relative to others, including their shared and distinct embedding spaces. We find that, across multiple pre-trained decoder-only LLMs, the analyzed personas show large differences in representation space only within the final third of the decoder layers. We observe overlapping activations for specific ethical perspectives -- such as moral nihilism and utilitarianism -- suggesting a degree of polysemy. In contrast, political ideologies like conservatism and liberalism appear to be represented in more distinct regions. These findings help to improve our understanding of how LLMs internally represent information and can inform future efforts in refining the modulation of specific human traits in LLM outputs. Warning: This paper includes potentially offensive sample statements.
Link to paper
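For illustration only, the following is a minimal sketch, not the paper's code, of the kind of layer-wise analysis described above: it mean-pools hidden states per decoder layer for two hypothetical persona prompts and reports a simple per-layer distance between their centroids. The model name, prompts, and distance measure are placeholder assumptions.

```python
# Minimal sketch (not the paper's code): compare how two hypothetical persona
# prompts separate across the decoder layers of a pre-trained LLM.
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; the study covers multiple decoder-only LLMs
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

personas = {
    "utilitarian": "You judge actions only by their overall consequences.",
    "conservative": "You value tradition and gradual, cautious change.",
}

def layer_means(text):
    """Mean-pooled hidden state per layer (including the embedding layer)."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return [h.mean(dim=1).squeeze(0).numpy() for h in out.hidden_states]

acts = {name: layer_means(prompt) for name, prompt in personas.items()}

# Per-layer distance between persona centroids: larger values indicate
# stronger divergence in that layer's representation space.
for layer, (a, b) in enumerate(zip(acts["utilitarian"], acts["conservative"])):
    print(f"layer {layer:02d}  centroid distance {np.linalg.norm(a - b):.3f}")
```

In practice one would aggregate over many statements per persona and apply dimension reduction or a proper divergence measure rather than a single centroid distance.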
FAccT25 | Henrik Nolte*, Miriam Rateike*, Michèle Finck
The EU Artificial Intelligence Act (AIA) establishes legal principles for certain types of AI systems. While prior work has sought to clarify some of these principles, little attention has been paid to robustness and cybersecurity. This paper aims to fill this gap. We identify legal challenges in provisions related to robustness and cybersecurity for high-risk AI systems (Art. 15 AIA) and general-purpose AI models (Art. 55 AIA). We demonstrate that robustness and cybersecurity demand resilience against performance disruptions. Furthermore, we assess potential challenges in implementing these provisions in light of recent advancements in the machine learning (ML) literature. Our analysis identifies shortcomings in the relevant provisions, informs efforts to develop harmonized standards as well as benchmarks and measurement methodologies under Art. 15(2) AIA, and seeks to bridge the gap between legal terminology and ML research to better align research and implementation efforts in relation to the AIA.
Link to paper
FAccT24 | Miriam Rateike, Isabel Valera, and Patrick Forré
Neglecting the effect that decisions have on individuals (and thus, on the underlying data distribution) when designing algorithmic decision-making policies may increase inequalities and unfairness in the long term -- even if fairness considerations were taken into account in the policy design process. In this paper, we propose a novel framework for achieving long-term group fairness in dynamical systems, in which current decisions may affect an individual's features in the next step, and thus, future decisions. Specifically, our framework allows us to identify a time-independent policy that converges, if deployed, to the targeted fair stationary state of the system in the long term, independently of the initial data distribution.
Link to paper
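In informal notation of our own (not the paper's), the core fixed-point idea can be sketched as follows: the deployed time-independent policy π induces a transition operator T_π on the data distribution, and the policy is chosen so that the induced dynamics converge to a fair stationary distribution from any starting point.

```latex
% Informal sketch of the fixed-point idea; notation is ours, not the paper's.
\[
  \mu_{t+1} = T_{\pi}(\mu_t), \qquad
  \mu^{\star} = T_{\pi}(\mu^{\star}), \qquad
  \lim_{t \to \infty} T_{\pi}^{\,t}(\mu_0) = \mu^{\star} \ \text{for every } \mu_0,
\]
% where the stationary distribution \mu^{\star} is additionally required to meet a
% group fairness target, e.g. equal positive decision rates across sensitive groups:
% \mathbb{E}_{\mu^{\star}}[\pi(X) \mid S = 0] = \mathbb{E}_{\mu^{\star}}[\pi(X) \mid S = 1].
```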
NeurIPS SoLaR Workshop 2023 | Miriam Rateike, Celia Cintas, John Wamburu, Tanya Akumu and Skyler Speakman.
We propose an auditing method to identify whether a large language model (LLM) encodes patterns such as hallucinations in its internal states, which may propagate to downstream tasks. We introduce a weakly supervised auditing technique using a subset scanning approach to detect anomalous patterns in LLM activations from pre-trained models. Importantly, our method does not need knowledge of the type of patterns a priori. Instead, it relies on a reference dataset devoid of anomalies during testing. Further, our approach enables the identification of pivotal nodes responsible for encoding these patterns, which may offer crucial insights for fine-tuning specific sub-networks for bias mitigation.
Link to paper
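As a rough illustration of the general idea, not the authors' method or code, the sketch below computes empirical p-values for one layer's activations against an anomaly-free reference set and scans for the subset of nodes whose p-values are unexpectedly small, using a simplified Higher-Criticism-style score. The synthetic data and the scoring details are placeholder assumptions.

```python
# Rough sketch (not the authors' implementation): empirical p-values per node
# against an anomaly-free reference set, followed by a simplified
# Higher-Criticism-style scan for the most anomalous subset of nodes.
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: rows = reference prompts, columns = hidden units of one layer.
reference = rng.normal(size=(500, 64))   # assumed free of anomalies
test = rng.normal(size=64)
test[:8] += 3.0                          # inject a synthetic anomalous shift

# One-sided empirical p-value per node: how extreme is the test activation
# relative to that node's reference distribution?
pvals = np.clip((reference >= test).mean(axis=0), 1e-6, 1.0)

# Scan over thresholds alpha: count nodes with p <= alpha and score the excess
# over the alpha * N expected under the null.
N = len(pvals)
alphas = np.sort(pvals)
counts = np.arange(1, N + 1)
scores = (counts - alphas * N) / np.sqrt(alphas * (1 - alphas) * N + 1e-12)

best = int(np.argmax(scores))
flagged = np.where(pvals <= alphas[best])[0]
print(f"scan score {scores[best]:.2f}; flagged nodes: {flagged.tolist()}")
```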
Book Chapter (in German) in Digitalrecht: Schriften zum Immaterialgüter-, IT-, Medien-, Daten- und Wettbewerbsrecht / Diskriminierungsfreie KI (Discrimination-Free AI) | Miriam Rateike.
The pervasive integration of machine learning (ML) in decision-making processes, particularly within critical domains like health and finance, has heightened the importance of addressing biases and discrimination. Historically, certain ML algorithms have exhibited biases based on so-called sensitive attributes such as gender or race, prompting a growing need for accountability and fairness in algorithmic decision-making. This chapter explores the prevalent causes of discrimination throughout the developmental stages of an ML system. Additionally, it provides a brief overview of the evolving field of computer science dedicated to ensuring fairness (non-discrimination) and explainability in algorithmic processes. The aim is to contribute to ongoing efforts in reducing biases and fostering transparency within the realm of ML applications impacting diverse groups of individuals.
Link to book (in German)
FAccT 2022 | Miriam Rateike*, Ayan Majumdar*, Olga Mineeva, Krishna P. Gummadi and Isabel Valera.
Novel method for practical fair decision-making based on a variational autoencoder. Our method learns an unbiased data representation leveraging both labeled and unlabeled data and uses these representations to learn a policy in an online process. We show that our training approach not only offers a more stable learning process but also yields policies with higher fairness and utility than previous approaches.
Link to paper
NeurIPS 2020 WiML Workshop | Poster presentation of an early version.
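To make the general architecture more concrete, here is a minimal, self-contained sketch of a VAE with a policy head and a toy fairness penalty. It is an illustration under our own assumptions: it omits the paper's semi-supervised and online training components, and the layer sizes, the demographic-parity penalty, and the class name FairVAEPolicy are ours, not the paper's.

```python
# Minimal sketch (not the paper's model): a VAE that learns a representation z
# from features x, plus a policy head that makes a decision from z. Layer sizes,
# the fairness penalty, and all names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FairVAEPolicy(nn.Module):
    def __init__(self, x_dim=10, z_dim=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 32), nn.ReLU())
        self.mu = nn.Linear(32, z_dim)
        self.logvar = nn.Linear(32, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, x_dim))
        self.policy = nn.Linear(z_dim, 1)   # probability of a positive decision

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        return self.dec(z), mu, logvar, torch.sigmoid(self.policy(z))

def loss_fn(x, s, model):
    """ELBO terms plus a toy demographic-parity penalty on the policy output.
    Assumes both sensitive groups (s in {0, 1}) are present in the batch."""
    x_hat, mu, logvar, d = model(x)
    recon = F.mse_loss(x_hat, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    parity = (d[s == 1].mean() - d[s == 0].mean()).abs()
    return recon + kl + parity

# Example usage on a random batch; s is a binary sensitive attribute.
model = FairVAEPolicy()
x = torch.randn(32, 10)
s = torch.randint(0, 2, (32,))
print(loss_fn(x, s, model).item())
```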
AAAI 2022 | Pablo Sanchez Martin*, Miriam Rateike* and Isabel Valera.
Novel class of variational graph autoencoders for causal inference in the absence of hidden confounders, when only observational data and the causal graph are available. Without making any parametric assumptions, VACA mimics the necessary properties of a Structural Causal Model (SCM) to provide a flexible and practical framework for answering interventional and counterfactual queries. VACA can evaluate counterfactual fairness in fair classification problems and allows learning fair classifiers without compromising performance.
Link to paper
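For intuition only, the sketch below shows an SCM-style decoder in the spirit of this line of work, not the VACA architecture itself: each variable is decoded from its causal parents' values plus a per-node latent, so that, with the latents held fixed (abduction), an intervention on one node propagates to its descendants (action and prediction). The toy graph, class name, and network sizes are placeholder assumptions.

```python
# Illustrative sketch (not the VACA implementation): an SCM-style decoder where
# each variable is generated from its causal parents' values plus a per-node
# latent acting as exogenous noise. Decoding follows the causal order, so an
# intervention do(X_i := v) propagates to the descendants of X_i.
import torch
import torch.nn as nn

PARENTS = {0: [], 1: [0], 2: [0, 1]}   # toy graph: X1 -> X2, X1 -> X3, X2 -> X3
Z_DIM = 2

class SCMStyleDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.f = nn.ModuleList(
            nn.Linear(len(p) + Z_DIM, 1) for p in PARENTS.values()
        )

    def forward(self, z, interventions=None):
        # z: (batch, n_nodes, Z_DIM), inferred by an encoder (abduction step).
        interventions = interventions or {}
        x = torch.zeros(z.shape[0], len(PARENTS))
        for i in sorted(PARENTS):                      # decode in causal order
            if i in interventions:
                x[:, i] = interventions[i]             # action: do(X_i := v)
                continue
            parent_vals = x[:, PARENTS[i]]             # parent values already decoded
            inp = torch.cat([parent_vals, z[:, i]], dim=-1)
            x[:, i] = self.f[i](inp).squeeze(-1)       # prediction step
        return x

dec = SCMStyleDecoder()
z = torch.randn(1, 3, Z_DIM)                 # in practice: abducted from observed x
x_factual = dec(z)
x_cf = dec(z, interventions={0: 1.0})        # counterfactual under do(X1 := 1.0)
print(x_factual.tolist(), x_cf.tolist())
```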
* Equal contribution.