In this project, we harness large-scale, richly annotated psychotherapy data to answer a fundamental question in mental health research: Which sequences of therapist interventions and patient self-states tend to foster positive change? This work is carried out in close collaboration with Prof. Dana Atzil-Slonim from Bar-Ilan University, whose research team provides a unique corpus from the Bar-Ilan University Psychology Clinic. In this dataset, each five-minute segment of a session is carefully labeled with one or two self-states, describing the patient’s affect, thoughts, behaviors, and needs, coded in fine-grained subcategories.
By combining these detailed self-state annotations with labeled therapist intervention types, we aim to uncover which combinations and trajectories most consistently support strong therapeutic alliance and better treatment outcomes. To achieve this, we develop unsupervised and weakly supervised machine learning methods to detect typical and atypical state-intervention patterns, discover recurring motifs, and characterize transition dynamics within and across sessions. Our goal is not only to describe these patterns, but to link them empirically to outcome measures—providing an interpretable map of what effective process looks like in real therapy.
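As a toy illustration of the transition-dynamics idea, the sketch below estimates first-order transition probabilities between consecutive segment-level self-states. The state and intervention labels here are hypothetical placeholders, not the clinic's actual coding scheme.

```python
from collections import Counter

# Hypothetical coded session: one (self-state, intervention) pair per five-minute segment
session = [("avoidant", "supportive"), ("avoidant", "interpretive"),
           ("reflective", "interpretive"), ("reflective", "supportive"),
           ("reflective", "supportive")]

# Count first-order transitions between consecutive segment self-states
states = [s for s, _ in session]
transitions = Counter(zip(states, states[1:]))
total_from = Counter(states[:-1])

# Empirical transition probabilities P(next_state | current_state)
probs = {pair: n / total_from[pair[0]] for pair, n in transitions.items()}
```

In the full project these counts would be aggregated across many sessions and conditioned on intervention type, then linked to alliance and outcome measures.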
Mentalization, or Reflective Functioning, is the human capacity to understand behavior in terms of underlying mental states. It is a cornerstone of successful psychotherapy, yet its identification has traditionally required labor-intensive manual coding by experts. In this project, we develop NLP frameworks to automatically identify and score levels of mentalization within clinical transcripts.
Beyond simple detection, we investigate the dynamic relationship between these scores and treatment outcomes. By mapping how therapist interventions, ranging from supportive to interpretive, interact with a patient’s capacity to mentalize in real time, we seek to provide a computational lens on how "mentalizing-focused" techniques actually facilitate psychological growth. This work bridges the gap between deep psychodynamic theory and scalable, data-driven clinical supervision.
Traditional clinical research often relies on post-session or post-treatment questionnaires, which can miss the granular "micro-outcomes" occurring within a single meeting. This project utilizes LLMs to automate the prediction of mental states and dyadic measures at the segment level. We analyze the linguistic synchrony and emotional exchange between therapist and patient to provide a high-resolution view of the session’s "pulse."
By focusing on micro-level outcomes—such as changes in patient affect or therapeutic alliance within five-minute windows—we aim to build a "MIND" (Mental Interaction & Nuanced Dynamics) prediction engine. This allows us to track how specific verbal exchanges lead to immediate shifts in the session's quality, offering a powerful new tool for evaluating the efficacy of clinical interventions as they happen.
How can we use large language models not just to process text, but to explore deep questions in human-focused sciences? Probe and Perturb is our lab’s flagship effort to push NLP beyond prediction, transforming it into a scientific tool for simulating controlled experiments inside language representations. Focusing on psychotherapy and mental health, we design techniques to probe the latent space of LLMs and identify directions that align with meaningful human constructs—such as therapist directiveness, patient defensiveness, or emotional openness.
We then perturb these directions to test how subtle shifts change predicted treatment outcomes or interaction dynamics, all within the model’s learned structure. These controlled, in-silico “experiments” allow us to ask causal questions that would be impractical to study directly in clinical settings. By combining robust representation learning, modern interpretability methods, and scalable causal interventions, Probe and Perturb opens a novel empirical paradigm for psychology and social science.
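A minimal sketch of the probe-and-perturb recipe on synthetic data: fit a linear probe for a construct as a difference-of-means direction (a concept activation vector), then shift a representation along that direction and observe how its probe score moves. The construct labels and dimensions are illustrative assumptions, not outputs of an actual LLM.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Hypothetical ground-truth concept direction (e.g., "therapist directiveness")
concept_dir = rng.normal(size=d)
concept_dir /= np.linalg.norm(concept_dir)

# Synthetic hidden states for segments labeled high vs. low on the construct
pos = rng.normal(size=(50, d)) + 2.0 * concept_dir
neg = rng.normal(size=(50, d)) - 2.0 * concept_dir

# Probe: unit-norm difference-of-means direction between the two groups
probe = pos.mean(0) - neg.mean(0)
probe /= np.linalg.norm(probe)

def score(h):
    """Project a hidden state onto the probed concept direction."""
    return float(h @ probe)

h = rng.normal(size=d)
# Perturb: an in-silico intervention that nudges the state along the probe
h_perturbed = h + 1.5 * probe
```

Because the probe is unit-norm, the perturbation raises the score by exactly the step size; in practice the downstream readout would be a trained outcome predictor rather than the probe itself.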
Language is rarely "neutral"; it is steeped in social register and cultural context. In this work, recently submitted to ACL Rolling Review, we investigate how multilingual LLMs represent different linguistic registers, with a specific focus on slang. Informal language presents a unique challenge for models primarily trained on formal corpora. By analyzing the latent spaces of these models across diverse languages, we track whether "informal" concepts are clustered consistently across linguistic boundaries or whether models exhibit inherent biases toward formal registers.
Our findings shed light on the cross-lingual evolution of slang and the degree to which models capture the "vibe" of human communication. This research is essential for developing AI that can truly understand and generate human-like, context-aware dialogue in global settings, ensuring that AI-human interaction respects the sociolinguistic nuances of diverse communities.
As part of our commitment to "white-box" AI, the LoCAV project investigates the fundamental geometry of meaning within Large Language Models. While LLMs are often viewed as high-dimensional "black boxes," we hypothesize that many human-understandable concepts are actually encoded within low-rank subspaces.
In this project, we develop methodologies to identify and isolate these conceptual vectors. By understanding how the model organizes these low-rank directions, we can better interpret how complex ideas—from clinical constructs to social registers—are represented and manipulated by the network. This work serves as a foundational step toward more transparent and steerable AI architectures that align more closely with human conceptual structures.
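To make the low-rank hypothesis concrete, here is a small synthetic sketch: if a concept truly lives in a rank-k subspace of the representation space, the top-k principal directions of concept-bearing representations should recover it. The dimensions, rank, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 32, 2

# Hypothetical assumption: the concept occupies a rank-k subspace of the d-dim space
basis = np.linalg.qr(rng.normal(size=(d, k)))[0]  # orthonormal concept basis (d x k)

# Representations of concept-bearing inputs: subspace signal plus small isotropic noise
reps = rng.normal(size=(200, k)) @ basis.T + 0.05 * rng.normal(size=(200, d))

# Recover the subspace from the top-k right singular vectors of the centered data
centered = reps - reps.mean(0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
recovered = vt[:k]  # (k x d) estimated subspace basis

# Subspace agreement: singular values of basis^T @ recovered^T are the cosines
# of the principal angles between the true and recovered subspaces (1.0 = aligned)
overlap = np.linalg.svd(basis.T @ recovered.T, compute_uv=False)
```

With labeled data, the same recovery step runs on contrastive representation differences rather than raw activations; the principal-angle check then quantifies how well a candidate low-rank direction set captures the concept.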