How can we use large language models not just to process text, but to explore deep questions in human-focused sciences? Probe and Perturb is our lab’s flagship effort to push NLP beyond prediction, transforming it into a scientific tool for simulating controlled experiments inside language representations.
Focusing on psychotherapy and mental health, we design techniques to probe the latent space of LLMs and identify directions that align with meaningful human constructs — such as therapist directiveness, patient defensiveness, or emotional openness. We then perturb these directions to test how subtle shifts change predicted treatment outcomes or interaction dynamics, all within the model’s learned structure.
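The probe-and-perturb idea can be illustrated with a minimal sketch on synthetic data: train a linear probe on hidden states to recover a direction for a construct, then shift a state along that direction and read off the change in the probe's prediction. The construct name ("directiveness"), the synthetic hidden states, and all numbers below are illustrative assumptions, not the lab's actual pipeline.

```python
# Illustrative sketch of probe-and-perturb on synthetic "hidden states".
# The construct label and data are hypothetical, for exposition only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64  # hidden-state dimensionality

# Synthetic states: the "high directiveness" class is shifted along a
# ground-truth direction that the probe should approximately recover.
true_direction = rng.normal(size=d)
true_direction /= np.linalg.norm(true_direction)
X0 = rng.normal(size=(200, d))                         # low-directiveness states
X1 = rng.normal(size=(200, d)) + 2.0 * true_direction  # high-directiveness states
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(200), np.ones(200)])

# Probe: a linear classifier whose weight vector estimates the construct direction.
probe = LogisticRegression(max_iter=1000).fit(X, y)
direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

# Perturb: push one hidden state along the learned direction and observe
# how the predicted construct probability shifts.
h = rng.normal(size=d)
for alpha in (0.0, 2.0, 4.0):
    p = probe.predict_proba((h + alpha * direction)[None, :])[0, 1]
    print(f"alpha={alpha:.1f} -> P(high directiveness)={p:.2f}")
```

In an actual LLM setting, `X` would be activations extracted from a chosen layer, and the perturbed state would be written back into the forward pass rather than scored by the probe alone.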
These controlled, in-silico “experiments” allow us to ask causal questions that would be impractical to study directly in clinical settings. For example, how might a different therapist approach influence a patient’s self-disclosure trajectory?
By combining robust representation learning, modern interpretability methods, and scalable causal interventions, Probe and Perturb opens a novel empirical paradigm for psychology and social science. Our goal is to help researchers and clinicians understand how words shape mental states and relationships — with large language models serving as a safe, testable laboratory for human-centered science.
In this project, we harness large-scale, richly annotated psychotherapy data to answer a fundamental question in mental health research: Which sequences of therapist interventions and patient self-states tend to foster positive change?
This work is carried out in close collaboration with Prof. Dana Atzil-Slonim from Bar-Ilan University, whose research team provides a unique corpus from the Bar-Ilan University Psychology Clinic. In this dataset, each five-minute segment of a session is carefully labeled with one or two self-states describing the patient’s affect, thoughts, behaviors, and needs, coded into fine-grained subcategories.
By combining these detailed self-state annotations with labeled therapist intervention types, we aim to uncover which combinations and trajectories most consistently support a strong therapeutic alliance and better treatment outcomes.
To achieve this, we develop unsupervised and weakly supervised machine learning methods to detect typical and atypical state-intervention patterns, discover recurring motifs, and characterize transition dynamics within and across sessions. Our goal is not only to describe these patterns, but to link them empirically to outcome measures — providing an interpretable map of what effective process looks like in real therapy.
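A toy sketch of the transition-dynamics part of this analysis: estimate a first-order Markov transition matrix over segment labels and count recurring bigram motifs. The state names and session sequences below are invented for illustration; the real data uses the clinic's fine-grained self-state and intervention codes.

```python
# Minimal sketch of transition-dynamics analysis over toy session sequences.
# State labels and sessions are hypothetical stand-ins for the annotated corpus.
from collections import Counter
import numpy as np

states = ["avoidant", "reflective", "open"]
idx = {s: i for i, s in enumerate(states)}

# Toy per-session sequences of patient self-states (one label per segment).
sessions = [
    ["avoidant", "avoidant", "reflective", "open", "open"],
    ["avoidant", "reflective", "reflective", "open"],
    ["reflective", "open", "open", "avoidant"],
]

# First-order transition counts, row-normalized into a transition matrix.
counts = np.zeros((len(states), len(states)))
for seq in sessions:
    for a, b in zip(seq, seq[1:]):
        counts[idx[a], idx[b]] += 1
T = counts / counts.sum(axis=1, keepdims=True)

# Recurring motifs: the most frequent bigrams across sessions.
bigrams = Counter((a, b) for seq in sessions for a, b in zip(seq, seq[1:]))
print("Top motif:", bigrams.most_common(1)[0])
```

In practice these transition statistics would be computed jointly over self-state and intervention labels and then related to session-level outcome measures, but the counting logic is the same.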
This project blends computational sequence modeling with deep clinical insight, paving the way for more precise research, improved training, and personalized psychotherapy.