Speakers

Moderator

Yael Niv

Princeton University

Yael Niv is a professor of neuroscience and psychology at Princeton University. Her lab studies the computational processes underlying reinforcement learning, focusing on how attention, memory and learning interact to construct task representations that allow efficient learning through optimal generalization. She is co-founder and co-director of the Rutgers-Princeton Center for Computational Cognitive Neuropsychiatry, where she is applying ideas from reinforcement learning to understanding and treating mental health. She is also the director of a Latent Cause Inference Conte Center joint between Princeton and Rutgers. Her proudest career accomplishment is winning a graduate mentoring award. In her nonexistent spare time, she is a mom to two awesome boys, and an activist within and outside academia.

Speakers

Ross Otto

McGill University

Using large naturalistic datasets to understand decision-making in the real world

The vast amounts of human-generated data available from the internet, government agencies, and other sources present exciting possibilities for understanding how people decide in the real world. In this talk I discuss two different lines of work which highlight how the analysis of large naturalistic datasets can be used to circumvent constraints of laboratory experimentation. For example, a recent body of work suggests that fluctuations in mood states are driven by unpredictable outcomes in daily life, but also appear to drive consequential behaviors such as risk-taking. By analyzing day-to-day mood language extracted from several million social media posts, we find that real-world ‘prediction errors’—local outcomes that deviate positively from expectations—predict day-to-day mood shifts observable at the level of a city, which in turn predicted increased per-person lottery gambling rates, elucidating the interplay between prediction errors, moods, and risk attitudes. In a newer line of work examining a massive dataset of UK grocery store purchases, we examine real-world evidence for decoy effects—whereby people’s choices between two valuable “target” options are swayed by a third, objectively inferior, “decoy” option— in consumer choice. When comparing products with attributes that effectively trade off (e.g., quality and price), the presence of inferior, decoy options systematically biased the likelihood of choosing a specific target option. This study provides a proof of principle demonstrating that these sorts of context effects are detectable in richer, complex real-world consumer choice settings with multiple decoy options, underscoring how real-world, naturalistic datasets afford unique understanding of consequential choice behaviors.

Sugandha Sharma

NVIDIA & Microsoft Research

Toward human-AI alignment in large-scale multi-player games

Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold capturing interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. Third, we project human and AI gameplay to the proposed behavior manifold to compare and contrast. This allows us to interpret differences in policy as higher-level behavioral concepts, e.g., we find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend towards uniformity. Furthermore, AI agents predominantly engage in solo play, while humans often engage in cooperative and competitive multi-agent patterns. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications. Our study advances the alignment discussion in AI and especially generative AI research, offering a measurable framework for interpretable human-agent alignment in multiplayer gaming.

Aaron Heller

University of Miami

Ecological drivers of emotion and their links to psychopathology

Prediction errors (PEs)—the difference between expected and actual outcomes—are thought to be critical drivers of learning and emotion, shaping momentary affective reactions, future expectations, and longer-term mood states. While theories of affect propose that emotions arise from outcome valence and PEs, most evidence comes from low-stakes laboratory studies, which may lack the ecological validity needed to capture the richness and complexity of real-world emotional experiences. To address this, we developed an event-triggered procedure to measure emotion and expectation updating in university students receiving goal-relevant exam grades—a high-stakes, personally meaningful event that elicits robust and sustained emotional responses. Using real-world contexts, we demonstrate that PEs are strong predictors of emotion and significantly influence long-term mood, with their effects persisting for up to two weeks. These findings underscore the importance of deviations from expectations in shaping affect in daily life and highlight the critical role of ecologically valid paradigms in understanding emotion.

Real-world, high-stakes events also provide a powerful framework for building computational models to understand how psychopathology affects emotional experience. In contrast to null results from some laboratory studies, we find that in real-world settings—where outcomes are deeply consequential and emotionally engaging—people with depression display a specific reduction in affective reactivity to positive PEs. Notably, there are no differences in responses to negative PEs or outcome values, implicating impaired positive PE processing as a key mechanism in emotional dysregulation. Together, these findings demonstrate that PEs are central to the dynamics of real-world emotion, linking momentary reactions to longer-term affective states. They also highlight the necessity of studying emotion in ecologically valid contexts to uncover meaningful insights into both typical and disordered emotional processing.

Björn Lindström

Karolinska Institutet

A computational reward-learning account of real-world social media engagement

Social media has become a central arena for human interaction, with billions of daily users worldwide. The intense popularity of social media is often attributed to a psychological need for social rewards (“likes”), portraying the online world as a Skinner Box for modern humans. Despite such portrayals, empirical evidence for social media engagement as reward-based behavior have been limited. In this talk, I will focus on a study where we applied a computational approach to test whether reinforcement learning mechanisms shape social media behavior in naturalistic settings. Analyzing over one million posts from more than 4,000 individuals across multiple platforms, we used reinforcement learning models to show that human behavior on social media conforms qualitatively and quantitatively to reward learning principles. Specifically, we found that users adaptively space their posts to maximize their rate of social rewards, balancing effort costs and opportunity costs. Furthermore, we identified meaningful individual differences in social reinforcement learning profiles, and through an online experiment, we provided causal evidence that social rewards directly influence behavior in line with our computational model. This work was among the first to establish large-scale, real-world evidence for reinforcement learning dynamics in social media engagement. I will also discuss some challenges of using naturalistic data for RL modeling and possible future directions for studying real-world sequential decision-making in dynamic social environments.

Data blitzes

Sahiti Chebolu

Max Planck Institute for Biological Cybernetics

People distribute or pace their work over time in rather diverse patterns and they frequently delay acting on tasks. In extreme cases they procrastinate, often at a cost to health and well-being. Although numerous factors influencing delay have been identified, along with various typologies of procrastination, mechanistic explanations are limited, typically explaining delays by appealing to excess or inconsistent temporal discounting of distant rewards and costs.

Inspired by principles of reinforcement learning and decision-making, our work explores various (sub-) optimal policies for decisions to delay across tasks with different structures – highlighting multiple types and mechanisms that could be involved. We integrate these in a common, systematic taxonomy of pacing and procrastination. We illustrate some of these mechanisms for delay through simulations of Markov decision process (MDP)-based computational models. Along with discounting of temporally distant rewards, these mechanisms include mis-estimation of one’s own efficacy, non-linear scaling of effort with amount of work, steeper discounting of effort than reward, and waiting for uncertain rewards.

We assess the operation of these mechanisms in a real-world task (using a rich dataset published by Zhang & Ma, Scientific Reports, 2024) to test if they can explain delays exhibited by students in the task. Model fitting revealed that all of our mechanisms are plausible candidates for what underlies students’ decisions to delay.

Our approach provides a theoretical foundation for understanding pacing and procrastination, enabling integration of both established and novel mechanisms within a unified conceptual framework.

David Cormack

Leonardo, Edinburgh

The academic community is moving at pace in developing new and exciting advances in Reinforcement Learning. These advances are likely to support real-world solutions in Autonomy and Decision Making, however these need to be applied carefully to meet safety considerations. This talk will cover aspects of why aerospace is moving from automated to more autonomous systems, an example use case, and poses some of the challenges of deploying Reinforcement Learning in the Wild.

Levi Solomyak

Hebrew University

Despite emotions' ubiquitous influence on learning and decision-making, studying these processes together has remained challenging, largely because emotions are considered internal experiences inaccessible to researchers. Recent work suggests that two classes of emotions map onto distinct reinforcement learning computations, providing a framework for studying emotional processes through observable behavior. In this framework, the key factor arbitrating between the two classes of emotions and computations is environmental controllability. In controllable environments, emotions evaluate actions, driving increased investment following negative outcomes to adjust policy. In uncontrollable environments, emotions track reward availability, with negative outcomes deflating reward-seeking motivation. After testing these predictions in a controlled experiment, we apply this framework to analyze professional tennis matches (N=6,715), which involve both intense emotional states and performance fluctuations. In line with the framework’s predictions, players’ responses to prediction errors depended on controllability: in game situations where performance changes have greater impact on winning, players increase the speed of their serves and their level of play following negative prediction errors, whereas in low controllability situations, these improvements follow positive prediction errors. These findings demonstrate how integrating emotions in reinforcement learning helps explain real-world learning and decision-making.

Organizers

Dan Mircea-Mirea

Princeton University

dmirea@princeton.edu; dan-mirea.com

I'm a PhD student at Princeton advised by Drs. Yael Niv and Erik Nook. My PhD is focused on identifying socio-cognitive mechanisms of mental health and its treatment from people's behavior in digital environments, such as social media or psychotherapy apps. To achieve this, I use a combination of statistical methods, computational cognitive modelling, and large-language models applied to real-world data. I am also doing experimental work on how latent-cause inference is atypical in various forms of psychopathology, and have previously worked on using large-language models for psychological text analysis. Before Princeton, I received a BA and an MSci in Natural Sciences from University of Cambridge. Aside from research, I enjoy learning languages and being chronically online (sometimes for educational purposes @danniesbrain).

Georgia Turner

University of Cambridge

gt342@cam.ac.uk

I am a third year PhD student in the Digital Mental Health Group, University of Cambridge, supervised by Dr Amy Orben. In my PhD I aim to understand why people feel they lose control of their technology use. To do so, I apply methods from computational neuroscience, such as Reinforcement Learning modelling, to real-world social media and smartphone data. Before my PhD, I completed my undergraduate degree also at the University of Cambridge, in Philosophy and Natural Sciences. I then studied Brain and Mind Sciences MSc at University College London, and Sorbonne in Paris. During my masters, I carried out research projects using neuroimaging (MEG) and computational modelling supervised by Drs. Peter Kok and Valentin Wyart. Outside my research, I enjoy drawing and sitting on the train between Cambridge and London.

Ana da Silva Pinho
University of Amsterdam

a.f.dasilvapinho@uva.nl

I am a postdoctoral researcher at the Connected Minds Lab, University of Amsterdam, advised by Wouter van den Bos. My research focuses on how young people engage with social media and how this relates to their mood and mental health. To investigate these dynamics, I use computational models, experimental designs, and integrate real-world and self-reported data. Before my postdoc, I was a PhD candidate at the same lab and university, studying social learning strategies during adolescence in the context of school-based social networks and social media, with a particular focus on social norms and feedback. I also worked as a data analyst at a social media agency for a year before returning to academia. Outside the lab, I enjoy photography, collaborating with visual artists on community-based science and art projects, and playing handball.

Page updated

Google Sites

Report abuse

Speakers

Yael Niv

Ross Otto

Sugandha Sharma

Aaron Heller

Björn Lindström

Sahiti Chebolu

David Cormack

Levi Solomyak

Dan Mircea-Mirea

Georgia Turner

Ana da Silva PinhoUniversity of Amsterdam

Ana da Silva Pinho
University of Amsterdam