Schedule
09:00 - 09:10 (10 minutes) - Opening & Welcome Remarks
09:10 - 09:30 (20 minutes) - Invited Talk: Julia Haas (Google DeepMind).
09:30 - 09:50 (20 minutes) - Invited Talk: Jeff Cockburn (University of Iowa).
09:50 - 10:10 (20 minutes) - Invited Talk: Kelly W. Zhang (Imperial College London).
10:10 - 10:20 (10 minutes) - Break
10:20 - 10:40 (20 minutes) - Invited Talk: Cameron Allen (UC Berkeley).
10:40 - 11:00 (20 minutes) - Invited Talk: Nathaniel Daw (Princeton).
11:00 - 11:30 (30 minutes) - Coffee Break
11:30 - 12:00 (30 minutes) - Breakout Discussions
12:00 - 13:00 (1 hour) - Panel Discussion w/ Invited Speakers (moderated by Michael Dennis)
Abstracts
Julia Haas
Revising the Phenomena of Minds: An RL Framework for Moral Cognition
I present an RL-based framework for analyzing human moral cognition. The approach not only saves key phenomena—such as multi-dimensionality, context-sensitivity, and cross-cultural variability—but also revises certain aspects of our moral psychological experience. For example, it recasts the motivational force of an "ought" as a function of its valuational strength, suggesting that the distinction between social and moral values may be one of degree, not of kind. I conclude by discussing some normative implications of the view, including what all this might mean for evaluating moral competence in contemporary LLMs.
Jeffrey Cockburn
The State of Reinforcement Learning
Reinforcement learning (RL) provides a principled computational framework for understanding how agents, including humans, learn from experience to guide decision-making. Yet a foundational and often underappreciated challenge in applying RL models to the study of mind and behavior lies in specifying the task representation, that is, how the environment is structured in the mind of the learner. This problem becomes particularly acute when studying behavior in complex, ecologically valid settings or in clinical populations where representational assumptions may diverge significantly from normative models. In this talk, I will present three lines of recent work that confront this challenge. First, I will illustrate how deep-RL methods can shed light on the computational principles that support flexible and generalizable behavior. Second, I will show how large-scale behavioral data can be stratified according to individual differences that reflect meaningful variability in representational and algorithmic strategies. Third, I will discuss evidence that humans may simultaneously maintain and arbitrate between multiple task representations to guide action. Together, these studies underscore the importance of modeling not just the learning algorithms, but also the representational substrates over which they operate. Refining our assumptions about these latent structures will, I argue, be essential for advancing our understanding of how reinforcement learning is realized in the human brain and mind.
Kelly W. Zhang
Impatient Bandits: Optimizing for the Long-Term Without Delay
Increasingly, recommender systems are tasked with improving users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a bandit problem with delayed rewards. There is an apparent trade-off in choosing the learning signal: waiting for the full reward to become available might take several weeks, slowing the rate of learning, whereas using short-term proxy rewards reflects the actual long-term goal only imperfectly. First, we develop a predictive model of delayed rewards that incorporates all information obtained to date. Rewards as well as shorter-term surrogate outcomes are combined through a Bayesian filter to obtain a probabilistic belief. Second, we devise a bandit algorithm that quickly learns to identify content aligned with long-term success using this new predictive model. We prove a regret bound for our algorithm that depends on the Value of Progressive Feedback, an information theoretic metric that captures the quality of short-term leading indicators that are observed prior to the long-term reward. We apply our approach to a podcast recommendation problem, where we seek to recommend shows that users engage with repeatedly over two months. We empirically validate that our approach significantly outperforms methods that optimize for short-term proxies or rely solely on delayed rewards, as demonstrated by an A/B test in a recommendation system that serves hundreds of millions of users.
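To make the structure of this approach concrete, the sketch below shows one way a Thompson-sampling bandit can maintain a per-arm belief about a delayed, long-term reward and sharpen that belief with short-term surrogate signals through a conjugate Gaussian filter. This is a minimal illustration under assumed modeling choices (a linear-Gaussian surrogate model), not the speaker's implementation; names such as ImpatientArmBelief, surrogate_coef, and the noise parameters are hypothetical.

```python
import numpy as np

# Minimal illustrative sketch (not the speaker's implementation): a
# Thompson-sampling bandit whose belief about each arm's delayed,
# long-term reward is refined by short-term surrogates via a conjugate
# Gaussian filter. The linear-Gaussian surrogate model and all names
# and parameters below are assumptions made for illustration.

class ImpatientArmBelief:
    def __init__(self, prior_mean=0.0, prior_var=1.0,
                 surrogate_coef=1.0, surrogate_var=0.5, reward_var=0.1):
        self.mean, self.var = prior_mean, prior_var
        self.coef = surrogate_coef   # how predictive the early signal is (assumed)
        self.s_var = surrogate_var   # surrogate observation noise (variance)
        self.r_var = reward_var      # long-term reward noise (variance)

    def _gaussian_update(self, obs, coef, obs_var):
        # Conjugate update for an observation obs ~ Normal(coef * mu, obs_var)
        precision = 1.0 / self.var + coef ** 2 / obs_var
        self.mean = (self.mean / self.var + coef * obs / obs_var) / precision
        self.var = 1.0 / precision

    def observe_surrogate(self, s):
        # Early proxy, e.g. first-week engagement with a recommended show
        self._gaussian_update(s, self.coef, self.s_var)

    def observe_reward(self, r):
        # Delayed outcome, e.g. repeated engagement over two months
        self._gaussian_update(r, 1.0, self.r_var)

    def sample(self, rng):
        # Thompson sample from the current belief about the arm's value
        return rng.normal(self.mean, np.sqrt(self.var))

def choose_arm(beliefs, rng):
    # Pick the arm whose sampled long-term value is highest
    return int(np.argmax([b.sample(rng) for b in beliefs]))

# Usage sketch:
#   beliefs = [ImpatientArmBelief() for _ in range(n_arms)]
#   arm = choose_arm(beliefs, np.random.default_rng(0))
#   ...later: beliefs[arm].observe_surrogate(s); beliefs[arm].observe_reward(r)
```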
Cameron Allen
The Agent Must Choose the Problem Model
Reinforcement learning agents have it easy. Their problem model comes pre-specified from the first time step of their deployment. Observations, actions, rewards—even the learning algorithm—are pre-arranged, expert-designed, and hand-tuned to help the agent accomplish its task. But models can be wrong. What if the agent's problem model turns out to be inadequate? No one is coming to help; autonomous agents must adapt. How can we build agents that handle such daunting ambiguity? I'll present some initial progress in that direction: an agent that can detect when its observations are incomplete and learn a memory function to compensate. The result is a first step towards agents that choose their own problem models.
Nathaniel Daw
RL and the Phenomena of Psychiatry: Theory and Practice
There has been a great deal of interest in the idea that RL concepts and mechanisms can explain the phenomena and mechanisms of valuation and emotion: e.g., deliberation vs. experience and their roles in constructing value. Building on this is the more applied theoretical idea that dysfunction in these mechanisms might subserve and explain psychiatric symptoms, such as compulsion; and in particular, that these mechanisms can explain the relationship between behavioral symptoms (like avoidance) and subjective symptoms (like worry). Building on this one step further is the more empirical idea that behavior on RL tasks designed to measure individual differences in key choice mechanisms (like deliberation, exploration, or avoidance) might help to substantiate this identification. I report results from a large study relating individual differences in symptoms and RL behavior that suggest the practical situation does not yet live up to the theoretical promise.