Maps in reinforcement learning:
efficient representations of structure and time in decision-making
Workshop at RLDM 22
Biological and artificial agents store information from past experiences and re-use this knowledge, often in novel situations. To do so efficiently, agents should generalise across high-dimensional and dynamic experiences by exploiting their regularities. Making use of such representations to achieve goals is a key focus of reinforcement learning (RL). Representations are shaped by past experiences and, at the same time, constrain what computations can be used for decision making.
Decades of research, starting with studies of spatial navigation, show that generalisable aspects of experience can be organised efficiently within cognitive maps. A key problem in the study of representations is to extend the static analysis of cognitive mapping to dynamic cognition, where memories in time and space are used for efficient decision making. Successor representations were adapted from machine RL to re-interpret hippocampal neural and behavioural data within a predictive framework, and have been extended to more general cognitive tasks. Other approaches, such as factorised, distributional, and graph-based representations, aim to capture the statistical structure of feature spaces and have narrowed down important components that cognitive maps should incorporate in order to compress the structure of experience efficiently. In parallel, research on episodic memory has shed light on how brains organise dynamic information efficiently, elucidating mechanisms for storing past memories and using them for future predictions. These approaches open doors for future work on the conjunctive encoding of time and the relevant common features of events, aiming to clarify how brains extract and store relevant information from experience, represent its dynamics efficiently, and use these representations for prediction. Furthermore, it is crucial to understand how such representations can be learned, stored, and scaled up to general cognitive problems.
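As a concrete anchor for one of the ideas above, the short Python sketch below (an illustration of the standard successor-representation formalism, not code from any of the talks) computes the successor representation M = (I - gamma * P)^{-1} of a fixed policy with transition matrix P, and reads out state values as V = M r.

```python
import numpy as np

def successor_representation(P, gamma=0.95):
    """Closed-form successor representation M = (I - gamma * P)^{-1}."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

# Toy example: a 3-state ring in which the policy always moves clockwise.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
M = successor_representation(P, gamma=0.9)

# State values follow directly from the SR and a reward vector: V = M @ r.
r = np.array([0.0, 0.0, 1.0])
V = M @ r
print(np.round(M, 2))
print(np.round(V, 2))
```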
In this workshop, we propose talks by researchers from diverse and interdisciplinary backgrounds who are pioneering new directions in representation and reinforcement learning, as well as their interplay. We will investigate how these approaches relate to and inform each other, and examine their points of connection and diverging predictions, in order to identify synergies and open challenges.
Schedule
First Session: 1:00pm - 2:20pm
1:00 Intro
1:05 Kim Stachenfeld
1:30 Will Dabney
1:55 Daniel McNamee
10 min break: 2:20pm - 2:30pm
Second Session: 2:30pm - 3:45pm
2:30 Angela Radulescu
2:55 Lucas Lehnert
3:20 Lucy Lai
10 min break: 3:45pm - 3:55pm
Third Session: 3:55pm - 5:10pm
3:55 James Whittington
4:20 Marc Howard
4:45 Mike Hasselmo
5:10 End of the workshop - short wrap-up
Abstracts and presenters
Kim Stachenfeld, DeepMind
Learned graph-based simulators for physical design
Designing physical objects that maximize some rewarding objective is central to engineering as well as everyday human behavior. Automating design using machine learning has tremendous promise; however, existing methods are often limited in their ability to generalize beyond the designs, dynamics, and tasks encountered during training. I will discuss recent work exploring how graph-based learned simulators can be used with gradient-based optimization for task-agnostic design. This constitutes a simple, fast, and reusable approach that solves high-dimensional problems with complex physical dynamics, including designing surfaces and tools to manipulate fluid flows and optimizing the shape of an airfoil to minimize drag. The framework produces high-quality designs by propagating gradients through trajectories of hundreds of steps, even when using models that were pre-trained for single-step predictions on data substantially different from the design tasks. To close, I will discuss some surprising findings about when this approach works and fails, as well as future directions we are excited about for exploring complex problems in design.
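As a rough intuition for the design-optimization setup described above, here is a hedged toy sketch in Python/PyTorch (my own illustration, not the speaker's model or code): a single scalar design parameter is optimized by back-propagating a task loss through an unrolled trajectory of a differentiable simulator; in the actual work the simulator is a learned graph network and the designs are high-dimensional.

```python
import torch

def step(pos, vel, design_angle, dt=0.05):
    # Toy differentiable dynamics: gravity plus a thrust that grows with the design angle.
    acc = -9.8 + 20.0 * torch.sin(design_angle)
    vel = vel + dt * acc
    pos = pos + dt * vel
    return pos, vel

design_angle = torch.tensor(0.1, requires_grad=True)   # the "design" being optimized
opt = torch.optim.Adam([design_angle], lr=0.05)

for _ in range(200):
    pos, vel = torch.tensor(0.0), torch.tensor(0.0)
    for _ in range(50):                                 # unroll a 50-step trajectory
        pos, vel = step(pos, vel, design_angle)
    loss = (pos - 2.0) ** 2                             # design objective: end near height 2
    opt.zero_grad()
    loss.backward()                                     # gradients flow through the whole rollout
    opt.step()

print(float(design_angle.detach()), float(pos.detach()))
```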
Lucas Lehnert, MILA
Transferring Task Knowledge with Reward-Predictive Representations
While recent advances in deep reinforcement learning research have demonstrated impressive performance in solving complex tasks, achieving flexible knowledge reuse, a characteristic of human behaviour, remains elusive. Knowledge reuse allows intelligent systems to incorporate solutions already known from previous tasks, resulting in faster learning of solutions to new complex tasks. Efficient knowledge re-use is therefore a central, yet not well understood, challenge in artificial intelligence research. In this talk I will approach this question through the lens of representation learning, a technique whereby intelligent systems construct an internal representation of their inputs to facilitate learning. Through a sequence of theoretical and empirical results, I will discuss different representation systems and how they relate to model-based and model-free reinforcement learning. Furthermore, I will present a clustering algorithm that constructs a representation predictive of future reward sequences for tasks with complex and high-dimensional inputs. Using this algorithm, I will demonstrate that the resulting reward-predictive representation can be re-used across different tasks to accelerate learning and reduce the number of data points needed to obtain an optimal solution. This representation reuse constitutes a form of efficient task knowledge transfer not provided by other reinforcement learning algorithms.
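To illustrate the flavour of reward-predictive abstraction (a toy sketch of the general idea, not the speaker's clustering algorithm), the Python snippet below groups states of a small Markov chain that generate identical expected reward sequences under a fixed policy; such groupings can transfer across tasks that share transition structure.

```python
import numpy as np

def expected_reward_sequences(P, r, horizon=4):
    """Return an (n_states, horizon) array: expected reward t+1 steps ahead from each state."""
    n = P.shape[0]
    seqs = np.zeros((n, horizon))
    Pk = np.eye(n)
    for t in range(horizon):
        Pk = Pk @ P               # (t+1)-step transition probabilities
        seqs[:, t] = Pk @ r       # expected reward of the state occupied after t+1 steps
    return seqs

# 4-state chain: states 0 and 1 both lead to state 2, which leads to the rewarded,
# absorbing state 3. States with identical expected futures end up in the same cluster.
P = np.array([[0, 0, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1]], dtype=float)
r = np.array([0.0, 0.0, 0.0, 1.0])
seqs = expected_reward_sequences(P, r)

clusters = {}
for s, seq in enumerate(map(tuple, np.round(seqs, 6))):
    clusters.setdefault(seq, []).append(s)
print(list(clusters.values()))    # -> [[0, 1], [2, 3]]
```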
Angela Radulescu, NYU
Towards naturalistic reinforcement learning in health and disease
Adaptive decision-making relies on our ability to organize experience into useful representations of the environment. This ability is critical in the real world: each person’s experience is dynamic and continuous, and no two situations we encounter are exactly the same. In this talk, I will present results from ongoing work attempting to understand how representation learning can take place in naturalistic environments. One line of work leverages virtual reality in combination with eye-tracking to study what features of naturalistic scenes guide goal-directed search. A second study examines the role of language in providing a prior for which features are relevant for decision-making. I will conclude with a discussion of the potential of naturalistic reinforcement learning as a model of mental health dynamics.
Will Dabney, DeepMind
All Representations Are Wrong. What Makes Them Useful?
There is an endless array of approaches to representation learning, choices of objectives to optimize, and desirable properties for representations in reinforcement learning. However, these tend to focus on an imagined 'end point', an optimal solution, that never arrives. In this talk, I will argue that the story of representation learning in reinforcement learning is less about the end goal and more about the ongoing, dynamic interaction between an agent and its representation. I will highlight surprising successes and unexpected challenges encountered in studying this interaction.
Lucy Lai, Harvard University
Policy compression: an information bottleneck in action selection
The brain has evolved to produce a diversity of behaviors under stringent computational resource constraints. Given this limited capacity, how do biological agents balance reward maximization against the costs of storing complex action policies? In this talk, I will give an overview of the theoretical and experimental evidence for this reward-complexity trade-off. Using the theoretical framework of policy compression, in which the cognitive cost of representing action policies is reduced by making them simpler, I will illustrate how a wide range of behavioral phenomena, including stochasticity, perseveration, response time, and chunking, are brought together under the same resource-rational framework. Finally, I will discuss how our model could be used to probe behavioral deficits in psychiatric illness.
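For readers who want the trade-off in equations: policy compression prescribes maximizing expected reward minus a cost on the mutual information I(S; A) between states and actions, which yields policies of the form pi(a|s) proportional to p(a) exp(beta * Q(s, a)), where p(a) is the marginal action distribution and beta sets the capacity. The Python sketch below (a minimal illustration of this standard fixed-point scheme, with toy values made up for the example) finds such a policy by Blahut-Arimoto-style iteration.

```python
import numpy as np

def compressed_policy(Q, beta, p_s=None, n_iters=200):
    """Fixed-point iteration for pi(a|s) proportional to p(a) * exp(beta * Q(s, a))."""
    n_s, n_a = Q.shape
    p_s = np.full(n_s, 1.0 / n_s) if p_s is None else p_s   # state distribution
    p_a = np.full(n_a, 1.0 / n_a)                            # initial action marginal
    for _ in range(n_iters):
        logits = np.log(p_a) + beta * Q
        pi = np.exp(logits - logits.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
        p_a = p_s @ pi                                       # re-estimate the action marginal
    return pi

# Made-up Q-values: action 1 is best in two of the three states.
Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.9, 1.0]])
# Low beta (tight capacity): the policy collapses toward the overall action marginal,
# producing perseveration-like behaviour. High beta: nearly deterministic reward maximization.
print(np.round(compressed_policy(Q, beta=0.5), 2))
print(np.round(compressed_policy(Q, beta=20.0), 2))
```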
Daniel McNamee, Champalimaud, Lisbon
A deterministic global optimization perspective on planning and representation
Model-based solutions of Markov decision processes usually rely on stochastic algorithms: for example, Monte Carlo estimates of policy gradients or Q-value backups are derived from trajectory simulations. What if an agent could run an infinite number of simulations at every iteration? I'll consider such an asymptotically optimal algorithm as a normative process theory of planning and study it under the assumption that a veridical world model is available. This theory specifies an integrative dynamic interplay between state representation strategies and policy updates, and I'll suggest that various manifestations of human cognitive processing are emergent in the policy optimization process, e.g. abstraction, parallelization, hierarchical reasoning, and means-end analysis.
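One loose way to picture the "infinite simulations" limit mentioned above (my own toy illustration, not the speaker's algorithm): with a veridical model, sampled Monte Carlo backups can be replaced by their infinite-sample limit, the exact Bellman backup, as in the small value-iteration sketch below.

```python
import numpy as np

def exact_value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P: (A, S, S) transition tensor, R: (S, A) rewards; exact Bellman backups."""
    n_a, n_s, _ = P.shape
    Q = np.zeros((n_s, n_a))
    while True:
        V = Q.max(axis=1)
        # Infinite-sample limit of the Monte Carlo backup: an exact expectation over s'.
        Q_new = R + gamma * np.einsum('ast,t->sa', P, V)
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new

# Two-state, two-action toy MDP with a veridical model.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # transitions under action 0
              [[0.1, 0.9], [0.7, 0.3]]])   # transitions under action 1
R = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(np.round(exact_value_iteration(P, R), 3))
```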
James Whittington, University of Oxford, Stanford University
Relating transformers to models and neural representations of the hippocampal formation
Many deep neural network architectures loosely based on brain networks have recently been shown to replicate neural firing patterns observed in the brain. One of the most exciting and promising novel architectures, the Transformer neural network, was developed without the brain in mind. In this work, we show that transformers, when equipped with recurrent position encodings, replicate the precisely tuned spatial representations of the hippocampal formation, most notably place and grid cells. Furthermore, we show that this result is not surprising, since the model is closely related to current hippocampal models from neuroscience. We additionally show that the transformer version offers dramatic performance gains over the neuroscience version. This work continues to link the computations of artificial and brain networks, offers a novel understanding of hippocampal-cortical interaction, and suggests how wider cortical areas may perform complex tasks beyond those captured by current neuroscience models, such as language comprehension.
Marc Howard, Boston University, Boston, Massachusetts
Temporal maps of the past and the future
We experience a continuous flow of time, with the past trailing behind and the future extending before us. In contrast, traditional RL algorithms grow out of associative memory models that are atemporal. A large and growing body of neuroscientific work indicates that the activity across populations of neurons in several regions carries a temporal memory for what happened when in the past. In the entorhinal cortex, neurons have exponential basis functions over the past with a distribution of time constants. We can identify this population with an ensemble of eligibility traces with a distribution of forgetting rates, and its activity with the real Laplace transform of the past leading up to the present as a function of time. That is, the population captures the temporal relationships between prior events and the present, rather than simply the strength of a memory trace. Given the Laplace transform of the past, it is straightforward to construct an associative memory model that produces a continuous estimate of the future at each moment in time. This predictive relational memory could be incorporated into a new generation of RL models that capture and express temporal relationships in an organic way.
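A minimal numerical sketch of the Laplace-transform idea (the discretisation details are my own assumptions, not the speaker's implementation): a bank of leaky integrators with decay rates s obeys dF/dt = -s F + f(t), so at time t each unit holds the integral of f(t') exp(-s (t - t')) over the past, i.e. the real Laplace transform of the stimulus history.

```python
import numpy as np

dt = 0.01
s_rates = np.geomspace(0.1, 10.0, num=20)    # a distribution of decay rates / time constants
F = np.zeros_like(s_rates)                   # one leaky integrator per decay rate

times = np.arange(0.0, 5.0, dt)
# A brief unit-area pulse at t = 1 s stands in for "an event happened then".
stimulus = (np.abs(times - 1.0) < dt / 2).astype(float) / dt

for f_t in stimulus:
    F += dt * (-s_rates * F + f_t)           # Euler update of dF/dt = -s * F + f(t)

# Four seconds after the pulse each unit has decayed at its own rate, F(s) ~ exp(-4s),
# so the population jointly encodes when the event happened, not just that it happened.
print(np.round(F, 3))
```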
Michael Hasselmo, Boston University, Boston, Massachusetts
Coding of space and time in cortical structures
Recordings of neurons in cortical structures of behaving rodents show responses relevant to the encoding of space and time for goal-directed behavior. Spatial location is coded by grid cells in entorhinal cortex and place cells in hippocampus (O’Keefe, 1976; Hafting et al., 2005). Grid cells and place cells can also code temporal intervals in a behavioral task, firing at specific time intervals or running distances when a rat runs on a treadmill (Kraus et al., 2013; 2015; Mau et al., 2018). Modeling shows that coding as time cells may arise from exponential decay of neural activity on multiple time scales (Liu et al., 2019). Coding of space could involve both path integration and the transformation of sensory input. Coding of location by path integration could involve coding of running speed (Hinman et al., 2016; Dannenberg et al., 2020). Inactivation of input from the medial septum impairs the spatial selectivity of grid cells, suggesting that rhythmic coding of running speed is essential to grid cell firing (Brandon et al., 2011). Transformation of sensory input may be even more important for computing location. Recent data from our lab show coding of environmental boundaries in egocentric coordinates (Hinman et al., 2019; Alexander et al., 2020), which could be combined with head direction to generate allocentric coding of boundaries and spatial location (Bicanski and Burgess, 2018). These different neural mechanisms could mediate the state coding of time and spatial location that agents need to guide action selection in a range of behavioral tasks.
Organisers
Charline Tessereau
postdoctoral fellow
charline.tessereau@tuebingen.mpg.de
https://www.kyb.tuebingen.mpg.de/person/114454/2549
Mihály Bányai
postdoctoral fellow
mihaly.banyai@tuebingen.mpg.de
https://www.kyb.tuebingen.mpg.de/person/103805/2549
Philipp Schwartenbeck
postdoctoral fellow
philipp.schwartenbeck@tuebingen.mpg.de
https://www.kyb.tuebingen.mpg.de/person/111269/2549
Department of Computational Neuroscience
Max Planck Institute for Biological Cybernetics
Tübingen, Germany