What are goals, tasks and rewards? Where do they come from?

Arguably, the most important component of reinforcement learning is the reward: it is the agent's primary learning signal, and it implicitly specifies a task in the environment that is assumed to be the agent's goal. The cognitive science field has a rich history of studying how children learn. One hypothesis is that children learn by playing, driven by their curiosity about surprising or ambiguous events. For example, if a five-month-old child observes a book passing through a wall, this event might surprise them. They will hit the book and the wall to understand how such an event is possible. In other words, the child plays to find explanations for events inconsistent with their prior knowledge (i.e. it learns, "the child as the scientist"). There is also literature that views children's play not as goal-oriented exploration but as near-random exploration through which they continuously discover new problems and new solutions. The machine learning field has pursued similar ideas of curiosity and intrinsic motivation to drive efficient exploration in artificial agents, e.g. when environments have sparse rewards. Recent formulations can be grouped into (a) measuring the novelty of states and (b) using the prediction error of a learned model of the environment dynamics. Curriculum learning is analogous to a child's ability to autonomously create new problems for itself; past approaches parameterize and propose new environments, goals, initial states and reward functions over the course of training.
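
To make the second formulation a little more concrete, the sketch below computes an intrinsic "curiosity" bonus as the prediction error of a learned forward dynamics model. This is a minimal illustration in the spirit of prediction-error-based methods, not the implementation of any particular paper; the names ForwardModel and intrinsic_reward, the network sizes, and the way the bonus is mixed with the environment reward are all assumptions for illustration.

```python
# Minimal sketch: curiosity as forward-model prediction error (illustrative only).
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next state from the current state and action."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def intrinsic_reward(model, state, action, next_state):
    """Curiosity bonus: the agent is rewarded where its dynamics model is wrong."""
    with torch.no_grad():
        pred = model(state, action)
    # Per-sample squared prediction error, used as an exploration bonus.
    return ((pred - next_state) ** 2).mean(dim=-1)

# The policy could then optimize a mixture of extrinsic and intrinsic terms, e.g.:
# r_total = r_env + beta * intrinsic_reward(model, s, a, s_next)
```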

What role does attention or memory play in long-term decision making?

A continual RL agent must keep extending its abilities and adapting them to a continuously changing environment, a hallmark of natural intelligence. In doing so, there is a natural tension between learning new rules from recent data and retaining what was learned from previously seen data, commonly known as the stability-plasticity dilemma. Long-term decision making over the lifetime of an agent is challenging because learning new tasks might result in catastrophic forgetting of previous tasks. Moreover, rewards may be sparse and delayed over the long horizons of a lifetime. Humans often replay past events in the form of stories, thoughts or dreams, which gives them a computational means of assigning credit to specific features or events and adjusting how they act in the future; assigning improper credit could in turn lead to errors in downstream tasks. Many of our hypotheses on the role of attention and memory in human learning have served as a stepping stone to new frontiers in reinforcement learning. Findings from human psychology on conditioning, synaptic consolidation, blocked or interleaved learning, learning at multiple time scales, the interplay of forgetting and the time of memory acquisition, selective attention to features, sparsity in attention, top-down or bottom-up attention mechanisms, etc. are vital to better understand how we can build generalized and robust continual RL agents.
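
As a rough illustration of the replay idea, the sketch below keeps a bounded reservoir of transitions drawn from across the agent's lifetime, so that updates can interleave old and new experience, which is one common way to reduce catastrophic forgetting. The class name, interface and capacity choice are illustrative assumptions, not a specific published method.

```python
# Minimal sketch: lifetime experience replay via reservoir sampling (illustrative only).
import random

class ReservoirReplayBuffer:
    """Keeps a bounded, approximately uniform sample of everything seen so far."""
    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.buffer = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            # Replace an existing item with probability capacity / n_seen,
            # so every transition ever seen is retained with equal probability.
            idx = self.rng.randint(0, self.n_seen - 1)
            if idx < self.capacity:
                self.buffer[idx] = transition

    def sample(self, batch_size: int):
        return self.rng.sample(self.buffer, min(batch_size, len(self.buffer)))

# During training, gradient updates would mix fresh transitions from the current
# task with replayed ones from earlier tasks, e.g.:
# batch = buffer.sample(batch_size) + recent_transitions
```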

How to learn and leverage reusable representations over the course of an agent’s lifetime?

A central question in continual RL is what should be transferred to new tasks across the lifetime of an agent. One solution could be to save all the experiences the agent encounters as it goes through its lifetime. Unfortunately, this is not feasible and often not optimal either. Hence, we would like to learn reusable representations, which implies decomposing the agent's representations such that they can be repurposed for different downstream tasks. These reusable representations can be leveraged at many different stages of decision making: they can span perception, task structure and behaviour. Take a task that is simple for a human, preparing tea: one can easily recognize different types of kettles, even if they are novel; one can adapt to different target mugs, and the goal of filling a container with tea generalizes to serving coffee; finally, the act of pouring is highly adaptable, and the motor skills it requires can be flexibly reused for significantly different tasks such as watering plants. Compositionality and abstraction are crucial for maximal representational reuse across a variety of tasks learned continually, and being able to extract structure and learn efficient reuse will be key to building agents that, much like biological agents, can cope with the complexity and variability of the real world. One way to encourage this is to design environments for our agents that exhibit enough variability, and that explicitly require the reuse of representations, to mimic the complexity of the world a biological agent would encounter.
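
One simple architectural pattern for such reuse, shown as a hedged sketch below, is a shared encoder whose features are repurposed by small task-specific heads; the module names, task names and layer sizes are assumptions for illustration rather than a prescribed design.

```python
# Minimal sketch: a shared, reusable representation with per-task heads (illustrative only).
import torch
import torch.nn as nn

class ModularAgent(nn.Module):
    def __init__(self, obs_dim: int, feature_dim: int, action_dim: int):
        super().__init__()
        # Shared representation, reused across all tasks in the lifetime.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, feature_dim), nn.ReLU(),
        )
        # Lightweight per-task heads, added as new tasks are encountered.
        self.heads = nn.ModuleDict()
        self.feature_dim = feature_dim
        self.action_dim = action_dim

    def add_task(self, task_name: str):
        self.heads[task_name] = nn.Linear(self.feature_dim, self.action_dim)

    def forward(self, obs, task_name: str):
        features = self.encoder(obs)             # reusable representation
        return self.heads[task_name](features)   # task-specific behaviour

# agent = ModularAgent(obs_dim=32, feature_dim=64, action_dim=4)
# agent.add_task("pour_tea"); agent.add_task("water_plants")
# logits = agent(torch.randn(1, 32), "pour_tea")
```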