Exploring with Sticky Mittens: Reinforcement Learning with Expert Interventions via Option Templates

A framework for leveraging expert intervention to solve long-horizon reinforcement learning tasks.

Souradeep Dutta1, Kaustubh Sridhar1, Osbert Bastani1, Edgar Dobriban1, James Weimer1,2, Insup Lee1, Julia Parish-Morris3

1 University of Pennsylvania, 2 Vanderbilt University, 3 Children's Hospital of Philadelphia

Conference on Robot Learning (CoRL) 2022
Auckland, New Zealand

Abstract

Long-horizon robot learning tasks with sparse rewards pose a significant challenge for current reinforcement learning algorithms.

A key feature enabling humans to learn challenging control tasks is that they often receive expert intervention that enables them to understand the high-level structure of the task before mastering low-level control actions.

We propose a framework for leveraging expert intervention to solve long-horizon reinforcement learning tasks. We consider option templates, which are specifications encoding a potential option that can be trained using reinforcement learning. We formulate expert intervention as allowing the agent to execute option templates before learning an implementation. This enables the agent to use an option before committing costly resources to learning it.

We evaluate our approach on three challenging reinforcement learning problems, showing that it outperforms state-of-the-art approaches by two orders of magnitude.

The Sticky Mittens Experiment

Child development researchers placed 'sticky mittens' (mittens with Velcro) on infants too young to actually grasp objects. The mittens allowed the infants to snag Velcro-fitted toys merely by swiping at them. The psychologists found that, compared with infants who hadn't used the mittens, those who had used them subsequently showed more sophisticated abilities to explore objects.

The sticky mittens experiment exposed infants to the consequences of grasping before they actually learned how to grasp!

Option Template: Sticky Mitten in Reinforcement Learning

Can a robot explore the consequences of a skill or option before learning its policy?

The option template is the option without its policy. Examples of option templates include (actual) sticky mittens, suboptimal (and simple to implement) controllers in real life, and other teleportation mechanisms in simulation.
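A minimal sketch of this idea in code may help. It assumes a toy one-dimensional state and hypothetical field names (`can_start`, `is_done`, `expert_jump`, `policy`); none of these come from the paper's implementation. The template can be executed through the expert shortcut while `policy` is still missing, and through the learned policy once one is attached.

```python
from dataclasses import dataclass
from typing import Callable, Optional

State = int  # toy state: a position on a 1-D line (assumption for illustration)


@dataclass
class OptionTemplate:
    """An option without its policy: where it may start, when it is done,
    and an expert-provided shortcut that produces its outcome."""
    name: str
    can_start: Callable[[State], bool]       # initiation condition
    is_done: Callable[[State], bool]         # termination condition
    expert_jump: Callable[[State], State]    # the "sticky mitten" (hypothetical)
    policy: Optional[Callable[[State], int]] = None  # filled in after learning

    def execute(self, state: State, max_steps: int = 100) -> State:
        if not self.can_start(state):
            raise ValueError(f"{self.name} cannot start in state {state}")
        if self.policy is None:
            # No implementation yet: the expert intervention supplies the result.
            return self.expert_jump(state)
        # A policy exists: act step by step until the termination condition holds.
        for _ in range(max_steps):
            if self.is_done(state):
                break
            state += self.policy(state)
        return state


GOAL = 5
reach_goal = OptionTemplate(
    name="reach goal",
    can_start=lambda s: s != GOAL,
    is_done=lambda s: s == GOAL,
    expert_jump=lambda s: GOAL,
)

before = reach_goal.execute(0)   # outcome obtained through the expert shortcut
reach_goal.policy = lambda s: 1 if s < GOAL else -1  # stand-in for a learned policy
after = reach_goal.execute(0)    # same outcome, now reached with primitive steps
```

In this sketch the higher-level learner sees the same outcome in both cases, which is what allows it to be trained before the lower-level policy exists.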

Learning with Option Templates in Fetch and Stack

  1. At the highest level, learn a policy over 'place block' option templates.

  2. At the next level, learn a policy for each 'place block' option template over 'reach block', 'pick and reach goal', and 'release block' option templates.

  3. Repeat until primitive actions.

Notice that learning with sticky mittens is top-down, unlike traditional bottom-up hierarchical RL!
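The three steps above can be read as a level-by-level traversal of the option hierarchy, starting at the root. The sketch below only illustrates that training order. The `learn` callback is a placeholder for whatever RL routine is used, and "stack blocks" is a hypothetical name for the top-level task; neither is taken from the paper's code.

```python
from collections import deque
from typing import Callable, Dict, List

# Hierarchy from the Fetch and Stack description. "stack blocks" is a
# hypothetical name for the top-level task.
HIERARCHY: Dict[str, List[str]] = {
    "stack blocks": ["place block"],
    "place block": ["reach block", "pick and reach goal", "release block"],
    "reach block": [],           # empty list: learned over primitive actions
    "pick and reach goal": [],
    "release block": [],
}


def train_top_down(
    root: str,
    hierarchy: Dict[str, List[str]],
    learn: Callable[[str, List[str]], None],
) -> List[str]:
    """Visit options from the root downward, calling `learn` once per option."""
    trained: List[str] = []
    queue = deque([root])
    while queue:
        option = queue.popleft()
        children = hierarchy[option]
        # While this option is being learned, its children have no policy yet
        # and would be executed through expert intervention (their templates).
        learn(option, children)
        trained.append(option)
        queue.extend(children)
    return trained


log = []
order = train_top_down(
    "stack blocks", HIERARCHY, lambda option, children: log.append((option, tuple(children)))
)
```

Here `order` lists the options in the sequence they would be trained, with 'place block' handled before any of the three options beneath it.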


Three orders of magnitude improvement in sample efficiency over bottom-up RL and baselines!

Learning with Option Templates in Google Research Football


  1. At the highest level, learn a winning policy over 'defend' and 'attack and score goal' option templates.

  2. At the next level, learn a policy for the 'attack and score goal' option template over 'maintain ball possession', 'charge to the goal', and 'shoot' option templates.

  3. Repeat until primitive actions.

Notice that learning with sticky mittens is top-down, unlike traditional bottom-up hierarchical RL!

Similar (easy) or better (medium, hard) performance with two orders of magnitude fewer steps.

Learning with Option Templates in Craft


Two orders of magnitude fewer steps!

Learning with option templates is top-down; option-value iteration is bottom-up.
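The difference in direction can be made concrete by listing which level of a hierarchy is trained first under each scheme. The sketch below uses the Google Research Football hierarchy described earlier on this page, with "win match" as a hypothetical name for the top-level task. It shows only the ordering of levels and does not implement option-value iteration or any learning.

```python
from typing import Dict, List

# Option hierarchy from the Google Research Football description.
# "win match" is a hypothetical name for the top-level task.
HIERARCHY: Dict[str, List[str]] = {
    "win match": ["defend", "attack and score goal"],
    "attack and score goal": [
        "maintain ball possession",
        "charge to the goal",
        "shoot",
    ],
}


def levels(root: str, hierarchy: Dict[str, List[str]]) -> List[List[str]]:
    """Group options by depth, starting from the root."""
    result: List[List[str]] = []
    frontier = [root]
    while frontier:
        result.append(frontier)
        # Options with no entry in the hierarchy have no child options.
        frontier = [child for option in frontier for child in hierarchy.get(option, [])]
    return result


top_down = levels("win match", HIERARCHY)   # root level is trained first
bottom_up = list(reversed(top_down))        # deepest level is trained first
```

In the top-down order, the first level to be trained is the one closest to the task reward; in the bottom-up order it is the last.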


Top-down learning outperforms bottom-up (option-value iteration)!

Videos of Trained Agents

Fetch and Stack (3 blocks)

three_blocks_video.mp4

Fetch and Stack (4 blocks)

video_menuless_v1.mp4

Google Research Football (easy), 10 goals scored

easy_10_new_episode_done_20220213-045003633275.avi

Google Research Football (medium), 4 goals scored

medium_4_new_episode_done_20220212-213707655298.avi

Google Research Football (hard), 6 goals scored

hard_6_new_episode_done_20220213-042105122969.avi

Craft (get Gold)

getgold.mp4

Craft (get Gem)

getgem.mp4