Imitation Learning and its Challenges in Robotics

NeurIPS workshop | Montreal, Canada | Dec 7, 2018 | 516 CDE

Description

Many animals including humans have the ability to acquire skills, knowledge, and social cues from a very young age. This ability to imitate by learning from demonstrations has inspired research across many disciplines like anthropology, neuroscience, psychology, and artificial intelligence. In AI, imitation learning (IL) serves as an essential tool for learning skills that are difficult to program by hand. The applicability of IL to robotics in particular, is useful when learning by trial and error (reinforcement learning) can be hazardous in the real world. Despite the many recent breakthroughs in IL, in the context of robotics there are several challenges to be addressed if robots are to operate freely and interact with humans in the real world.

Some important challenges include: 1) achieving good generalization and sample efficiency when the user can only provide a limited number of demonstrations with little to no feedback; 2) learning safe behaviors in human environments that require the least user intervention in terms of safety overrides without being overly conservative; and 3) leveraging data from multiple sources, including non-human sources, since limitations in hardware interfaces can often lead to poor quality demonstrations.

In this workshop, we aim to bring together researchers and experts in robotics, imitation and reinforcement learning, deep learning, and human robot interaction to

Formalize the representations and primary challenges in IL as they pertain to robotics
Delineate the key strengths and limitations of existing approaches with respect to these challenges
Establish common baselines, metrics, and benchmarks, and identify open questions

Invited Speakers

Peter Stone

UT Austin

Sonia Chernova

Georgia Tech

Ingmar Posner

Oxford University

Yisong Yue

Caltech

Byron Boots

Georgia Tech / NVIDIA

Anca Dragan

Berkley / Waymo

Dorsa Sadigh

Stanford

Drew Bagnell

Aurora

Important Dates

Oct 19

Oct 29

Nov 16

Dec 7

Submission deadline (AoE time)

Notification of acceptance

Camera ready deadline

Workshop

Call for Abstracts

We solicit up to 4 pages extended abstracts (excluding references) conforming to the NeurIPS style. Submissions can include archived or previously accepted work (please make a note of this in the submission). Reviewing will be single blind.

Submission link: https://easychair.org/conferences/?conf=nips18ilr

Topics of interest include, but are not limited to:

Sample efficiency in imitation learning
Learning from high dimensional demonstrations
Learning from observations
Learning with minimal demonstrator effort
Few shot imitation learning
Risk aware imitation learning
Learning to gain user trust
Learning from multi modal demonstrations
Learning with imperfect demonstrations

All accepted contributions will be presented in interactive poster sessions. A subset of accepted contributions will be featured in the workshop as spotlight presentations.

Travel Awards

With the generous support of our sponsors, we are excited to offer a few travel awards intended to partly offset cost of attendance (registration + most of travel). Only presenting students/post-docs of accepted contributions will be eligible to receive these awards. Applications will be accepted alongside submissions.

Winners:

Aadil Hayat, Multi-Task Learning Using Conditional Generative Adversarial Imitation Learning
Daphne Chen, A Large-Scale Benchmark Study Investigating the Impact of User Experience, Task Complexity, and Start Configuration on Robot Skill Learning

Runner-ups:

Pim de Haan, Causal Confusion in Imitation Learning
Hongyu Ren, Stabilizing Reinforcement Learning via Mutual Imitation

Schedule

08:55 - 09:00 | Organizers | Introduction

09:00 - 09:30 | Peter Stone | Control Algorithms for Imitation Learning from Observation

09:30 - 10:00 | Sonia Chernova | Learning Generalizable Robot Skills

10:00 - 10:15 | Contributed Spotlights | #1 to #5

10:15 - 11:00 | Poster Session I and Coffee Break | #1 to #20

11:00 - 11:30 | Ingmar Posner | Watch and learn - quickly

11:30 - 12:00 | Dorsa Sadigh | Active Learning of Humans' Preferences

12:00 - 02:00 | Lunch Break

02:00 - 02:30 | Byron Boots | Imitation as Acceleration

02:30 - 02:45 | Dileep George | Industry Spotlight | Beyond imitation: Zero-shot task transfer on robots by learning concepts as cognitive programs

02:45 - 03:30 | Poster Session II and Coffee Break | #1 to #20

03:30 - 04:00 | Yisong Yue | Structured Imitation & Reinforcement Learning

04:00 - 04:30 | Anca Dragan | An Interaction View on Inverse RL

04:30 - 05:00 | Drew Bagnell / Wen Sun | Learning with Limited Experts

05:00 - 05:30 | Panel Discussion | All speakers

Talk Abstracts

09:00 - 09:30 | Peter Stone | Control Algorithms for Imitation Learning from Observation

Imitation learning is a paradigm that enables autonomous agents to capture behaviors that are demonstrated by people or other agents. Effective approaches, such as Behavioral Cloning and Inverse Reinforcement Learning, tend to rely on the learning agent being aware of the low-level actions being demonstrated. However, in many cases, such as videos or demonstrations from people (or any agent with a different morphology), the learning agent only has access to observed state transitions. This talk introduces two novel control algorithms for imitation learning from observation: Behavioral Cloning from Observation (BCO) and Generative Adversarial Imitation from Observation (GAIfO).

11:00 - 11:30 | Ingmar Posner | Watch and learn - quickly

Data efficiency while learning robust policies remains a key requirements in imitation learning. The consensus is still out as to how data efficiency is best achieved. Invariably, it involves leveraging domain knowledge in some form or another. This talk provides two examples of learning complex tasks from demonstration. First we show how demonstrations can be leveraged to fine-tune the behaviour of a planner in a real-world autonomous driving setting. We then describe how jointly learning the segmentation and the corresponding sub-tasks given a hierarchical task description and a number of demonstrations leads to more robust, re-usable sub-policies.

11:30 - 12:00 | Dorsa Sadigh | Active Learning of Humans' Preferences

Today’s society is rapidly advancing towards robotics systems that interact and collaborate with humans, e.g., semi-autonomous vehicles interacting with drivers and pedestrians, medical robots used in collaboration with doctors, or service robots interacting with their users in smart homes. In this talk, I will discuss our work on efficiently and actively learning predictive models of humans’ preferences by eliciting comparisons from a mixed set of humans. I will then focus on a batch active method that trades off between the number of queries and query generation time for learning such reward functions.

02:30 - 02:45 | Dileep George | Industry Spotlight | Beyond imitation: Zero-shot task transfer on robots by learning concepts as cognitive programs

Humans can infer concepts from image pairs and apply those in the physical world in a completely different setting, enabling tasks like IKEA assembly from diagrams. If robots could represent and infer high-level concepts, it would significantly improve their ability to understand our intent and to transfer tasks between different environments. To that end, we introduce a computational model that replicates aspects of human concept learning. Concepts are represented as programs on a novel computer architecture consisting of a visual perception system, working memory, and action controller. The instruction set of this ‘cognitive computer’ has commands for parsing a visual scene, directing gaze and attention, imagining new objects, manipulating the contents of a visual working memory, and for controlling arm movement. Inferring a concept corresponds to inducing a program that can transform the input to the output. Some concepts require the use of visual imagination and recursion. Previously learned concepts simplify the learning of subsequent more elaborate concepts, and create a hierarchy of abstractions. We demonstrate how a robot can use these abstractions to interpret novel concepts presented to it as schematic images, and then apply those concepts in dramatically different situations. By bringing cognitive science ideas on image schemas, perceptual symbols, embodied cognition, and deictic mechanisms into the realm of machine learning, our work brings us closer to the goal of building robots that have interpretable representations and conceptual understanding.

03:30 - 04:00 | Yisong Yue | Structured Imitation & Reinforcement Learning

In machine learning, there are two broad ways that one can provide domain knowledge. The first is by providing training examples, and the second is by constructing useful priors. In this talk, I will present recent work on incorporating structured prior knowledge (e.g., from control theory) into the learning process in order to dramatically improve computational and statistical efficiency, and in some cases provide side guarantees such as stability or safety. Come for the talk, stay for the demo videos.

04:30 - 05:00 | Drew Bagnell / Wen Sun | Learning with Limited Experts

Imitation learning has been widely used in real world applications and is often tremendously more sample efficient than Reinforcement Learning. We being by reviewing the assumptions and resulting guarantees for exiting approaches to reinforcement learning. We then consider extensions where we have weaker access to an expert or the expert themselves is imperfect. In particular, in this talk, we consider two cases of imitation learning from limited experts: (1) imitating an expert who can only provide demonstrations consisting of observations (i.e., imitation learning from observation alone), and (2) imitating locally optimal experts.

For the first setting, we present an algorithm— Forward Adversarial Imitation Learning (FAIL), that reduces imitation learning into a set of independent two-player min-max games which can be solved efficiently using standard no-regret online learning. FAIL provides a near-optimal performance guarantee with supervised learning-type sample complexity, i.e. , a polynomial dependency on the statistical complexity of function approximators rather than the cardinality of observation space. For the second setting, we generalize classic iterative learning control via incorporating model learning and closed-loop control. The proposed framework, Dual Policy Iteration (DPI), can be understood as imitation learning from local-optimal experts. Our formulation and analysis also shed light on the convergence of the popular AlphaGo-Zero algorithm.

Contributed Papers

1. Sam Zeng, Vaibhav Viswanathan, Cherie Ho and Sebastian Scherer. Learning Reactive Flight Control Policies: From LIDAR Measurements to Actions

2. Muhammad Asif Rana, Daphne Chen, Reza Ahmadzadeh, Jake Williams, Vivian Chu and Sonia Chernova. A Large-Scale Benchmark Study Investigating the Impact of User Experience, Task Complexity, and Start Configuration on Robot Skill Learning

3. Dequan Wang, Coline Devin, Qi-Zhi Cai, Philipp Krähenbühl and Trevor Darrell. Learning to Drive with Monocular Plan View

4. Pim de Haan, Dinesh Jayaraman and Sergey Levine. Causal Confusion in Imitation Learning

5. Wen Sun, Hanzhang Hu, Byron Boots and Drew Bagnell. Provably Efficient Imitation Learning from Observation Alone

6. Laurent George, Thibault Buhet, Emilie Wirbel, Gaetan Le-Gall and Xavier Perrotton. Imitation Learning for End to End Vehicle Longitudinal Control with Forward Camera

7. Ibrahim Sobh and Nevin Darwish. End-to-End Framework for Fast Learning Asynchronous Agents

8. Konrad Zolna, Negar Rostamzadeh, Yoshua Bengio, Sungjin Ahn and Pedro O. Pinheiro. Reinforced Imitation Learning from Observations

9. Nicholas Rhinehart, Rowan McAllister and Sergey Levine. Deep Imitative Models for Flexible Inference, Planning, and Control

10. Mohit Sharma, Arjun Sharma, Nicholas Rhinehart and Kris Kitani. Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information

11. Bin Wang, Qiyuan Zhang, Yuzheng Zhuang, Jun Luo, Hongbo Zhang and Wulong Liu. Data-efficient Imitation of Driving Behavior with Generative Adversarial Networks

12. Alex Bewley, Jessica Rigley, Yuxuan Liu, Jeffrey Hawke, Richard Shen, Vinh-Dieu Lam and Alex Kendall. Zero-Shot Driving Imitation via Image Translation

13. Sanjay Thakur, Herke Van Hoof, Kushal Arora, Doina Precup and David Meger. Sample Efficient Learning From Demonstrations on Multiple Tasks using Bayesian Neural Networks

14. Lionel Blondé and Alexandros Kalousis. Sample-Efficient Imitation Learning via Generative Adversarial Nets

15. Aadil Hayat, Sarthak Mittal and Vinay Namboodiri. Multi-Task Learning Using Conditional Generative Adversarial Imitation Learning

16. Ozgur S. Oguz, Ben Pfirrmann, Mingpan Guo and Dirk Wollherr. Learning Hand Movement Interaction Control Using RNNs: From HHI to HRI

17. Michael Kelly, Chelsea Sidrane, Katherine Driggs-Campbell and Mykel J. Kochenderfer. Safe Interactive Imitation Learning from Humans

18. Sujoy Paul and Jeroen Vanbaar. Trajectory-based Learning for Ball-in-Maze Games

19. Michał Garmulewicz, Henryk Michalewski and Piotr Miłoś. Expert-augmented actor-critic for ViZDoom and Montezuma’s Revenge

20. Hongyu Ren, Jiaming Song and Stefano Ermon. Stabilizing Reinforcement Learning via Mutual Imitation