[Image: Kalakrishnan 13]
As technology in autonomous robotics continues to evolve, so does the complexity of the decision problems we expect our systems to solve. The resulting action policies range from low-level control of forces to high-level selection of complex strategies. These decision problems are often straightforward for humans, yet they remain difficult for standard robotics approaches. In this context, Learning from Demonstrations (LfD) can reduce the difficulty of defining action policies by providing expert knowledge in the form of examples of near-optimal behavior. Understanding and formalizing LfD has been studied across many fields of science, including robotics, neuroscience, cognitive science, psychology, and anthropology. However, many LfD problems remain intractable because they embed exceedingly high-dimensional representations stemming from their coupling to high-dimensional observation spaces (e.g., visual, haptic).

In this workshop, we aim to bring together experts in robotics, machine learning, and cognitive science to discuss the state of the art in LfD and explore promising directions for handling high-dimensional feature, state, and observation spaces, looking beyond traditional approaches to find connections to vision, deep learning, and human models of cognition. We hope to highlight recent applications and identify tools and techniques that can enable us to scale our methods to handle high-dimensional demonstrations. From this workshop, we expect participating researchers to identify the key challenges, techniques, and benchmarks necessary for LfD in high-dimensional feature spaces.

Venue: Building 32: Stata Center, Room 123 (32-123)

Important Dates:
  • Submission deadline: 8 June 2017
  • Notification: 15 June 2017
  • Camera ready: 3 July 2017
  • Workshop: 16 July 2017

Invited speakers:
  • 08:55 - 09:00 AM
  • 09:00 - 09:30 AM, Andrew Bagnell, Deep Combinations of Imitation and Reinforcement
  • 09:30 - 10:00 AM, Jan Peters, Integrating Dimensionality Reduction into Reinforcement Learning
  • 10:00 - 10:30 AM, Marc Toussaint, We Should Think More About Higher Level Behavior
  • 10:30 - 11:00 AM, Coffee break / Poster presentations
  • 11:00 - 11:30 AM, Pieter Abbeel, Learning to Learn to Act
  • 11:30 AM - 12:00 PM, Stefano Ermon, Generative Adversarial Imitation Learning
  • 12:00 - 02:00 PM
  • 02:00 - 02:30 PM, Michael Laskey / Ken Goldberg, Pixels-to-Policies from Fallible Human Demonstrations
  • 02:30 - 03:00 PM, Jon Scholz, Combining Rewards and Demonstrations for Real-World Control Problems
  • 03:00 - 04:00 PM, Coffee break / Poster presentations
  • 04:00 - 04:30 PM, Josh Tenenbaum, Reverse Engineering Human Intuitive Cognition
  • 04:30 - 05:00 PM, Sam Gershman, To Learn from Humans, Learn Like Humans
  • 05:00 - 05:30 PM, Speaker panel: Sergey Levine, Marc Toussaint, Anca Dragan, Sam Gershman, Josh Tenenbaum, Chelsea Finn, Michael Laskey

Accepted Papers:
Talk abstracts:
  • 09:00 - 09:30 AM, Andrew Bagnell, Deep Combinations of Imitation and Reinforcement
    Abstract: Tremendous learning results have been achieved by combining three pieces: imitation of expert demonstrations, reinforcement signals, and powerful (e.g., modern neural) learning architectures.
    Here we consider this combined imitation/reinforcement setting and simple algorithms for it, the AggreVaTe family, with formal guarantees and strong empirical performance.
    We begin by analyzing why one would consider imitation in the presence of a reward signal: intuitively, learning can be much faster, and a formal analysis demonstrates that we can expect up to exponentially lower sample complexity for learning with AggreVaTe than with "pure" RL algorithms.
    Next we consider recent state-of-the-art performance on sequential decision-making problems (e.g., robotic control, sequential prediction) demonstrated by deep neural network models. One often has access to near-optimal oracles ("teachers") that achieve good performance on the task during training, and it is important to take advantage of them.
    We then present AggreVaTeD, a policy-gradient extension of the Imitation Learning (IL) approach of Ross & Bagnell (2014), which can leverage expert demonstrations to achieve faster and better solutions with less training data than a less-informed Reinforcement Learning (RL) technique. Using both feedforward and recurrent neural predictors, we present stochastic gradient procedures on a sequential prediction task, dependency parsing from raw image data, as well as on various high-dimensional robotic control problems. Our results and theory indicate that the proposed approach can even outperform a suboptimal teacher.
    This talk includes joint work with Stephane Ross, Wen Sun, Arun Venkatraman, Byron Boots, and Geoff Gordon.
  • 09:30 - 10:00 AM, Jan Peters, Integrating Dimensionality Reduction into Reinforcement Learning
  • 10:00 - 10:30 AM, Marc Toussaint, We Should Think More About Higher Level Behavior
    Abstract: I will first talk about inverse optimal control (IOC) to learn constrained objective functions from demonstration (inverse KKT), and about combining optimization and reinforcement learning (CORL) to explore and generalize from a single manipulation demonstration. However, my actual motivation in such methods is their integration into learning higher-level behavior. I will discuss our work on representing, and learning from demonstration, cooperative multi-agent manipulation policies, and the challenges that remain open in this context.

  • 11:00 - 11:30 AM, Pieter Abbeel, Learning to Learn to Act
    Abstract: Reinforcement learning and imitation learning have seen success in many domains, including autonomous helicopter flight, Atari, simulated locomotion, Go, and robotic manipulation. However, the sample complexity of these methods remains very high. In contrast, humans can pick up new skills far more quickly. To do so, humans might rely on a better learning algorithm or on a better prior (potentially learned from past experience), and likely on both. In this talk I will describe some recent work on meta-learning for action, where agents learn the imitation/reinforcement learning algorithms and learn the prior. This has enabled acquiring new skills from just a single demonstration or just a few trials. While designed for imitation and RL, our work is more generally applicable and has also advanced the state of the art on standard few-shot classification benchmarks such as Omniglot and Mini-ImageNet.

  • 11:30 AM - 12:00 PM, Stefano Ermon, Generative Adversarial Imitation Learning
    Abstract: Consider learning a policy from example expert behavior, without interaction with the expert or access to a reward or cost signal. One approach is to recover the expert's cost function with inverse reinforcement learning and then compute an optimal policy for that cost function. This approach is indirect and can be slow. In this talk, I will discuss a new generative modeling framework for directly extracting a policy from data, drawing an analogy between imitation learning and generative adversarial networks. I will derive a model-free imitation learning algorithm that obtains significant performance gains over existing methods in imitating complex behaviors in large, high-dimensional environments. Our approach can also be used to infer the latent structure of human demonstrations in an unsupervised way. As an example, I will show a driving application where a model learned from demonstrations is able to both produce different driving styles and accurately anticipate human actions using raw visual inputs.
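    The adversarial imitation idea this abstract describes can be caricatured in a few lines: a discriminator learns to separate expert from learner state-action pairs, while the policy is rewarded for producing pairs the discriminator mistakes for the expert. The sketch below is a toy tabular instance of that loop; the environment, step sizes, and surrogate reward are all illustrative assumptions, not the actual GAIL implementation (which uses neural networks and trust-region policy updates).

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2

# Toy "expert" demonstrations: in state s the expert always picks action s % 2.
expert_sa = [(s, s % n_actions) for s in range(n_states) for _ in range(50)]

theta = np.zeros((n_states, n_actions))  # tabular softmax policy logits
w = np.zeros((n_states, n_actions))      # tabular discriminator logits

def policy_probs(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def disc(s, a):
    # D(s, a): estimated probability that the pair (s, a) came from the expert.
    return 1.0 / (1.0 + np.exp(-w[s, a]))

baseline = 0.0
for _ in range(3000):
    # One learner sample from the current policy, one expert sample.
    s = int(rng.integers(n_states))
    a = int(rng.choice(n_actions, p=policy_probs(s)))
    se, ae = expert_sa[rng.integers(len(expert_sa))]

    # Discriminator step: logistic regression, expert labeled 1, learner 0.
    w[se, ae] += 0.1 * (1.0 - disc(se, ae))
    w[s, a] += 0.1 * (0.0 - disc(s, a))

    # Policy step: REINFORCE on the surrogate reward log D(s, a), which
    # pushes the policy toward pairs the discriminator credits to the expert.
    r = np.log(disc(s, a) + 1e-8)
    baseline += 0.01 * (r - baseline)   # crude variance-reduction baseline
    grad = -policy_probs(s)
    grad[a] += 1.0
    theta[s] += 0.05 * (r - baseline) * grad

learned = [int(np.argmax(policy_probs(s))) for s in range(n_states)]
print(learned)  # should match the expert's actions [0, 1, 0]
```

    At equilibrium the discriminator cannot tell the two sources apart, which is exactly when the learner's state-action distribution matches the expert's.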
  • 02:00 - 02:30 PM, Michael Laskey, Pixels-to-Policies from Fallible Human Demonstrations
    Abstract: Motivated by recent advances in deep learning for robot control, this talk considers learning algorithms in terms of how they acquire demonstrations from fallible human supervisors. Behavior Cloning, an off-policy approach, is a standard supervised learning algorithm in which a human supervisor demonstrates the task by teleoperating the robot to provide trajectories consisting of state-control pairs. A known problem with this approach is the compounding of errors, which occurs because the robot visits different states than the supervisor. On-policy sampling is an increasingly popular alternative used in algorithms such as DAgger, where a human supervisor observes the robot execute a learned policy and provides corrective control labels for each state visited. On-policy sampling can be challenging for human supervisors and prone to mislabeling, which we observed in a human study where a robot learns an image-to-control policy for part singulation. An alternative is to inject artificial noise into the teleoperation system, which can simulate the errors that occur when collecting data with off-policy methods. I will present theoretical results demonstrating that noise injection provides robustness to the learned policy, and then leverage our analysis to set noise levels. Finally, I will present experimental results with human supervisors demonstrating that noise injection can increase robustness compared to traditional Behavior Cloning.
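    The noise-injection idea in this abstract can be illustrated with a deliberately tiny 1-D example: perturb the control that is actually executed while recording the supervisor's intended control as the label, so the dataset covers the kinds of error states an imperfect learned policy would later visit. The proportional supervisor, integrator dynamics, and noise level below are all hypothetical choices for illustration, not the talk's actual setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def expert_control(x):
    # Hypothetical supervisor: a proportional controller driving the state to 0.
    return -0.5 * x

def collect_demo(noise_std, steps=100):
    """Collect one off-policy demonstration with noise injected at execution.

    The stored label is the supervisor's *intended* control, but the robot
    executes a perturbed control, so the recorded states also cover
    off-distribution states rather than only the supervisor's clean trajectory.
    """
    x, data = 5.0, []
    for _ in range(steps):
        u = expert_control(x)
        data.append((x, u))                     # label: intended control
        x = x + u + rng.normal(0.0, noise_std)  # execute: perturbed control
    return data

state_spread = lambda demo: float(np.std([x for x, _ in demo]))
clean = state_spread(collect_demo(noise_std=0.0))
noisy = state_spread(collect_demo(noise_std=1.0))
print(clean, noisy)  # the noisy rollout visits a wider range of states
```

    Training a behavior-cloning policy on the noisy dataset then supervises the learner in error states it would otherwise never see labeled.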

  • 02:30 - 03:00 PM, Jon Scholz, Combining Rewards and Demonstrations for Real-World Control Problems
    Abstract: This talk will survey ongoing work at DeepMind on leveraging human demonstrations for Reinforcement Learning on real-world control problems. I'll start with an overview of several themes we have identified when bringing RL methods to bear on these tasks, including safety, sub-optimality of demonstrations, and the lack of available simulators. I'll also discuss two high-dimensional control problems that share these themes, one involving a robot arm interacting with a deformable object, and the other involving the cooling infrastructure for a Google-scale data center.
  • 04:00 - 04:30 PM, Joshua Tenenbaum, Reverse Engineering Human Intuitive Cognition
  • 04:30 - 05:00 PM, Sam Gershman, To learn from humans, learn like humans
    Abstract: Human behavior is a rich but highly ambiguous source of information about the mental representations underlying our flexibility and sample efficiency. To build autonomous agents that acquire human-like policies from demonstrations, we need to endow the agents with human-like mental representations so that their inferences about policies are both structured and constrained. This talk discusses our current understanding of human reinforcement learning and action selection, with a focus on how these ideas can be used to build better inverse reinforcement learning systems.
Submission Details:
Please submit a PDF abstract (along with supplementary material, if needed) of at most two pages, in the RSS 2017 double-blind format, via email to lfdhighdim.rss17@gmail.com by June 8, 2017 (Anywhere on Earth time). Submitted abstracts will be reviewed by the organizers.
Authors of accepted contributions will be notified by June 15, 2017. Each accepted contribution will be presented at the workshop as a 3-minute spotlight presentation and will also be featured in an interactive poster session.

Accepted papers and any supplementary material will be made available on the workshop website. However, this does not constitute an archival publication and no formal workshop proceedings will be produced, so contributors remain free to publish their work in archival journals or conferences.

The following list contains some areas of interest, but work in other related areas is also welcome:
  • Learning from high-dimensional demonstrations
  • Deep inverse optimal control/inverse reinforcement learning
  • Predicting behavior from high-dimensional observations
  • Learning from multiple sensor modalities
  • High-dimensional knowledge transfer for sequential planning
  • Cognitive models for learning from demonstration and planning
  • One/few-shot imitation learning
  • Learning by observing external demonstrations
  • Application domains with high-dimensional observations (autonomous cars/agents, robotic manipulation, etc.)

Travel scholarship:

We are happy to announce that we can offer travel scholarships for student authors of accepted contributions who wish to attend the workshop. The scholarship will be at most $500 per student and will be awarded to the top three student submissions. Please indicate in your submission email whether the first author is a student and whether you wish to apply for the scholarship. We may increase the number or amount of the travel scholarships, pending availability of funds. Selected contributions will be notified upon acceptance.