Human-Robot Interaction:

Visual Imitation Learning

Course Overview

CS 933 (Spring 2023) focuses on Visual Imitation Learning (visual-IL), that is, learning from visual demonstration. The central idea of visual-IL is to make a robot learn a task only from video demonstrations. Here is an example of our own YuMi robot learning to make a cup of tea from video demonstrations by humans.

Learning a task policy, whether low-level (such as reaching, pulling, or pushing) or high-level (such as furniture assembly, cooking, or activities of daily living), from video demonstrations is extremely challenging for many reasons, including the visual correspondence problem, the very high-dimensional state space, and the uncertainty inherent in vision data. Despite being the most convenient way to teach a robot, visual-IL was long understudied because of these challenges. However, the recent rise of deep neural networks, with their ability to process high-dimensional video data, has triggered intense interest in visual-IL. We will study the state of the art in visual-IL in this course.
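At its core, imitation learning casts policy learning as supervised learning on expert state-action pairs (behavioral cloning). The sketch below is purely illustrative, not course material: it fits a toy linear policy to scalar "demonstrations" by gradient descent, and every name and number in it is an assumption for the example, not something from a real visual-IL pipeline (which would map raw video frames to actions with a deep network).

```python
# Toy illustration of imitation learning as supervised learning
# (behavioral cloning): fit a linear policy a = w * s to expert
# state-action demonstrations by minimizing mean squared error.
# All details here are illustrative assumptions, not course material.

def behavioral_cloning(demos, lr=0.1, epochs=200):
    """Fit the policy weight w by gradient descent on the MSE loss
    over (state, expert_action) pairs."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of mean((w*s - a)^2) with respect to w.
        grad = sum(2 * (w * s - a) * s for s, a in demos) / len(demos)
        w -= lr * grad
    return w

# Hypothetical expert demonstrations: the expert's action is 2x the state.
demos = [(s, 2.0 * s) for s in [0.5, 1.0, 1.5, 2.0]]
w = behavioral_cloning(demos)  # converges close to the expert's w = 2.0
```

The same supervised-learning recipe underlies much of visual-IL; the hard parts discussed above (visual correspondence, high-dimensional states) come from replacing the scalar state with raw video.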

Learning Expectations

This is a seminar-style course. We will read and analyze research papers on visual-IL. Depending on students' interests, we may identify a particular sub-problem in visual-IL and work together to develop a solution.

By the end of this course, you will:

  • have a solid understanding of imitation learning as an AI research problem

  • know the state of the art in visual-IL

  • be able to implement a visual-IL algorithm on a physical robot

Since this is a seminar-style course, a general objective is to strengthen your ability to analyze scientific research systematically and to communicate scholarly thoughts, comments, and ideas clearly to a peer audience.

Textbook

There is no assigned textbook. All necessary materials will be provided in class.

Grading

This is a project-based course, and the final project carries the greatest weight in grading.