Workshop on Learning from Instructional Videos

The Washington State Convention Center, Seattle, WA


Learning from instructions is a rapidly growing area in the vision, learning, robotics, and broader AI communities, with fundamental technological and societal impact on a variety of problems, including designing intelligent interactive agents, building assistive technologies for patients, the elderly, veterans, and other users, and constructing massive knowledge bases of instructions. The goal of this workshop is to bring together experts in the area to discuss recent advances, existing challenges from algorithmic and application perspectives, and the opportunities and impact of solutions to these problems from both technological and societal perspectives. In particular, the workshop addresses the challenges of learning from instructions in the wild, i.e., learning from unconstrained, noisy, unaligned multimodal instructional data with large variations in appearance, background, actors, and objects. We discuss recent advances in supervised, weakly supervised, and unsupervised instruction learning, as well as recent instructional video datasets. We highlight the limitations of existing solutions, which are still far from ready to be deployed in the real world, identify unaddressed problems in the area, and discuss the challenge problems and datasets that are needed.

Workshop Videos

Invited Talk 1: Dima Damen: Learning from Narrated Videos of Everyday Tasks [Video Link]

Invited Talk 2: Josef Sivic: Learning Visual Representations from Instructional Videos [Video Link]

Invited Talk 3: Cordelia Schmid: Action Recognition in Videos [Video Link]

Invited Talk 4: Animesh Garg: Structured Inductive Bias for Imitation from Videos [Video Link]

Spotlight 1: Weakly-Supervised Action Segmentation via Union of Subspaces, Zijia Lu, Ehsan Elhamifar [Video Link]

Spotlight 2: Discovering Actions by Joint Clustering Video and Narration Streams Across Tasks, Minttu Alakuijala, Julien Mairal, Jean Ponce, Cordelia Schmid [Video Link]

Invited Speakers

Cordelia Schmid

Research Director,


Josef Sivic

Senior Researcher,


Dima Damen

Associate Professor,

University of Bristol

Animesh Garg

Assistant Professor,

University of Toronto

Ashutosh Saxena

Co-Founder and CEO,

Program Committee

Hilde Kuehne


MIT-IBM Watson AI Lab

Ozan Sener

Postdoctoral Researcher,

Intel Labs

Ehsan Adeli

Research Fellow,

Stanford University

Luowei Zhou




Ehsan Elhamifar is an Assistant Professor in the Khoury College of Computer Sciences and the director of the Mathematical, Computational and Applied Data Science (MCADS) Lab at Northeastern University. Dr. Elhamifar is a recipient of the DARPA Young Faculty Award and the NSF CISE Research Initiation Initiative (CRII) Award. Previously, he was a postdoctoral scholar in the Electrical Engineering and Computer Science department at UC Berkeley. He obtained his PhD from the Electrical and Computer Engineering department at Johns Hopkins University. Dr. Elhamifar's research areas are machine learning, computer vision, and optimization. He develops scalable, robust, and provable algorithms that address the challenges of complex and massive high-dimensional data, and applies these tools to big visual data summarization, procedure learning from instructions, large-scale recognition with little labeled data, and active learning for visual data.

Jason Corso is a Professor of Electrical Engineering and Computer Science at the University of Michigan. He received his Ph.D. in Computer Science from The Johns Hopkins University in 2005. He is a recipient of the NSF CAREER award (2009), the ARO Young Investigator award (2010), and a Google Faculty Research Award (2015), and has served on the DARPA Computer Science Study Group (CSSG). He is also the Co-Founder and CEO of Voxel51, a computer vision startup building a state-of-the-art platform for video- and image-based applications. His main research thrust is high-level computer vision and its relationship to human language, robotics, and data science. He focuses primarily on problems in video understanding such as video segmentation, activity recognition, and video-to-text.

Juan Carlos Niebles received an Engineering degree in Electronics from Universidad del Norte (Colombia) in 2002, an M.Sc. degree in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 2007, and a Ph.D. degree in Electrical Engineering from Princeton University in 2011. He has been a Senior Research Scientist at the Stanford AI Lab and Associate Director of Research at the Stanford-Toyota Center for AI Research since 2015. He was also an Associate Professor of Electrical and Electronic Engineering at Universidad del Norte (Colombia) between 2011 and 2019. His research interests are in computer vision and machine learning, with a focus on visual recognition and understanding of human actions and activities, objects, scenes, and events. He has served as Area Chair for the top computer vision conferences CVPR and ICCV. He is also a member of the AI Index Steering Committee and the Curriculum Director for Stanford-AI4ALL. He is a recipient of a Google Faculty Research award (2015), the Microsoft Research Faculty Fellowship (2012), a Google Research award (2011), and a Fulbright Fellowship (2005).



Contact Us

For questions or feedback, please contact Ehsan Elhamifar at