Workshop on Learning from Instructional Videos

The Washington State Convention Center, Seattle, WA


Learning from instructions is a rapidly growing area in the vision, learning, robotics, and broader AI communities, with fundamental technological and societal impact on a variety of problems, including designing autonomous agents, building assistive technologies for patients, the elderly, and everyday users, and constructing massive knowledge bases. The goal of this workshop is to bring together experts in the area to discuss recent advances, existing challenges from algorithmic and application perspectives, and the opportunities and impact of solutions to these problems from both technological and societal perspectives. In particular, the workshop addresses the challenges of procedure learning in the wild, that is, learning from unconstrained, noisy, unaligned multimodal instructions with large variations in background, actors, and objects. We discuss recent advances in supervised, weakly supervised, and unsupervised procedure learning, as well as recent instructional video datasets. We highlight the limitations of existing solutions, which are still far from ready for real-world deployment, identify unaddressed problems in the area, and discuss the challenge problems and datasets that are needed.

Paper Submission

We invite submissions of papers on different aspects of learning from instructions. Papers must be between two and four pages (including references) and follow the CVPR formatting guidelines using the provided author kit. Accepted papers will be presented as posters during the workshop. We accept submissions of work that is unpublished, currently under review, or already published. There will be no proceedings; our goal is to bring as many community members as possible into the discussion. Please send your extended abstracts to

There will be a best paper award, whose recipient will receive registration and travel funding to attend CVPR 2020. The best paper award is sponsored by the MIT-IBM Watson AI Lab.

  • Submission Deadline: April 20, 2020 at 11:59pm Pacific Time
  • Decisions Announced: April 27, 2020, at 12:00pm Pacific Time
  • Camera Ready Deadline: May 8, 2020

Invited Speakers

Cordelia Schmid

Research Director,


Josef Sivic

Senior Researcher,


Abhinav Gupta

Associate Professor,

Carnegie Mellon University

Dima Damen

Associate Professor,

University of Bristol

Animesh Garg

Assistant Professor,

University of Toronto

Ashutosh Saxena

Co-Founder and CEO,

Program Committee

Hilde Kuehne


MIT-IBM Watson AI Lab

Ozan Sener

Postdoctoral Researcher,

Intel Labs

Ehsan Adeli

Research Fellow,

Stanford University

Luowei Zhou

PhD Candidate,

University of Michigan


Ehsan Elhamifar is currently an Assistant Professor in the Khoury College of Computer Sciences and the director of the Mathematical, Computational and Applied Data Science (MCADS) Lab at Northeastern University. Dr. Elhamifar is a recipient of the DARPA Young Faculty Award and the NSF CISE Career Research Initiation Initiative Award. Previously, he was a postdoctoral scholar in the Electrical Engineering and Computer Science department at UC Berkeley. He obtained his PhD from the Electrical and Computer Engineering department at the Johns Hopkins University. Dr. Elhamifar's research areas are machine learning, computer vision, and optimization. He develops scalable, robust, and provable algorithms that address the challenges of complex and massive high-dimensional data, and applies these tools to big visual data summarization, procedure learning from instructions, large-scale recognition with small labeled data, and active learning for visual data.

Jason Corso is currently a Professor of Electrical Engineering and Computer Science at the University of Michigan. He received his Ph.D. in Computer Science from The Johns Hopkins University in 2005. He is a recipient of the NSF CAREER award (2009), the ARO Young Investigator award (2010), and a Google Faculty Research Award (2015), and has served on the DARPA CSSG. He is also the Co-Founder and CEO of Voxel51, a computer vision tech startup building a state-of-the-art platform for video- and image-based applications. His main research thrust is high-level computer vision and its relationship to human language, robotics, and data science. He primarily focuses on problems in video understanding such as video segmentation, activity recognition, and video-to-text.

Juan Carlos Niebles received an Engineering degree in Electronics from Universidad del Norte (Colombia) in 2002, an M.Sc. degree in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 2007, and a Ph.D. degree in Electrical Engineering from Princeton University in 2011. He has been a Senior Research Scientist at the Stanford AI Lab and Associate Director of Research at the Stanford-Toyota Center for AI Research since 2015. He was also an Associate Professor of Electrical and Electronic Engineering at Universidad del Norte (Colombia) between 2011 and 2019. His research interests are in computer vision and machine learning, with a focus on visual recognition and understanding of human actions and activities, objects, scenes, and events. He has served as Area Chair for the top computer vision conferences CVPR and ICCV. He is also a member of the AI Index Steering Committee and the Curriculum Director for Stanford-AI4ALL. He is a recipient of a Google Faculty Research award (2015), the Microsoft Research Faculty Fellowship (2012), a Google Research award (2011), and a Fulbright Fellowship (2005).


MIT-IBM Watson AI Lab

Contact Us

For questions or feedback, please contact Ehsan Elhamifar at