Invited Speakers

Josh Tenenbaum

Title: Reverse-engineering human common sense to engineer more human-like robots

Abstract: Recent successes in computer vision and other areas of artificial intelligence have been largely attributed to advances in pattern recognition — most prominently deep neural networks, as well as other machine learning techniques. But human intelligence is more than just pattern recognition. We can see these abilities at work in young children: in some ways, even a six-month-old baby is more intelligent than any AI system yet built. The heart of human common sense is our ability to model the world: to explain and understand what we see, to imagine things we could see but haven't yet, to solve problems and plan actions to make these things real, and to build new models as we learn more about the world. I will talk about prospects for reverse-engineering these capacities at the core of human intelligence, and using what we learn to advance robotics. In particular, I will introduce basic concepts of probabilistic programs and program induction, which together with tools from deep learning and modern video game engines provide an approach to making robots smarter and more cooperative in more human-like ways.
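
As a toy-scale, hedged illustration of the probabilistic-programming idea mentioned above, the Python sketch below writes a generative model of a scene as an ordinary program and "explains what we see" by weighting random runs of that program against a noisy observation. Every name here is invented for illustration; the talk's actual tools (probabilistic programming languages, program induction, and game-engine simulators) are far richer.

    import math
    import random


    def generative_scene():
        """A tiny probabilistic program: sample a latent scene, here a number
        of objects and their 1-D positions."""
        n_objects = random.randint(1, 4)
        positions = [random.uniform(0.0, 1.0) for _ in range(n_objects)]
        return n_objects, positions


    def likelihood(observed_positions, positions, noise=0.1):
        """How well a sampled scene explains noisy observed positions
        (Gaussian noise, matching in sorted order; a real model would also
        handle occlusion and data association)."""
        if len(observed_positions) != len(positions):
            return 0.0
        return math.prod(
            math.exp(-((o - p) ** 2) / (2 * noise ** 2))
            for o, p in zip(sorted(observed_positions), sorted(positions))
        )


    def infer_object_count(observed_positions, n_samples=20000):
        """'Explain what we see': posterior over the number of objects,
        by likelihood-weighting many runs of the generative program."""
        weights = {}
        for _ in range(n_samples):
            n, positions = generative_scene()
            weights[n] = weights.get(n, 0.0) + likelihood(observed_positions, positions)
        total = sum(weights.values()) or 1.0
        return {n: w / total for n, w in sorted(weights.items())}


    # Mass should concentrate on n = 2 for two observed objects.
    print(infer_object_count([0.2, 0.8]))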


Pieter Abbeel

Title: Reducing Data Needs for Real-World Reinforcement Learning

Abstract: Reinforcement learning and imitation learning have seen success in many domains, including autonomous helicopter flight, Atari, simulated locomotion, Go, and robotic manipulation. However, the sample complexity of these methods remains very high. In this talk I will present several ideas towards reducing sample complexity: (i) Hindsight Experience Replay, which infuses learning signal into (traditionally) zero-reward runs, and is compatible with existing off-policy algorithms; (ii) some recent advances in Model-based Reinforcement Learning, which achieve a 100x sample-complexity gain over the more widely studied model-free methods; (iii) Meta-Reinforcement Learning, which can significantly reduce sample complexity by building off other skills acquired in the past; (iv) Domain Randomization, a simple idea that can often enable training fully in simulation, yet still recover policies that perform well in the real world.
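
As a hedged sketch of idea (i), the snippet below shows the relabeling trick at the core of Hindsight Experience Replay: a trajectory that failed to reach its goal is stored a second time with the state it actually reached substituted as the goal, so an otherwise zero-reward run still yields learning signal for any off-policy algorithm reading the buffer. The reward function, buffer interface and episode format are illustrative assumptions, not the speaker's implementation.

    import numpy as np


    def sparse_reward(achieved_state, goal, tol=0.05):
        """1 if the achieved state is within `tol` of the goal, else 0 (sparse reward)."""
        return float(np.linalg.norm(achieved_state - goal) <= tol)


    def store_episode_with_hindsight(buffer, episode, goal):
        """Store an episode twice: once with the original goal, once with the
        final achieved state relabeled as the goal (the 'hindsight' copy).

        `episode` is a list of (state, action, next_state) tuples; `buffer` is
        any object with an add(state, action, reward, next_state, goal) method.
        """
        # 1) Original goal: on a failed run, every reward here is typically zero.
        for state, action, next_state in episode:
            buffer.add(state, action, sparse_reward(next_state, goal), next_state, goal)

        # 2) Hindsight goal: pretend the state we actually reached was the goal.
        #    The final transition now receives reward 1, injecting learning signal.
        hindsight_goal = episode[-1][2]
        for state, action, next_state in episode:
            buffer.add(state, action, sparse_reward(next_state, hindsight_goal),
                       next_state, hindsight_goal)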

Bio: Pieter Abbeel (Professor at UC Berkeley [2008- ], Co-Founder Gradescope [2014- ], Research Scientist at OpenAI [2016-2017]) works in machine learning and robotics; in particular, his research focuses on how to make robots learn from people (apprenticeship learning), how to make robots learn through their own trial and error (reinforcement learning), and how to speed up skill acquisition through learning-to-learn. His robots have learned advanced helicopter aerobatics, knot-tying, basic assembly, and organizing laundry. His group has pioneered deep reinforcement learning for robotics, including learning visuomotor skills and simulated locomotion. He has won various awards, including best paper awards at ICML, NIPS and ICRA, the Sloan Fellowship, the Air Force Office of Scientific Research Young Investigator Program (AFOSR-YIP) award, the Office of Naval Research Young Investigator Program (ONR-YIP) award, the DARPA Young Faculty Award (DARPA-YFA), the National Science Foundation Faculty Early Career Development Program Award (NSF-CAREER), the Presidential Early Career Award for Scientists and Engineers (PECASE), the CRA-E Undergraduate Research Faculty Mentoring Award, the MIT TR35, the IEEE Robotics and Automation Society (RAS) Early Career Award, and the Dick Volz Best U.S. Ph.D. Thesis in Robotics and Automation Award.


Jitendra Malik

Title: Vision for Manipulation and Navigation

Abstract: I will describe recent results from my group on visually guided manipulation and navigation. We are guided considerably by insights from human development and cognition. In manipulation, our work is based on object-oriented task models acquired by experimentation. In navigation, we show the benefits of architectures based on cognitive maps and landmarks.

Bio: Jitendra Malik is Arthur J. Chick Professor of EECS at UC Berkeley, and has published widely in computer vision, computer graphics, robotics and machine learning. Several well-known concepts and algorithms arose in this research, such as anisotropic diffusion, normalized cuts, high dynamic range imaging, shape contexts and R-CNN. Jitendra received the Distinguished Researcher in Computer Vision Award from IEEE, the K.S. Fu Prize from IAPR, and the Allen Newell award from ACM and AAAI. He has been elected to the National Academy of Sciences, the National Academy of Engineering and the American Academy of Arts and Sciences.


Martial Hebert

Title: Reducing Supervision

Abstract: A key limitation of current approaches, in particular for computer vision tasks, is their reliance on vast amounts of strongly supervised data. This limits scalability, prevents rapid acquisition of new concepts, and limits adaptability to new tasks or new conditions. To address this limitation, I will explore ideas in learning visual models from limited data. The basic insight behind all of these ideas is that it is possible to learn, from a large corpus of vision tasks, how to learn models for new tasks with limited data, by representing the way visual models vary across tasks, also called model dynamics. The talk will also show examples from common visual classification tasks.
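
One hedged way to make the "model dynamics" idea concrete is sketched below: across a corpus of past tasks, fit a classifier on a few examples and on the full data, then learn a regression from small-sample parameters to large-sample parameters; a new task is fit on the few examples available and upgraded with the learned map. The linear forms and data interfaces are assumptions chosen for brevity, not the speaker's actual models.

    import numpy as np
    from numpy.linalg import lstsq


    def fit_linear_classifier(X, y, reg=1e-3):
        """Ridge-regularized least-squares weights for a binary task (labels in {-1, +1})."""
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)


    def learn_model_dynamics(tasks, k_small=5):
        """Learn a linear map T sending few-example weights to many-example weights.

        `tasks` is a list of (X, y) datasets from past tasks; for each we fit a
        classifier on `k_small` examples and on the full set, then regress the
        full-data weights on the small-data weights.
        """
        W_small, W_full = [], []
        for X, y in tasks:
            W_small.append(fit_linear_classifier(X[:k_small], y[:k_small]))
            W_full.append(fit_linear_classifier(X, y))
        W_small, W_full = np.stack(W_small), np.stack(W_full)
        # Solve W_small @ T ~= W_full in the least-squares sense.
        T, *_ = lstsq(W_small, W_full, rcond=None)
        return T


    def adapt_new_task(T, X_few, y_few):
        """Fit on the few available examples, then apply the learned dynamics."""
        return fit_linear_classifier(X_few, y_few) @ T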

Bio: Martial Hebert is the Director of the Robotics Institute at Carnegie Mellon University. His work is in the areas of computer vision and perception for autonomous systems. Hebert has contributed to early programs for self-driving vehicles, has led the development of perception capabilities for personal robots, and has worked on perception for a variety of autonomous systems. His research interests include computer vision, especially recognition in images and video data; model building and object recognition from 3-D data; and perception for mobile robots and intelligent vehicles. His group has developed approaches to object recognition and scene analysis in images, 3-D point clouds and video sequences.


Raquel Urtasun

Title: Deep Learning for Self-Driving Cars

Bio: Raquel Urtasun is the Head of Uber ATG Toronto. She is also an Associate Professor in the Department of Computer Science at the University of Toronto, a Canada Research Chair in Machine Learning and Computer Vision, and a co-founder of the Vector Institute for AI. Prior to this, she was an Assistant Professor at the Toyota Technological Institute at Chicago (TTIC), an academic computer science institute affiliated with the University of Chicago. She was also a visiting professor at ETH Zurich during the spring semester of 2010. She received her Bachelor's degree from Universidad Publica de Navarra in 2000, her Ph.D. degree from the Computer Science department at Ecole Polytechnique Federale de Lausanne (EPFL) in 2006, and did her postdoc at MIT and UC Berkeley. She is a world-leading expert in machine perception for self-driving cars. Her research interests include machine learning, computer vision, robotics and remote sensing. Her lab was selected as an NVIDIA NVAIL lab. She is a recipient of an NSERC EWR Steacie Award, an NVIDIA Pioneers of AI Award, a Ministry of Education and Innovation Early Researcher Award, three Google Faculty Research Awards, an Amazon Faculty Research Award, a Connaught New Researcher Award, and two Best Paper Runner-Up Prizes awarded at the Conference on Computer Vision and Pattern Recognition (CVPR) in 2013 and 2017, respectively. She is also an Editor of the International Journal of Computer Vision (IJCV) and has served as Area Chair of multiple machine learning and vision conferences (e.g., NIPS, UAI, ICML, ICLR, CVPR, ECCV).


Pierre Sermanet

Title: Self-Supervised Imitation

Abstract: We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints. We study how these representations can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a triplet loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors, which are often visually similar but functionally different. This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm. While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single third-person demonstration by a human. Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems. Video results, open-source code and dataset are available at https://sermanet.github.io/imitate
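
A minimal sketch of the multi-view, time-contrastive triplet objective described above, in plain NumPy: the anchor and positive are embeddings of the same moment seen from two viewpoints, the negative is a temporally nearby frame from the same viewpoint, and the loss pulls the cross-view pair together while pushing the temporal neighbor at least a margin away. The embedding function, frame format and negative offset are placeholders, not the released code.

    import numpy as np


    def triplet_margin_loss(anchor, positive, negative, margin=0.2):
        """Standard triplet loss: hinge on ||a - p||^2 + margin - ||a - n||^2."""
        d_pos = np.sum((anchor - positive) ** 2, axis=-1)   # same moment, other viewpoint
        d_neg = np.sum((anchor - negative) ** 2, axis=-1)   # nearby moment, same viewpoint
        return np.maximum(0.0, d_pos - d_neg + margin).mean()


    def time_contrastive_loss(embed, view1_frames, view2_frames, neg_offset=15):
        """Build time-contrastive triplets from two synchronized viewpoints.

        `embed` maps a frame to an embedding vector; `view1_frames` and
        `view2_frames` are time-aligned sequences of frames from two cameras.
        """
        T = len(view1_frames)
        losses = []
        for t in range(T):
            anchor = embed(view1_frames[t])        # viewpoint 1, time t
            positive = embed(view2_frames[t])      # viewpoint 2, same time t
            t_neg = (t + neg_offset) % T           # temporal neighbor, same viewpoint
            negative = embed(view1_frames[t_neg])
            losses.append(triplet_margin_loss(anchor, positive, negative))
        return float(np.mean(losses))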

Bio: Pierre Sermanet is a Research Scientist at Google Brain, conducting research in vision and robotics. He obtained his PhD from NYU under the supervision of Yann LeCun.