Speakers and Talks

Peter Kontschieder

Title: Computer Vision with Less Supervision

Abstract: In this talk I discuss two recent projects from Mapillary Research, both involving ways to automatically harvest and generate high-quality training data. The first project introduces the Mapillary Planet-Scale Depth Dataset (MPSD), a novel, large-scale dataset for training single-image depth estimation models. The MPSD work introduces a pipeline for generating depth training data for RGB images from heterogeneous sets of capture devices and shows how to train deep networks with data from such cameras with different focal lengths. The second project presents our solution to the task of multi-object tracking and segmentation (MOTS). We introduce a workflow for fully automated construction of MOTS training data, exploiting existing, state-of-the-art panoptic segmentation and optical flow estimation models. Another contribution of this work is a novel Mask-Pooling layer, integrated into a deep learning architecture coined MOTSNet. We show that our model outperforms existing MOTS approaches without relying on human annotations, thanks to our improved MOTSNet architecture and the possibility of extracting high-fidelity training data at scale.
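
The Mask-Pooling layer is only described at a high level above; as a rough, hypothetical sketch of the general idea, per-instance embeddings for tracking can be obtained by average-pooling backbone features under each instance mask. The function name and shapes below are illustrative assumptions, not MOTSNet's actual implementation:

```python
import torch

def mask_pool(features, masks, eps=1e-6):
    """Average-pool feature maps under each instance mask (illustrative sketch).

    features: (C, H, W) backbone feature map
    masks:    (N, H, W) binary instance masks at the same resolution
    returns:  (N, C) one embedding per instance, usable for association/tracking
    """
    masks = masks.float()
    # Sum of features inside each mask: (N, C)
    pooled = torch.einsum('nhw,chw->nc', masks, features)
    # Normalize by mask area to get a mean, guarding against empty masks
    area = masks.sum(dim=(1, 2)).clamp(min=eps).unsqueeze(1)
    return pooled / area
```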

Bio: Peter Kontschieder received his MSc and PhD from Graz University of Technology in Austria in 2008 and 2013, respectively. From 2013 to 2016 he was a postdoctoral researcher in the Machine Intelligence and Perception group at Microsoft Research in Cambridge (UK). In 2016 he joined Mapillary and founded Mapillary Research, Mapillary's research lab focusing on basic research in computer vision and machine learning. With the acquisition of Mapillary by Facebook in 2020, Peter became a Research Scientist Manager. Peter received the Marr Prize in 2015 for his work on Deep Neural Decision Forests, which join deep learning with decision forests. He has co-organized several workshops on object recognition, including the Large-Scale Scene Understanding workshop at CVPR17, the Joint COCO and Mapillary workshops at ECCV18 and CVPR19, and the Robust Vision workshop at ECCV20. He regularly publishes his research at high-impact conferences such as ICCV, CVPR, ECCV, NeurIPS, ICML and IJCAI.

Deva Ramanan

Title: Geometry, Motion, and the Unknown: Learning to Perceive for Navigation

Bio: Deva Ramanan is an associate professor at the Robotics Institute at Carnegie Mellon University and the director of the CMU Argo AI Center for Autonomous Vehicle Research. His research interests span computer vision and machine learning, with a focus on visual recognition. He was awarded the David Marr Prize in 2009, the PASCAL VOC Lifetime Achievement Prize in 2010, an NSF Career Award in 2010, the UCI Chancellor's Award for Excellence in Undergraduate Research in 2011, and the IEEE PAMI Young Researcher Award in 2012; he was named one of Popular Science's Brilliant 10 researchers in 2012 and a National Academy of Sciences Kavli Fellow in 2013, and won the Longuet-Higgins Prize in 2018 for fundamental contributions in computer vision. His work is supported by NSF, ONR, DARPA, as well as industrial collaborations with Intel, Google, and Microsoft. He served as a program chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018. He is on the editorial board of the International Journal of Computer Vision (IJCV) and is an associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). He regularly serves as a senior program committee member for CVPR, the International Conference on Computer Vision (ICCV), and the European Conference on Computer Vision (ECCV). He also regularly serves on NSF panels for computer vision and machine learning.

Antonio Lopez

Title: Minimizing Human Labeling Effort in On-board Vision-based Perception

Abstract: Collecting images and manually labeling them for training visual models has been a major bottleneck ever since computer vision and machine learning started to walk hand in hand. The bottleneck has become even more evident now that computer vision rests on the shoulders of data-hungry deep learning techniques. Therefore, any approach that reduces such time-consuming and costly work is of high interest for computer vision applications such as autonomous driving. In this talk we review our recent work on minimizing human data labeling, including the use of end-to-end driving as a representation learning strategy, co-training for object detection, and learning to classify unseen classes by leveraging synthetic samples.
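
As a rough illustration of the co-training idea mentioned above, here is a toy sketch in which two generic classifiers, trained on different feature "views" of the same data, pseudo-label unlabeled samples for each other. The talk applies the idea to object detectors; the classifiers, function names, and round/count parameters below are illustrative assumptions, assuming integer class labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(X1, X2, y, X1_u, X2_u, rounds=5, k=10):
    """Toy co-training loop over two feature views (illustrative sketch).

    X1, X2:      labeled data in view 1 / view 2
    y:           labels for the labeled data
    X1_u, X2_u:  the same unlabeled pool in view 1 / view 2
    """
    m1 = LogisticRegression(max_iter=1000)
    m2 = LogisticRegression(max_iter=1000)
    X1, X2, y = X1.copy(), X2.copy(), y.copy()
    for _ in range(rounds):
        m1.fit(X1, y)
        m2.fit(X2, y)
        if len(X1_u) == 0:
            break
        # Each model scores the unlabeled pool; keep the k most confident samples
        p1, p2 = m1.predict_proba(X1_u), m2.predict_proba(X2_u)
        conf = np.maximum(p1.max(1), p2.max(1))
        idx = np.argsort(-conf)[:k]
        # Pseudo-label each selected sample with the more confident model
        pseudo = np.where(p1.max(1)[idx] >= p2.max(1)[idx],
                          m1.classes_[p1[idx].argmax(1)],
                          m2.classes_[p2[idx].argmax(1)])
        X1 = np.vstack([X1, X1_u[idx]])
        X2 = np.vstack([X2, X2_u[idx]])
        y = np.concatenate([y, pseudo])
        keep = np.setdiff1d(np.arange(len(X1_u)), idx)
        X1_u, X2_u = X1_u[keep], X2_u[keep]
    return m1, m2
```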

Bio: Antonio is the principal investigator of the Autonomous Driving group at the Computer Vision Center (CVC) at the Universitat Autonoma de Barcelona (UAB), where he holds a tenured position in Computer Science. Antonio has been deeply involved in the creation of the SYNTHIA dataset and the CARLA simulator.

Raquel Urtasun

Title: Joint Perception and Motion Forecasting

Bio: Raquel Urtasun is the Chief Scientist for Uber ATG and the Head of Uber ATG Toronto. She is also a Professor at the University of Toronto, a Canada Research Chair in Machine Learning and Computer Vision, and a co-founder of the Vector Institute for AI. She received her Ph.D. from the Ecole Polytechnique Fédérale de Lausanne (EPFL) in 2006 and did her postdoc at MIT and UC Berkeley. She is a recipient of an NSERC EWR Steacie Award, an NVIDIA Pioneers of AI Award, a Ministry of Education and Innovation Early Researcher Award, three Google Faculty Research Awards, an Amazon Faculty Research Award, a Connaught New Researcher Award, a Fallona Family Research Award, and two Best Paper Runner-Up Prizes awarded at CVPR in 2013 and 2017. She was also named Chatelaine's 2018 Woman of the Year and one of Toronto's top influencers of 2018 by Adweek magazine.

Alex Kendall

Title: End-to-End Deep Learning for Autonomous Driving

Bio: Alex co-founded and is CEO of Wayve, a London-based start-up pioneering end-to-end deep learning algorithms for autonomous driving. Additionally, he holds a research fellowship at Trinity College at the University of Cambridge, where he completed his PhD. Alex's research has appeared at leading computer vision, robotics and machine learning conferences, and he is the recipient of UK and European awards for scientific impact. He is interested in building robots which can learn to do more intelligent things with less data.

Urs Muller

Title: Learning from Human Drivers

Abstract: In 2016 a small core team at the NVIDIA lab in Holmdel, New Jersey, demonstrated a learned driving application on local roads and on highways using a deep neural network (DNN). That driving application is now known as PilotNet. Over the following four years, PilotNet, combined with adaptive cruise control, has progressed from a demo to a research driving system that can stay in lane for hundreds of kilometers without human intervention, even in difficult conditions. This level of performance is achieved using a single forward-facing camera, without HD maps or LIDAR. This talk will discuss how PilotNet has evolved, emphasizing the enabling infrastructure that has been created. Our team has grown, and we now have access to large volumes of driving data for training. To test new versions of PilotNet, we have built a simulator that uses recorded data to ensure realistic sensor input but that can also run closed-loop to capture the cumulative effect of small errors in the driving system. The simulator enables repeatable, systematic tests on large amounts of recorded data. Early versions of PilotNet directly output a steering wheel angle; today's PilotNet produces the desired 3D vehicle trajectory, and an external controller keeps the vehicle on this trajectory. A single neural network takes pixels as input and produces the trajectory as output with no intervening steps. This approach focuses on learning from observations, as opposed to decomposing the task into components and handcrafting rules, and it ensures that the neural network can learn to use all relevant information present in the pixels to steer the car.
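
As a loose illustration of the "pixels in, trajectory out" setup described above, the following toy PyTorch sketch maps a single camera image to a sequence of future 3D waypoints that a downstream controller could track. The layer sizes, the 10-waypoint horizon, and the class name are illustrative assumptions, not the actual PilotNet architecture:

```python
import torch
import torch.nn as nn

class TrajectoryNet(nn.Module):
    """Toy pixels-to-trajectory network: one CNN, no intermediate representation.

    Trained by regressing recorded human driving trajectories (e.g. with an
    L1/L2 loss on the waypoints); all sizes here are illustrative only.
    """
    def __init__(self, num_waypoints=10):
        super().__init__()
        self.num_waypoints = num_waypoints
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64, 100), nn.ReLU(),
            nn.Linear(100, num_waypoints * 3),   # (x, y, z) per waypoint
        )

    def forward(self, image):
        # image: (B, 3, H, W) -> trajectory: (B, num_waypoints, 3)
        return self.head(self.backbone(image)).view(-1, self.num_waypoints, 3)
```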

Bio: Urs Muller joined NVIDIA in 2015 to build and lead the autonomous driving team in Holmdel, New Jersey. The team focuses on new learning-based robust solutions for self-driving cars. Previously, Muller worked at Bell Labs and later founded Net-Scale Technologies, Inc., a prime contractor for several DARPA robotics and deep learning programs.

Alyssa Pierson

Title: Modeling Socially-Aware and Risk-Aware Autonomy

Abstract: Robots will transform our everyday lives, from home service and personal mobility to large-scale warehouse management, agricultural monitoring, and autonomous driving. Across these applications, robots need to interact with humans and other robots in complex, dynamic environments. Understanding how robots interact allows us to design safer and more robust systems. This talk presents an overview of how we can integrate underlying cooperation and risk models into the design of robot teams. Creating a team of capable, collaborative robots requires insight into several challenges in decision-making. The correct response of a robot is linked to its task, its surroundings, and the cooperation of other robots. My talk focuses on how we can (i) define metrics of risk for autonomous safety nets; (ii) use risk metrics in navigating occluded intersections; and (iii) create socially-compliant autonomous systems. We use tools from behavioral decision theory to design interaction models, combined with game theory and control theory to develop distributed control strategies with performance guarantees. This talk focuses on applications in autonomous driving, where a better understanding of human intent and overall risk improves safety.

Bio: Alyssa Pierson is a research scientist in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology, and will join Boston University as an Assistant Professor in Mechanical Engineering in January 2021. She obtained her BS in engineering from Harvey Mudd College in 2010, and her MS and PhD in Mechanical Engineering from Boston University in 2016 and 2017, respectively. While at Boston University, she was awarded the Clare Boothe Luce Fellowship and was a Best Paper Finalist at the 2016 International Conference on Robotics and Automation. Her research interests focus on trust and cooperation in multi-agent systems, distributed control, and socially-compliant autonomous system design.

Andreas Geiger

Title: Learning Robust Driving Policies

Abstract: I will present two recent results on learning robust driving policies that lead to state-of-the-art performance in the CARLA simulator. To generalize across diverse conditions, humans leverage multiple types of situation-specific reasoning and learning strategies. Motivated by this observation, I will first present a new framework for learning a situational driving policy that effectively captures reasoning under varying types of scenarios and leads to a 98% success rate on the CARLA driving benchmark, as well as state-of-the-art performance on a newly introduced generalization benchmark. In the second part of my talk, I will discuss the problem of covariate shift in imitation learning. I will demonstrate that the majority of data aggregation techniques for addressing this problem generalize poorly, and present a novel approach with empirically better generalization performance. The key idea is to sample critical states from the collected on-policy data based on the utility they provide to the learned policy, and to incorporate a replay buffer which progressively focuses on the high-uncertainty regions of the policy's state distribution. The proposed approach is evaluated on the CARLA NoCrash benchmark, focusing on the most challenging driving scenarios with dense pedestrian and vehicle traffic. It achieves 87% of the expert performance while reducing the collision rate by an order of magnitude, without the use of any additional modality, auxiliary tasks, architectural modifications, or reward from the environment.
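
The exact utility measure and buffer mechanics belong to the paper; the sketch below only illustrates the general idea of a replay buffer that favors high-uncertainty states, using disagreement across an ensemble of policy predictions as a stand-in uncertainty score. All names and details are assumptions for illustration:

```python
import numpy as np

class UncertaintyReplayBuffer:
    """Toy replay buffer that focuses sampling on high-uncertainty states."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.states, self.actions, self.scores = [], [], []

    def add(self, state, expert_action, ensemble_predictions):
        # Score the state by disagreement of an ensemble of policy outputs
        score = float(np.var(ensemble_predictions, axis=0).mean())
        if len(self.states) >= self.capacity:
            # Drop the least critical sample to make room
            i = int(np.argmin(self.scores))
            for buf in (self.states, self.actions, self.scores):
                buf.pop(i)
        self.states.append(state)
        self.actions.append(expert_action)
        self.scores.append(score)

    def sample(self, batch_size):
        # Sample states with probability proportional to their uncertainty score
        p = np.asarray(self.scores) + 1e-8
        p /= p.sum()
        idx = np.random.choice(len(self.states), size=batch_size, p=p)
        return [self.states[i] for i in idx], [self.actions[i] for i in idx]
```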

Bio: Andreas Geiger is a professor at the University of Tübingen and a group leader at the Max Planck Institute for Intelligent Systems. Prior to this, he was a visiting professor at ETH Zürich and a research scientist at MPI-IS. He studied at KIT, EPFL and MIT and received his PhD degree in 2013 from KIT. His research interests are at the intersection of 3D reconstruction, motion estimation, scene understanding and sensory-motor control. He maintains the KITTI vision benchmark and is part of the NVIDIA NVAIL and the Intel NIS programs.

Anelia Angelova

Title: Learning from Self- and Weak Supervision

Abstract: Autonomous driving applications rely on learning from vast amounts of data. In this talk we will look into approaches that learn in an unsupervised manner or with less labeling. We first describe how to learn scene depth in dynamic scenes from self-supervision alone, learning from ego-motion video only. We then relax the constraint of known camera parameters by learning them from videos in the wild, and also remove the need for additional semantic labeling. We demonstrate results on challenging dynamic scenes such as Cityscapes and the Waymo Open Dataset, as well as on videos in the wild. Code is open-sourced for all approaches discussed. We will further present the ShapeMask instance segmentation approach, which generalizes very well to novel categories and can be trained with greatly reduced segmentation mask labeling. Code, models and tutorials are available for public use on the Google Cloud Platform. Finally, we will introduce Taskology, which learns across tasks and many datasets, using weak labels and unlabeled data. A distributed version of the algorithm is also demonstrated, which scales across tasks and datasets.
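
As a rough sketch of the self-supervision signal behind learning depth from ego-motion video, the toy PyTorch function below warps a neighboring frame into the target view using predicted depth, ego-motion, and (possibly learned) intrinsics, then penalizes the photometric difference. The shapes, the plain L1 loss, and the absence of occlusion and object-motion handling are simplifying assumptions, not the actual method from the talk:

```python
import torch
import torch.nn.functional as F

def photometric_self_supervision(target, source, depth, pose, K):
    """Photometric loss from view synthesis (illustrative sketch).

    target, source: (B, 3, H, W) adjacent video frames
    depth:          (B, 1, H, W) predicted depth for the target frame
    pose:           (B, 3, 4) predicted relative camera motion [R|t]
    K:              (B, 3, 3) camera intrinsics (possibly also predicted)
    """
    B, _, H, W = target.shape
    # Pixel grid in homogeneous coordinates: (B, 3, H*W)
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()
    pix = pix.view(1, 3, -1).expand(B, -1, -1).to(target.device)
    # Back-project to 3D, apply ego-motion, re-project into the source view
    cam = torch.linalg.inv(K) @ pix * depth.view(B, 1, -1)
    cam = pose[:, :, :3] @ cam + pose[:, :, 3:]
    proj = K @ cam
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    # Normalize coordinates to [-1, 1] and reconstruct the target frame
    u = 2 * uv[:, 0] / (W - 1) - 1
    v = 2 * uv[:, 1] / (H - 1) - 1
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (warped - target).abs().mean()
```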

Bio: Anelia Angelova is a research scientist in the area of computer vision. She leads the Robot Vision research team at Robotics at Google and Google Research. Her most recent research focuses on deep learning for robotics perception, including semantic and 3D scene understanding and real-time algorithms for pedestrian detection and robot grasp localization. She has integrated her work in production systems, including the first deep neural network models running onboard Google's self-driving car, now Waymo. Anelia received her MS and PhD degrees in Computer Science from California Institute of Technology.

Wolfram Burgard

Title: Self-Supervised Learning for Perception Tasks in Automated Driving

Abstract: At the Toyota Research Institute we are following the one-system-two-modes approach to building truly automated cars. More precisely, we simultaneously aim for the L4/L5 chauffeur application and the guardian system, which can be considered a highly advanced driver assistance system of the future that prevents the driver from making mistakes. TRI aims to equip more and more consumer vehicles with guardian technology and in this way turn the entire Toyota fleet into a giant data collection system. To leverage the resulting data advantage, TRI performs substantial research in machine learning and, in addition to supervised methods, particularly focuses on unsupervised and self-supervised approaches. In this presentation, I will present three recent results on self-supervised methods for perception problems in the context of automated driving, including novel approaches to inferring depth from monocular images and a new approach to panoptic segmentation.

Bio: Wolfram Burgard is VP for Automated Driving Technology at the Toyota Research Institute. He is on leave from his professorship at the University of Freiburg, where he heads the research group for Autonomous Intelligent Systems. Wolfram Burgard is known for his contributions to mobile robot navigation, localization and SLAM (simultaneous localization and mapping). He has published more than 350 papers in the overlapping area of robotics and artificial intelligence.

Alexandre Alahi

Title: Is Perception the Main Challenge of Autonomous Driving?

Abstract: Is perception the bottleneck of autonomous driving? I argue that AI must go beyond perception tasks and develop broader cognition: it must learn and obey unwritten common-sense rules and comply with social conventions in order to gain human trust. I will present a new type of cognition, which I call socially-aware AI, to address these challenges.