Kris M. Kitani
Carnegie Mellon University
New Perspective on Perception and Prediction Pipeline for Autonomous Driving
The perception and prediction pipeline (3D object detection, multi-object tracking, and trajectory forecasting) is a key component of an autonomous driving system. Although significant advancements have been achieved in the individual modules of this pipeline, limited attention has been paid to improving the pipeline itself. In this talk, we will introduce two alternatives to the standard perception and prediction pipeline: (1) Because the standard pipeline performs tracking and prediction sequentially, errors in the tracking module can degrade the performance of the prediction module. To resolve this issue, we propose to parallelize the tracking and forecasting modules so that the forecasting module does not explicitly depend on tracking results with their inevitable errors; (2) We also describe a new pipeline that inverts the order of forecasting. In contrast to the standard pipeline, this new pipeline first forecasts LiDAR point clouds; detection and tracking are then performed on the predicted future point clouds to obtain predicted future trajectories. Because forecasting the sensor data does not require object labels for training, this reduces the labeling requirement of the pipeline.
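The contrast between the two orderings can be sketched as two function compositions. This is only an illustrative skeleton; all function names are hypothetical placeholders, not the authors' actual modules.

```python
# Hypothetical sketch contrasting the standard detect->track->forecast
# pipeline with the inverted ordering that forecasts sensor data first.
# `detect`, `track`, `forecast`, and `forecast_lidar` are placeholder
# callables standing in for the respective modules.

def standard_pipeline(point_clouds, detect, track, forecast):
    """Detect -> track -> forecast: tracking errors propagate forward."""
    tracks = track(detect(point_clouds[-1]), history=point_clouds[:-1])
    return forecast(tracks)  # forecasting consumes possibly erroneous tracks

def inverted_pipeline(point_clouds, forecast_lidar, detect, track):
    """Forecast raw LiDAR first, then detect and track on the prediction."""
    future_clouds = forecast_lidar(point_clouds)  # trainable without labels
    future_boxes = [detect(pc) for pc in future_clouds]
    return track(future_boxes)  # future trajectories from predicted clouds
```

In the inverted ordering, the only learned module that touches raw sensor data (`forecast_lidar`) needs no object annotations, which is where the labeling savings come from.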
Kris M. Kitani is an associate research professor and director of the MS in Computer Vision program at the Robotics Institute at Carnegie Mellon University. He received his BS at the University of Southern California and his MS and PhD at the University of Tokyo. His research projects span the areas of computer vision, machine learning, and human-computer interaction. In particular, his research interests lie at the intersection of first-person vision, human activity modeling, and inverse reinforcement learning. His work has been awarded the Marr Prize honorable mention at ICCV 2017, best paper honorable mention at CHI 2017, best technical paper at W4A 2017, best application paper at ACCV 2014, and best paper honorable mention at ECCV 2012.
Xinshuo Weng is a Ph.D. student (2018-) at the Robotics Institute of Carnegie Mellon University (CMU), supervised by Kris Kitani. She received her Master's degree (2016-17) at the CMU Robotics Institute, where she worked with Yaser Sheikh and Kris Kitani. Before starting her Ph.D. at CMU, she worked at Oculus Research Pittsburgh (now Facebook Reality Lab) as a research engineer. She received her Bachelor's degree from the School of Electronic Information at Wuhan University in China.
Raster-based Motion Prediction for Safe Self-Driving
Motion prediction is a critical component of self-driving technology, tasked with inferring the future behavior of traffic actors as well as modeling behavior uncertainty. In this talk we focus on this important problem and discuss raster-based methods that have shown state-of-the-art performance. These approaches take top-down images of the surrounding area as their input, providing near-complete contextual information necessary to accurately predict traffic motion. We present a number of recently proposed models and show how to develop methods that obey the map and other physical constraints of the environment.
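The core input encoding of these methods, a top-down (birds-eye-view) raster, can be sketched in a few lines. The grid size, resolution, and single-channel occupancy layout below are illustrative assumptions, not any particular system's format, which typically uses many semantic channels (lanes, crosswalks, actor history).

```python
import numpy as np

def rasterize(actors, grid_size=64, resolution=0.5):
    """Paint actor (x, y) positions into a top-down occupancy raster.

    actors: list of (x, y) positions in metres, ego-centred.
    resolution: metres per pixel; the ego vehicle sits at the grid centre.
    """
    raster = np.zeros((grid_size, grid_size), dtype=np.float32)
    half = grid_size // 2
    for x, y in actors:
        col = int(round(x / resolution)) + half
        row = int(round(y / resolution)) + half
        if 0 <= row < grid_size and 0 <= col < grid_size:
            raster[row, col] = 1.0  # mark cell as occupied
    return raster
```

A raster like this (stacked with map channels) is then fed to a standard convolutional network, which is what makes the representation attractive: the contextual fusion is delegated to ordinary image convolutions.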
Nemanja Djuric is a Staff Engineer and Tech Lead Manager at Uber ATG, for the past 5 years working on motion prediction, object detection, and other technologies supporting self-driving vehicles. Prior to ATG he worked as a research scientist at Yahoo Labs, which he joined after obtaining his PhD at Temple University.
(Single) Trajectory Prediction: Models, Dataset Complexity, and Benchmarking
The ability to reactively capture changes in the dynamics of tracked objects forms the basis for reliable trajectory prediction. Among other factors, the number, variation, and characteristics of the distinguishable motion types included in the trajectory data determine the complexity of the task. This talk will present our recent efforts to contribute to improved trajectory prediction benchmarking, with a focus on single-trajectory models for varying object dynamics. Defining trajectory prediction benchmarks still faces the problem of quantifying sequence complexity. Towards this end, an approach for determining a dataset representation in terms of a small set of distinguishable prototypical sub-sequences is described. The prototype-based dataset representation is obtained by first employing a non-trivial spatial sequence alignment, which enables a subsequent learning vector quantization (LVQ) stage.
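To make the LVQ stage concrete, here is a minimal sketch of a single LVQ1 update step on already-aligned feature vectors. This is the textbook rule only, under the assumption that sub-sequences have been mapped to fixed-length vectors by the alignment stage; it is not the authors' implementation.

```python
import numpy as np

def lvq1_step(prototypes, labels, x, y, lr=0.1):
    """One LVQ1 update: move the nearest prototype toward sample x if
    its label matches y, otherwise push it away.

    prototypes: (K, D) array of prototype vectors (modified in place).
    labels: length-K list of prototype labels.
    """
    dists = np.linalg.norm(prototypes - x, axis=1)
    k = int(np.argmin(dists))                 # winner-take-all match
    sign = 1.0 if labels[k] == y else -1.0    # attract or repel
    prototypes[k] = prototypes[k] + sign * lr * (x - prototypes[k])
    return prototypes, k
```

Iterating such updates over the aligned sub-sequences yields the small set of prototypical sub-sequences that serves as the dataset representation.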
Since current benchmarks still make it difficult to gain insight into a model's behavior under specific conditions, a new benchmark is introduced. It aims to take on a role complementary to existing benchmarks by providing a hierarchy of inference tasks (representation learning, de-noising, and prediction) composed of several test cases targeting specific aspects of a given machine learning model. This offers a more differentiated evaluation of a model's behavior and generalization capabilities, prior to tasks focusing on jointly inferring interaction behavior or incorporating further context cues. As a result, a sanity check for single-trajectory models is provided, aimed at preventing failure cases and highlighting requirements for improving modeling capabilities.
Stefan Becker received his Ph.D. (Dr.-Ing.) in computer science and his diploma in electrical engineering from the Karlsruhe Institute of Technology (KIT). In 2011 he joined the Fraunhofer Institute for Optronics, System Technologies, and Image Exploitation (IOSB), where he is currently working as a postdoctoral researcher in the "Video Content Analysis" group. He has participated in and contributed to several industry, government, and EU projects.