Real-World Challenges and New Benchmarks for Deep Learning in Robotic Vision

A CVPR Workshop – Salt Lake City, Friday 22 June 2018, Room 150-G

Abstract

This workshop will bring together renowned experts from the computer vision and robotics communities to discuss the crucial challenges that arise when deploying deep learning methods in real-world robotic applications, and to identify the research directions necessary to meet these challenges.

As a major concrete outcome, the workshop will discuss a set of future large-scale robotic vision benchmarks that address critical challenges for robotic perception not yet covered by existing computer vision and robotics benchmarks, such as performance in open-set conditions, incremental learning with low-shot techniques, Bayesian optimisation, active learning, and active vision.

These new benchmarks will complement existing benchmark competitions and will be run as an annual challenge at CVPR and ICRA. They will help to close the gap between computer vision and robotics, and will foster crucial advancements in machine learning for robotic vision.

Schedule

Our workshop will be on Friday 22 June 2018, in Room 150-G.

Participate in the panel discussion and ask your questions for the panel here.

Motivation

A robot or autonomous system often operates under uncontrolled and adverse conditions that pose severe challenges to its perception system. Robots are inherently active agents that act in, and interact with, the physical world. They have to make decisions based on incomplete and uncertain knowledge, with potentially catastrophic consequences.

Computer vision challenges and competitions like ILSVRC or COCO have had a significant influence on recent advances in object recognition, object detection, semantic segmentation, image captioning, and visual question answering. These challenges posed motivating problems to the computer vision and machine learning research communities and provided datasets and evaluation metrics that allow different approaches to be compared in a standardized way.

However, visual perception for robotics faces challenges that are not well covered or evaluated by existing benchmarks. Some of these specific challenges for robotic vision, which are also topics of interest for the workshop, are:

  • Deployment in open-set conditions requires reliable uncertainty estimation to identify unknown objects. (A robot will inevitably encounter objects of unknown classes that were not seen during training. It should not assign high-confidence labels to these unknown objects, as state-of-the-art object detection systems typically do; see the uncertainty-estimation sketch after this list.)
  • Incremental learning to address domain adaptation and cope with label shift while avoiding catastrophic forgetting. (Since the characteristics and appearance of objects can differ substantially between the deployment scenario and the training data, a robot has to incorporate new training samples of known classes during deployment while remembering the visual objects it originally learned.)
  • Class-incremental learning, preferably using few-shot methods, to incorporate new classes not encountered during training. (As the deployment scenario might contain new classes of interest that were not available during training, a robot needs the capability to extend its knowledge and efficiently learn new classes without forgetting the classes it previously learned; see the rehearsal sketch after this list.)
  • Active learning to facilitate data-efficient incremental learning. (A robot should be able to select the most informative samples for incremental learning on its own, ask the user to provide ground-truth labels, or search for annotations from other relevant sources, such as the web.)
  • Active vision exploits the embodiment of the perception system and controls the camera pose in the world to improve its perception. (A robot can move its camera to a different viewpoint to gain more information about an object of interest, e.g. by avoiding occlusions or reflections, or to disambiguate confusing objects.)
  • Transfer learning from simulation to reality. (Simulation environments can provide large amounts of training data, especially for deep reinforcement learning, but models trained in simulation tend to overfit to the often visually and conceptually simple simulated environment. Transferring the learned system from simulation to reality is therefore crucial.)
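
To make the first challenge above more concrete, the following is a minimal sketch of one common approach to uncertainty estimation, Monte-Carlo dropout: dropout is kept active at test time, and the predictive entropy over several stochastic forward passes serves as an uncertainty score for flagging potential unknowns. This is an illustration under assumptions, not a method prescribed by the workshop; the model, the `ENTROPY_THRESHOLD` name, and the choice of 20 samples are all illustrative.

```python
# A minimal Monte-Carlo dropout sketch for open-set uncertainty estimation
# (one possible method among many; all names here are illustrative).
import torch


def enable_dropout(model: torch.nn.Module) -> None:
    """Switch only the dropout layers back to train mode so they stay
    stochastic at test time (leaves BatchNorm etc. in eval mode)."""
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.train()


@torch.no_grad()
def mc_dropout_predict(model, images, num_samples=20):
    """Average class probabilities over stochastic forward passes and
    return the predictive entropy as an uncertainty score."""
    model.eval()
    enable_dropout(model)
    probs = torch.stack([
        torch.softmax(model(images), dim=-1) for _ in range(num_samples)
    ])                                  # (num_samples, batch, classes)
    mean_probs = probs.mean(dim=0)      # (batch, classes)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy


# A prediction would be flagged as a potential unknown when its entropy
# exceeds a threshold tuned on held-out data (ENTROPY_THRESHOLD is an
# illustrative name, not a real constant):
# mean_probs, entropy = mc_dropout_predict(classifier, images)
# is_unknown = entropy > ENTROPY_THRESHOLD
```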
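
Similarly, for the class-incremental learning challenge, the sketch below illustrates rehearsal with a fixed-size exemplar memory filled by reservoir sampling, one simple way to mitigate catastrophic forgetting. The class name, buffer capacity, and the training-loop fragment in the closing comments are assumptions for illustration only.

```python
# A minimal rehearsal-buffer sketch for class-incremental learning
# (an illustration, not the workshop's prescribed approach).
import random

import torch


class RehearsalBuffer:
    """Fixed-size exemplar memory maintained with reservoir sampling,
    which keeps a uniform random subset of everything seen so far."""

    def __init__(self, capacity=2000):
        self.capacity = capacity
        self.samples = []   # list of (image_tensor, label) pairs
        self.seen = 0       # total number of samples offered to the buffer

    def add(self, image, label):
        """Offer one sample; it is stored or discarded by reservoir sampling."""
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append((image, label))
        else:
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.samples[idx] = (image, label)

    def sample(self, batch_size):
        """Draw a random replay batch (assumes the buffer is non-empty)."""
        batch = random.sample(self.samples, min(batch_size, len(self.samples)))
        images, labels = zip(*batch)
        return torch.stack(images), torch.tensor(labels)


# During an incremental phase, each gradient step would mix new-class data
# with replayed exemplars of old classes, e.g. (names are assumptions):
# replay_x, replay_y = buffer.sample(batch_size)
# loss = criterion(model(torch.cat([new_x, replay_x])),
#                  torch.cat([new_y, replay_y]))
```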

Contributed Papers

Organisers

  • Niko Sünderhauf (Chief Investigator, Australian Centre for Robotic Vision)
  • Anelia Angelova (Research Scientist, Google Brain)
  • Feras Dayoub (Postdoctoral Fellow, QUT)

With support by

  • Gustavo Carneiro (Associate Professor, University of Adelaide)
  • Kevin Murphy (Research Scientist, Google Research)
  • Anton van den Hengel (Professor, University of Adelaide)
  • Vijay Kumar (Postdoctoral Fellow, University of Adelaide)
  • Jürgen Leitner (Postdoctoral Fellow, QUT)
  • Trung T. Pham (Postdoctoral Fellow, University of Adelaide)
  • Ingmar Posner (Associate Professor, University of Oxford)
  • Michael Milford (Professor, QUT)
  • Ian Reid (Professor, University of Adelaide)
  • Peter Corke (Professor, QUT, and Director, Australian Centre for Robotic Vision)

Program Committee

  • Raoul de Charette, INRIA
  • Chelsea Finn, UC Berkeley
  • David Held, CMU
  • Edward Johns, Imperial College
  • Stefan Leutenegger, Imperial College
  • Tsung-Yi Lin, Google Brain
  • Franziska Meier, Max Planck Institute Tuebingen
  • Michael Ryoo, Indiana University
  • Pierre Sermanet, Google Brain
  • Alex Toshev, Google Brain
  • Paul Wohlhart, X
  • Chenxia Wu, Cornell
  • Yezhou Yang, Arizona State University

Sponsors