Real-World Challenges and New Benchmarks for Deep Learning in Robotic Vision

A CVPR Workshop – Salt Lake City, Friday 22 June 2018, Room 150 - G

Ask your questions for the panel discussion here


This workshop will bring together renowned experts from both the computer vision and robotics communities to discuss crucial challenges arising when deploying deep learning methods in real-world robotic applications, and identify the necessary research directions to meet these challenges.

As a major concrete outcome and activity, the workshop will discuss a set of future large-scale robotic vision benchmarks to address the critical challenges for robotic perception that are not yet covered by existing computer vision and robotics benchmarks, such as performance in open-set conditions, incremental learning with low-shot techniques, Bayesian optimisation, active learning, and active vision.

These new benchmarks will complement existing benchmark competitions and will be run as an annual challenge at CVPR and ICRA. They will help to close the gap between computer vision and robotics, and will foster crucial advancements in machine learning for robotic vision.




A robot or autonomous system often operates in uncontrolled and detrimental conditions that pose severe challenges to its perception system. Robots are inherently active agents that act in, and interact with, the physical world. They have to make decisions based on incomplete and uncertain knowledge, and wrong decisions can have catastrophic results.

Computer vision challenges and competitions like ILSVRC or COCO have had a significant influence on the advancements in object recognition, object detection, semantic segmentation, image captioning, and visual question answering in recent years. These challenges posed motivating problems to the computer vision and machine learning research communities and proposed datasets and evaluation metrics that allowed different approaches to be compared in a standardized way.

However, visual perception for robotics faces challenges that are not well covered or evaluated by the existing benchmarks. Some of those specific challenges for robotic vision and topics of interest for the workshop are:

  • Deployment in open-set conditions requires reliable uncertainty estimation to identify unknown objects. (A robot will inevitably encounter objects of unknown classes that were not seen during training. It should not assign high-confidence labels to these unknown objects, as state-of-the-art object detection systems typically do.)
  • Incremental learning to address domain adaptation and cope with label shift while avoiding catastrophic forgetting. (Since the characteristics and appearance of objects can differ considerably between the deployment scenario and the training data, a robot has to incorporate new training samples of known classes during deployment and still remember the visual objects it originally learned.)
  • Class-incremental learning, preferably using few-shot methods, to incorporate new classes not encountered during training. (As the deployment scenario might contain new classes of interest that were not available during training, a robot needs the capability to extend its knowledge and efficiently learn new classes without forgetting the classes it previously learned.)
  • Active learning to facilitate data-efficient incremental learning. (A robot should be able to select the most informative samples for incremental learning on its own, ask the user to provide ground truth labels, or search for annotations from other relevant sources, such as the web.)
  • Active vision exploits the embodiment of the perception system and controls the camera pose in the world to improve its perception. (A robot can move its camera to a different viewpoint to gain more information about an object of interest, e.g. by avoiding occlusions or reflections, or to disambiguate confusing objects.)
  • Transfer learning from simulation to reality. (Simulation environments can provide a large amount of training data, especially for deep reinforcement learning, but systems trained in them risk overfitting to the often visually and conceptually simple simulation environment. Transferring the learnt system from simulation to reality is crucial.)
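
As a rough illustration of the open-set and active learning topics in the list above, the sketch below uses the entropy of a classifier's class probabilities both to flag possibly unknown objects and to rank samples for labelling. The function names, the threshold, and the example probabilities are assumptions made for illustration only. Softmax entropy is just one simple uncertainty proxy and tends to be overconfident on unknown classes, which is the very problem described above, so this is a baseline and not a method proposed by the workshop.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def is_possibly_unknown(probs, max_entropy_fraction=0.9):
    """Flag a prediction whose entropy exceeds a fraction of the maximum
    possible entropy, log(K) for K classes.
    The 0.9 default is an arbitrary, illustrative threshold."""
    return entropy(probs) > max_entropy_fraction * math.log(len(probs))

def select_for_labelling(batch, budget):
    """Return the indices of the `budget` most uncertain predictions,
    most uncertain first (uncertainty sampling for active learning)."""
    ranked = sorted(range(len(batch)),
                    key=lambda i: entropy(batch[i]),
                    reverse=True)
    return ranked[:budget]

# Hypothetical class probabilities over three known classes.
predictions = [
    [0.98, 0.01, 0.01],  # confident
    [0.34, 0.33, 0.33],  # nearly uniform, so highly uncertain
    [0.70, 0.20, 0.10],  # moderately confident
]

# Open-set check: which predictions should not be trusted as a known class?
flags = [is_possibly_unknown(p) for p in predictions]

# Active learning: which two samples should be sent to a human for labels?
queries = select_for_labelling(predictions, budget=2)
```

With these example inputs, only the nearly uniform prediction is flagged, and the two least confident samples are selected for labelling. A benchmark for these topics would need to measure how well such uncertainty scores actually separate known from unknown objects.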

Contributed Papers

Organisers

  • Niko Sünderhauf (Chief Investigator, Australian Centre for Robotic Vision)
  • Anelia Angelova (Research Scientist, Google Brain)
  • Feras Dayoub (Postdoctoral Fellow, QUT)

With support by

  • Gustavo Carneiro (Associate Professor, University of Adelaide)
  • Kevin Murphy (Research Scientist, Google Research)
  • Anton van den Hengel (Professor, University of Adelaide)
  • Vijay Kumar (Postdoctoral Fellow, University of Adelaide)
  • Jürgen Leitner (Postdoctoral Fellow, QUT)
  • Trung T. Pham (Postdoctoral Fellow, University of Adelaide)
  • Ingmar Posner (Associate Professor, University of Oxford)
  • Michael Milford (Professor, QUT)
  • Ian Reid (Professor, University of Adelaide)
  • Peter Corke (Professor, QUT, and Director, Australian Centre for Robotic Vision)

Program Committee

  • Raoul de Charette, INRIA
  • Chelsea Finn, UC Berkeley
  • David Held, CMU
  • Edward Johns, Imperial College
  • Stefan Leutenegger, Imperial College
  • Tsung-Yi Lin, Google Brain
  • Franziska Meier, Max Planck Institute Tuebingen
  • Michael Ryoo, Indiana University
  • Pierre Sermanet, Google Brain
  • Alex Toshev, Google Brain
  • Paul Wohlhart, X
  • Chenxia Wu, Cornell
  • Yezhou Yang, Arizona State University