Invited Talks

Nikolay Atanasov

UC San Diego, USA

Title: Signed Directional Distance Functions

Abstract: Signed distance functions (SDFs) have emerged as a fundamental representation in both sparse object-level and dense surface-level mapping. They enable accurate shape and surface reconstruction and support efficient distance queries for collision checking. Recently, neural network models that map 3D coordinates to SDF values have shown impressive performance for object and surface reconstruction. SDF representations, however, suffer from one limitation: synthesizing novel views from an SDF model is not easy, as it requires tracing view rays to the SDF zero-level set, which encodes the object surface. This talk will discuss a new signed directional distance function (SDDF) representation that retains the favorable properties of SDFs but also allows efficient view synthesis. The latter has promising implications for autonomous navigation and active mapping.
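To make the contrast concrete, here is a minimal Python sketch, with illustrative function names and a toy unit-sphere model: rendering depth from an SDF requires iteratively marching along the view ray (sphere tracing), whereas an SDDF takes the viewing direction as an input and answers in a single query.

```python
import numpy as np

def sphere_trace(sdf, origin, direction, max_steps=64, eps=1e-4):
    """Depth from an SDF: march along the ray until the zero-level set,
    which encodes the surface, is reached. `direction` is a unit vector."""
    t = 0.0
    for _ in range(max_steps):
        d = sdf(origin + t * direction)
        if d < eps:        # close enough to the surface
            return t
        t += d             # safe step: no surface closer than d
    return np.inf          # ray missed the object

def depth_from_sddf(sddf, origin, direction):
    """With an SDDF the same answer is a single query: sddf(p, v) is the
    distance to the surface from point p along direction v."""
    return sddf(origin, direction)

# Toy model: unit sphere at the origin, viewed head-on.
sdf = lambda p: np.linalg.norm(p) - 1.0
sddf = lambda p, v: np.linalg.norm(p) - 1.0   # exact only for rays through the center

origin, direction = np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0])
print(sphere_trace(sdf, origin, direction))      # ~2.0, after several steps
print(depth_from_sddf(sddf, origin, direction))  # 2.0, in one query
```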


Slides: Atanasov_ROPM_ICRA_May22.pdf

Luca Carlone

Massachusetts Institute of Technology, USA

Title: Real-time Scene Understanding with 3D Scene Graphs: Latest Advances and Open Problems

Abstract: 3D scene understanding is a grand challenge for robotics and computer vision research. In particular, scene understanding is a prerequisite for safe and long-term autonomous robot operation, and for effective human-robot interaction. 3D scene graphs have recently emerged as a powerful high-level representation for scene understanding. A 3D scene graph describes the environment as a layered graph where nodes represent spatial concepts at multiple levels of abstraction and edges represent relations between concepts. While 3D scene graphs can serve as an advanced "mental model" for robots, how to build such a rich representation in real time is still uncharted territory. This talk describes Hydra, the first perception system that builds a 3D scene graph from sensor data in real time. Hydra includes real-time algorithms to incrementally construct the layers of a scene graph as the robot explores the environment. Moreover, it includes the first 3D scene graph optimization technique that converts the scene graph into a factor graph and simultaneously corrects all layers in response to loop closures. We show that 3D scene graphs create novel opportunities to enhance loop closure detection, demonstrate Hydra across multiple real and simulated datasets, and discuss open problems.
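As a rough illustration of the data structure involved, the sketch below builds a tiny layered scene graph in Python; the class names, layer names, and fields are illustrative assumptions, not Hydra's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    layer: str                                       # e.g. "object", "place", "room", "building"
    attributes: dict = field(default_factory=dict)   # pose, bounding box, semantic label, ...

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)        # (id_a, id_b, relation)

    def add_node(self, node: Node):
        self.nodes[node.node_id] = node

    def add_edge(self, a: str, b: str, relation: str):
        # Intra-layer edges encode relations such as traversability or
        # adjacency; inter-layer edges encode inclusion between abstraction
        # levels (e.g. a place belongs to a room).
        self.edges.append((a, b, relation))

g = SceneGraph()
g.add_node(Node("room_1", "room", {"label": "kitchen"}))
g.add_node(Node("place_7", "place", {"position": (2.0, 1.5, 0.0)}))
g.add_edge("place_7", "room_1", "contained_in")
```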


Slides: slides_ropm_Carlone.pdf

Daniel Cremers

Technical University of Munich, Germany

Title: Self-supervised Learning Approaches to Visual SLAM

Abstract: I will demonstrate how we can leverage the predictive power of self-supervised deep learning in order to significantly boost the performance of visual SLAM methods. The resulting methods allow us to track a single camera with a precision that is on par with state-of-the-art stereo-inertial odometry methods. Moreover, I will introduce MonoRec and TANDEM, deep networks that can generate faithful dense reconstructions of outdoor and indoor environments from a single moving camera.
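The core self-supervision signal behind such methods is a photometric reconstruction error. The sketch below shows one simplified version in Python, assuming a pinhole camera, known intrinsics, and nearest-neighbour sampling; the function name and model are illustrative, not the MonoRec or TANDEM code.

```python
import numpy as np

def photometric_loss(I_t, I_s, depth_t, K, T_ts):
    """Warp a source frame I_s into the target frame I_t using the predicted
    target depth and relative pose T_ts, and compare intensities. Minimising
    this loss w.r.t. the network outputs is the self-supervision signal."""
    H, W = depth_t.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N
    rays = np.linalg.inv(K) @ pix                    # back-project target pixels
    pts_t = rays * depth_t.reshape(1, -1)            # 3D points in the target frame
    pts_s = T_ts[:3, :3] @ pts_t + T_ts[:3, 3:4]     # transform into the source frame
    proj = K @ pts_s                                 # project into the source image
    u_s = np.round(proj[0] / proj[2]).astype(int)    # nearest-neighbour sampling
    v_s = np.round(proj[1] / proj[2]).astype(int)
    valid = (u_s >= 0) & (u_s < W) & (v_s >= 0) & (v_s < H) & (proj[2] > 0)
    err = np.abs(I_t.reshape(-1)[valid] - I_s[v_s[valid], u_s[valid]])
    return err.mean()
```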


Slides: DeepSLAM2022_Cremers.pdf

Kostas Daniilidis

University of Pennsylvania, USA

Title: Learning to Map

Abstract: TBD

Andrew Davison

Imperial College London, UK

Title: A Robot Web for Distributed Many-Device Localisation

Abstract: We show that a distributed network of robots or other devices which make measurements of each other can collaborate to globally localise via efficient ad-hoc peer-to-peer communication. Our Robot Web solution is based on Gaussian Belief Propagation (GBP) on the fundamental non-linear factor graph describing the probabilistic structure of all of the observations robots make, internally or of each other, and is flexible for any type of robot, motion or sensor. We define a simple and efficient communication protocol which can be implemented by the publishing and reading of web pages or other asynchronous communication technologies. In simulations with up to 1000 robots interacting in arbitrary patterns, we show that our solution converges to global estimates as accurate as those of a centralised non-linear factor graph solver, while operating with high distributed efficiency of computation and communication. Via the use of robust factors in GBP, our method is tolerant to a high percentage of faults in sensor measurements or dropped communication packets.
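For readers unfamiliar with GBP, the sketch below shows the standard factor-to-variable message for a pairwise factor in information form (eta = Lambda mu); it illustrates the kind of local computation each robot performs and publishes, and is not the Robot Web implementation itself. Variable sizes and names are illustrative.

```python
import numpy as np

def factor_to_variable_msg(eta, Lam, eta_in_a, Lam_in_a, d):
    """Message from a pairwise factor over (x_a, x_b) to x_b.

    eta, Lam: the factor's joint information vector/matrix over the stacked
    state [x_a; x_b] (length 2d). eta_in_a, Lam_in_a: the product of messages
    arriving at x_a from elsewhere (excluding this factor)."""
    eta, Lam = eta.copy(), Lam.copy()
    eta[:d] += eta_in_a          # condition on x_a's incoming belief
    Lam[:d, :d] += Lam_in_a
    # Marginalise x_a out with the Schur complement.
    Laa, Lab = Lam[:d, :d], Lam[:d, d:]
    Lba, Lbb = Lam[d:, :d], Lam[d:, d:]
    Laa_inv = np.linalg.inv(Laa)
    Lam_msg = Lbb - Lba @ Laa_inv @ Lab
    eta_msg = eta[d:] - Lba @ Laa_inv @ eta[:d]
    return eta_msg, Lam_msg      # published, e.g., on the robot's web page
```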

Slides: davison.pdf

Frank Dellaert 

Georgia Institute of Technology, USA

Title: State estimation, navigation, and mapping in legged robots

Abstract: Factor graphs have long been used in mapping and navigation. More recently, our lab and others have also started to use them for motion planning and for modeling articulated systems. This opens the way to applying factor-graph concepts in new state estimation paradigms for articulated robots, and legged robots in particular.
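As a small illustration, here is a minimal pose-graph example using GTSAM, the factor graph library developed in Dellaert's group; the keys, poses, and noise values below are illustrative.

```python
import numpy as np
import gtsam

graph = gtsam.NonlinearFactorGraph()
noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))

# A three-pose chain with one loop closure back to the start.
graph.add(gtsam.PriorFactorPose2(0, gtsam.Pose2(0, 0, 0), noise))        # anchor
graph.add(gtsam.BetweenFactorPose2(0, 1, gtsam.Pose2(1, 0, 0), noise))   # odometry
graph.add(gtsam.BetweenFactorPose2(1, 2, gtsam.Pose2(1, 0, 0), noise))
graph.add(gtsam.BetweenFactorPose2(2, 0, gtsam.Pose2(-2, 0, 0), noise))  # loop closure

initial = gtsam.Values()
for i, x in enumerate([0.0, 1.2, 1.9]):   # deliberately perturbed guesses
    initial.insert(i, gtsam.Pose2(x, 0, 0))

result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose2(2))
```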


Maani Ghaffari

University of Michigan, USA

Title: Towards discovering symmetries in robot perception and mapping

Abstract: In this talk, I will present a few recent topics in robot perception and mapping that all have one thing in common: exploiting symmetries to solve robotics problems. Inspired by existing mathematical tools for studying the symmetry structures of geometric spaces, I will go from a purely geometric symmetry-preserving method for state estimation to an end-to-end learning method for discovering left-invariant vector fields that align point clouds! Finally, I will close the talk by encouraging a combined approach to building computational models that exploit structure, with the hope of solving old and new problems in robot perception.


Slides: MGhaffari_ICRA_WS_May202.pdf

Jonathan How

Massachusetts Institute of Technology, USA

Title: Robust mapping and localization

Abstract: This talk will cover two recent works on distributed simultaneous localization and mapping (SLAM) and landmark-based loop closure. In the first part of the talk, we present the first distributed optimization algorithm for collaborative geometric estimation, the backbone of modern collaborative SLAM and structure-from-motion (SfM) applications. Our method allows agents to cooperatively reconstruct a shared geometric model on a central server by fusing individual observations, but without the need to transmit sensitive information about the agents themselves (such as their locations). Furthermore, to alleviate the burden of communication during iterative optimization, we design a set of communication triggering conditions that enable agents to selectively upload only the local information that is useful to the global optimization. Our approach thus achieves significant communication reduction with minimal impact on optimization performance. Numerical evaluations on bundle adjustment problems from collaborative SLAM and SfM datasets show that our method performs competitively against existing distributed techniques while achieving up to 78% total communication reduction.
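The flavour of such a triggering condition can be shown in a few lines; the simple threshold rule below is an illustrative stand-in for the paper's actual conditions, not a reproduction of them.

```python
import numpy as np

def maybe_upload(x_local, x_last_sent, tol, send):
    """Upload the local estimate to the server only when it has moved more
    than `tol` since the last transmission; otherwise stay silent and save
    bandwidth. Returns the reference point for the next trigger check."""
    if np.linalg.norm(x_local - x_last_sent) > tol:
        send(x_local)
        return x_local
    return x_last_sent
```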

In the second part of the talk, we present a global data association method for loop closure in lidar scans, using 3D line and plane objects simultaneously and in a unified manner. Using pole and plane objects in lidar SLAM can increase accuracy and decrease map storage requirements compared to commonly used point cloud maps. However, place recognition and geometric verification using these landmarks is challenging due to the requirement for global matching without an initial guess. Existing works typically leverage either pole or plane landmarks alone, limiting application to a restricted set of environments. The main novelty of our work is the representation of line and plane objects extracted from lidar scans on the manifold of affine subspaces, known as the affine Grassmannian. Line and plane correspondences are matched using our graph-based data association framework and subsequently registered in the least-squares sense. Compared to pole-only and plane-only approaches, our 3D affine Grassmannian method yields 71% and 325% increases, respectively, in loop closure recall at 100% precision on the KITTI dataset, and can provide frame alignment with less than 10 cm and 1 deg of error.
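The representation can be made concrete with a short sketch: an affine subspace of R^3 (a line or plane) is embedded as a linear subspace of R^4, after which standard principal-angle distances between subspaces apply and can drive matching. The embedding below follows the standard affine Grassmannian construction; the graph-based data association framework itself is not shown.

```python
import numpy as np
from scipy.linalg import orth, subspace_angles

def embed_affine(B, x0):
    """Embed the affine subspace x0 + span(B), with B of shape 3 x k,
    as a (k+1)-dimensional linear subspace of R^4."""
    k = B.shape[1]
    top = np.vstack([B, np.zeros((1, k))])        # directions, homogeneous 0
    disp = np.append(x0, 1.0).reshape(-1, 1)      # displacement, homogeneous 1
    return orth(np.hstack([top, disp]))

def affine_distance(B1, x1, B2, x2):
    """Distance from the principal angles between the embedded subspaces."""
    return np.linalg.norm(subspace_angles(embed_affine(B1, np.asarray(x1, float)),
                                          embed_affine(B2, np.asarray(x2, float))))

# Two nearly coincident vertical lines (pole-like landmarks) score a small
# distance, so they are good candidates for a loop-closure correspondence.
z = np.array([[0.0], [0.0], [1.0]])
print(affine_distance(z, [0.0, 0.0, 0.0], z, [0.1, 0.0, 0.0]))
```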

Slides: JonTalk2022_ICRA.pdf

Guoquan (Paul) Huang

University of Delaware, USA

Title: Visual-Inertial Estimation and Perception

Abstract: I will discuss the observability-based methodology for consistent state estimation in visual-inertial navigation systems (VINS), and highlight some of our recent results, including OpenVINS, inertial preintegration for graph-based VINS, robocentric visual-inertial odometry, a Schmidt-EKF for visual-inertial SLAM with deep loop closures, visual-inertial moving object tracking, and more.
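As a pointer to what inertial preintegration means in practice, the sketch below accumulates gyro and accelerometer samples between two keyframes into relative rotation, velocity, and position deltas that do not depend on the global state, so the resulting factor need not be re-integrated when the linearization point changes. Euler integration and zero biases are simplifying assumptions; this is an illustration, not OpenVINS code.

```python
import numpy as np

def so3_exp(w):
    """Rodrigues' formula: rotation matrix from an axis-angle vector."""
    th = np.linalg.norm(w)
    if th < 1e-9:
        return np.eye(3)
    k = w / th
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)

def preintegrate(gyro, accel, dt):
    """Summarise an IMU segment as (dR, dv, dp) in the first keyframe's frame."""
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for w, a in zip(gyro, accel):
        dp = dp + dv * dt + 0.5 * (dR @ a) * dt**2
        dv = dv + (dR @ a) * dt
        dR = dR @ so3_exp(w * dt)
    return dR, dv, dp
```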



Slides: 20220523-viep-for-icraws-HUANG.pdf

Jongwoo Lim

Hanyang University, South Korea

Title: Real-world Application of OmniSLAM

Abstract: We presented our work on Omnidirectional Localization and Dense Mapping for Wide-baseline Multicamera Systems (OmniSLAM) at ICRA 2020, where the paper was selected as a finalist for the Best Robot Vision paper award. Since then, we have updated all parts of the SLAM and depth estimation modules for robust and accurate real-world applications. In this talk, we will present recent results of OmniMVS (omnidirectional depth estimation) and OmniSLAM from our sensor systems mounted on ground robots, drones, and humans, both indoors and outdoors.

Sebastian Scherer

Carnegie Mellon University, USA

Title: Resilient Perception and Mapping in the Hardest (Subterranean) Situations

Abstract: SLAM works well in nominal situations; however, an often-neglected factor is how the system degrades, which is especially important if the output is used for control. We will present our approach to achieving robust SLAM in adversarial settings, in dark, dusty environments with varying scale and features, using multi-sensor fusion and place recognition, as well as an outlook on future directions.


Slides: icra-workshop-basti.pdf

Camillo J. Taylor

University of Pennsylvania, USA

Title: Representing the World with Panoramic Maps

Abstract: This talk will describe some of our experiments in building and using representations of the world consisting of collections of 2D panoramic maps, which can encode depth, intensity, and semantic information. We will argue that these representations, loosely inspired by the retinotopic representations found in animals, offer a number of advantages. The simplicity of the representation allows us to develop efficient algorithms for constructing and updating these maps in real time. Its compactness allows us to cover very large areas with a small memory footprint. It also allows us to reason efficiently about the volumetric structure of the scene, a capability which can be used for a variety of purposes such as detecting transient objects or accelerating motion planning algorithms.
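The basic operation underlying such a map can be sketched compactly: project a 3D point to azimuth/elevation pixel coordinates and store depth, intensity, or a semantic label at that pixel. The equirectangular model below is an illustrative assumption, not necessarily the exact parameterization used in the talk.

```python
import numpy as np

def panorama_pixel(p, width, height):
    """Map a 3D point (in the panorama's frame) to pixel coordinates of an
    equirectangular image: u from azimuth, v from elevation."""
    azimuth = np.arctan2(p[1], p[0])                  # in [-pi, pi)
    elevation = np.arcsin(p[2] / np.linalg.norm(p))   # in [-pi/2, pi/2]
    u = int((azimuth + np.pi) / (2 * np.pi) * width) % width
    v = min(int((np.pi / 2 - elevation) / np.pi * height), height - 1)
    return u, v

# Store depth in a panoramic depth map, keeping the nearest hit per pixel.
depth_map = np.full((256, 512), np.inf)
p = np.array([2.0, 1.0, 0.5])
u, v = panorama_pixel(p, 512, 256)
depth_map[v, u] = min(depth_map[v, u], np.linalg.norm(p))
```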