ActPerMoMa: Active Perceptive Motion Generation for Mobile Manipulation

Snehal Jauhri*, Sophie Lueth*, and Georgia Chalvatzaki
PEARL Lab, TU Darmstadt, Germany
* authors contributed equally


International Conference on Robotics & Automation (ICRA), 2024

TL;DR: Visually informative motion generation for mobile manipulators in cluttered scenes, effectively balancing active perception, grasp detection, and executability


Abstract

Mobile Manipulation (MoMa) systems combine the benefits of mobility and dexterity, thanks to the enlarged workspace in which they can move and interact with their environment. MoMa robots can also continuously perceive their environment when equipped with onboard sensors, e.g., an embodied camera. However, extracting task-relevant visual information in unstructured and cluttered environments such as households remains challenging. In this work, we introduce an active perception pipeline for mobile manipulators to generate motions that are informative toward manipulation tasks such as grasping in initially unknown, cluttered scenes. Our proposed approach, ActPerMoMa, generates robot trajectories in a receding-horizon fashion, sampling trajectories and computing path-wise utilities that trade off reconstructing the unknown scene, by maximizing visual information gain, against the task-oriented objective, e.g., grasp success, by efficiently maximizing grasp reachability. We demonstrate the efficacy of our method in experiments with a dual-arm TIAGo++ MoMa robot performing mobile grasping in cluttered scenes.

Real robot demonstration

The robot explores the scene, detects the target object to be grasped and then runs our ActPerMoMa motion generator to gather more information and execute the most viable grasp. In the visualization on the top right, the red bounding box denotes the detected target object area and the blue points denote informative rear-side voxels considered for information gain. [Video sped up 6x]
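
To make the information-gain term concrete, the sketch below counts how many currently unknown voxels inside the target region (e.g., the rear-side voxels highlighted in blue) a candidate camera view would reveal, by ray-marching through a voxel grid. This is only an illustrative approximation under assumed data structures (voxel_state, target_bbox, ray_dirs); it is not the released ActPerMoMa implementation.

```python
# Illustrative only: count how many currently unknown voxels inside the target
# bounding box a candidate camera pose would reveal, by marching rays through a
# uniform TSDF-style observation grid. All names here are hypothetical.
import numpy as np

UNKNOWN, FREE, OCCUPIED = -1, 0, 1  # per-voxel observation state


def view_information_gain(voxel_state, voxel_size, target_bbox, cam_pos, ray_dirs, max_range=2.0):
    """Number of unknown voxels inside `target_bbox` seen before a ray hits an occupied voxel."""
    bb_min, bb_max = target_bbox              # each a (3,) array in world coordinates
    seen_unknown = set()
    step = 0.5 * voxel_size                   # ray-marching step
    for d in ray_dirs:                        # unit direction per (subsampled) camera pixel
        for t in np.arange(0.0, max_range, step):
            p = cam_pos + t * d
            # Assumes the grid origin coincides with the world origin.
            idx = tuple(np.floor(p / voxel_size).astype(int))
            if not all(0 <= idx[k] < voxel_state.shape[k] for k in range(3)):
                break                          # ray left the grid
            state = voxel_state[idx]
            if state == OCCUPIED:
                break                          # ray is blocked; voxels behind stay hidden
            if state == UNKNOWN and np.all(p >= bb_min) and np.all(p <= bb_max):
                seen_unknown.add(idx)          # an informative (e.g., rear-side) voxel
    return len(seen_unknown)
```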

ActPerMoMa


Using rough initial knowledge about the target area or target object position, we continuously plan and execute informative motions for the mobile grasping task. At every timestep, the RGB-D information from the head-mounted embodied camera is integrated into a scene TSDF used for both grasp detection and information-gain computation. Using the currently known free space for movement of the robot base, we sample candidate robot paths towards the target, including both base and camera poses. For each candidate path, we compute the information gained from the camera views along the path, and the reachability of stable detected grasps from the final base poses of the path. We trade off these objectives with a receding-horizon cost and execute one step of the optimal path at every timestep.
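
The following sketch summarizes this receding-horizon loop: candidate paths are scored by the information gain of their camera views plus the reachability of detected grasps from their final base pose, and only the first step of the best path is executed before re-planning on the updated TSDF. The path container, the scoring callables, and the fixed trade-off weight `lam` are simplifying assumptions for illustration, not the authors' released code.

```python
# A minimal sketch of receding-horizon path selection, assuming externally
# provided scoring functions; names and signatures are illustrative only.
from dataclasses import dataclass
from typing import Callable, List, Sequence

import numpy as np


@dataclass
class CandidatePath:
    base_waypoints: List[np.ndarray]   # planned base poses through known free space
    camera_poses: List[np.ndarray]     # camera pose at each waypoint (looking at the target)


def select_next_step(
    paths: Sequence[CandidatePath],
    grasps: Sequence[np.ndarray],                                  # stable grasps detected on the current TSDF
    info_gain_fn: Callable[[np.ndarray], float],                   # expected info gain of one camera view
    reachability_fn: Callable[[np.ndarray, np.ndarray], float],    # grasp reachability from a base pose
    lam: float = 0.5,                                              # exploration/exploitation trade-off weight
) -> np.ndarray:
    """Return the first base waypoint of the candidate path with the best path-wise utility."""
    best_path, best_utility = None, -np.inf
    for path in paths:
        # Exploration term: information gained by the camera views along the path.
        info_gain = sum(info_gain_fn(cam) for cam in path.camera_poses)
        # Exploitation term: best reachability of a detected grasp from the final base pose.
        reach = max((reachability_fn(path.base_waypoints[-1], g) for g in grasps), default=0.0)
        utility = lam * info_gain + (1.0 - lam) * reach
        if utility > best_utility:
            best_path, best_utility = path, utility
    # Receding horizon: execute only the first step, integrate the new camera view
    # into the TSDF, and re-plan at the next timestep.
    return best_path.base_waypoints[0]
```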

Ours

Baseline (Next-Best View)