ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics

Abstract

Robotic manipulation in everyday scenarios, especially in unstructured environments, requires skills in pose-aware object manipulation (POM), which adapts a robot's grasping and handling to an object's 6D pose. Recognizing an object's position and orientation is crucial for effective manipulation. For example, if a mug is lying on its side, it is more effective to grasp it by the rim than by the handle. Despite its importance, research on POM skills remains limited, because learning manipulation skills requires pose-varying simulation environments and datasets. This paper introduces ManiPose, a pioneering benchmark designed to advance the study of pose-varying manipulation tasks. ManiPose encompasses: 1) simulation environments for POM featuring tasks that range from 6D pose-specific pick-and-place of single objects to cluttered scenes, and further include interactions with articulated objects; 2) a comprehensive dataset featuring geometrically consistent and manipulation-oriented 6D pose labels for 2,936 real-world scanned rigid objects and 100 articulated objects across 59 categories; 3) a baseline for POM that leverages the reasoning abilities of large language models (e.g., ChatGPT) to analyze the relationship between 6D pose and task-specific requirements, offering enhanced pose-aware grasp prediction and motion planning. Our benchmark demonstrates notable advances in pose estimation, pose-aware manipulation, and real-robot skill transfer, setting new standards for POM research.

Paper       Website       Code (coming soon)

Video

Benchmark Environments

Pose-aware manipulation environments in the ManiPose benchmark. (a) Single object with pose variation: pick an object from the table and place it at a target pose. The initial and target poses (positions and orientations) are randomly selected. (b) Multiple objects in a cluttered scene: a pile of objects is placed on the table. Pick one or more objects and place them at target poses. (c) Articulated object interaction: put objects into, or take them out of, a cabinet drawer or door. The robot arm needs to open and close the articulated object by its handle and account for the relative poses of objects inside the cabinet.
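For task (a), each episode samples random initial and target 6D poses on the tabletop. The sketch below shows one way such sampling might be implemented; the function names, workspace ranges, and table height are illustrative assumptions, not part of the ManiPose API.

```python
import numpy as np

def random_quaternion(rng):
    """Uniform random unit quaternion (w, x, y, z) via Shoemake's method."""
    u1, u2, u3 = rng.random(3)
    w = np.sqrt(1 - u1) * np.sin(2 * np.pi * u2)
    x = np.sqrt(1 - u1) * np.cos(2 * np.pi * u2)
    y = np.sqrt(u1) * np.sin(2 * np.pi * u3)
    z = np.sqrt(u1) * np.cos(2 * np.pi * u3)
    return np.array([w, x, y, z])

def quat_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def sample_table_pose(rng, x_range=(-0.3, 0.3), y_range=(-0.3, 0.3), z=0.05):
    """Sample a random 6D pose on the tabletop as a 4x4 homogeneous matrix.
    Workspace bounds and height are illustrative values."""
    T = np.eye(4)
    T[:3, :3] = quat_to_matrix(random_quaternion(rng))
    T[0, 3] = rng.uniform(*x_range)
    T[1, 3] = rng.uniform(*y_range)
    T[2, 3] = z
    return T
```

Calling `sample_table_pose` twice per episode would yield the randomized initial and target poses described above.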

Pose Estimation Dataset

Object pose type-level alignment based on objects' geometry and functions. The X, Y, and Z axes are shown in red, green, and blue. For the drawer and microwave, the joint axis is shown in cyan. Axially symmetric objects: the X-axis is the axis of symmetry. Mirror-symmetric objects: the X-Y plane is the symmetry plane, and the X-Z plane can serve as a second symmetry plane along the longer dimension. Functional objects: have functional and gripping areas; the X-axis is the long-axis direction, and the Y-axis is the grasp-approach direction.
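One practical consequence of these symmetry conventions is that pose-estimation error must be measured symmetry-aware: for an axially symmetric object, rotation about the X-axis is unobservable and should not be penalized. A minimal sketch of such a metric (the function name is illustrative, not from the ManiPose codebase):

```python
import numpy as np

def axial_rotation_error(R_pred, R_gt):
    """Rotation error (radians) for an axially symmetric object whose
    symmetry axis is the X-axis: only the angle between the two X-axes
    matters, so any rotation about X is ignored."""
    x_pred, x_gt = R_pred[:, 0], R_gt[:, 0]  # first column = X-axis direction
    cos = np.clip(np.dot(x_pred, x_gt), -1.0, 1.0)
    return np.arccos(cos)
```

For example, a prediction that differs from the ground truth only by a spin about the symmetry axis scores zero error, while a 90-degree tilt of the axis scores pi/2.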

Pose-aware Manipulation Baseline

Pose-aware object manipulation baseline. (a) Pose-invariant grasp pose prediction: generate grasp pose (GP) candidates by converting objects to a pose-invariant base coordinate frame. (b) Action primitive planning: plan the trajectory with the action primitives Move to, Grasp at, and Release. (c) Execution: execute the planned grasp poses and action primitives at each step.
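The core of step (a) is a frame change: grasp candidates predicted in the object's pose-invariant base frame are mapped into the world frame through the object's estimated 6D pose, and step (b) sequences the three primitives. A minimal sketch under these assumptions (helper names are illustrative, not the baseline's actual API):

```python
import numpy as np

def grasp_to_world(T_world_obj, grasps_obj):
    """Map grasp candidates from the object's pose-invariant base frame
    to the world frame: T_world_grasp = T_world_obj @ T_obj_grasp."""
    return [T_world_obj @ T for T in grasps_obj]

def plan_primitives(T_grasp, T_place):
    """Sequence the baseline's three action primitives for one
    pick-and-place: Move to, Grasp at, and Release."""
    return [("Move to", T_grasp), ("Grasp at", T_grasp),
            ("Move to", T_place), ("Release", T_place)]
```

Because the grasp candidates live in the base frame, the same candidate set serves any object pose; only the single matrix product changes per scene.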

Simulation Experiment 

 Store the red chip can in the drawer

Demonstration of the pose-aware object manipulation baseline. The task is to store the red chip can on the table in the middle drawer. In step 1, the ManiPose baseline estimates object poses, predicts pose-invariant grasp poses, and plans action primitives. In steps 2 to 6, the manipulation is executed sequentially.

In this task, the robot must determine the target spatial pose of the chip can inside the drawer and how it relates to the can's current pose. ChatGPT-4 is used to analyze the current alignment of the chip can and to specify the desired pose that achieves the target arrangement.
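To query an LLM about pose relations, the estimated pose and task goal have to be serialized into text. The sketch below shows one plausible way to compose such a prompt; the field names and wording are illustrative assumptions, not the prompt actually used by ManiPose, and the call to the chat model itself is omitted.

```python
def build_pose_prompt(obj_name, current_pose_desc, task):
    """Compose an LLM query relating an object's current pose to the
    task-specific target pose. All field names are illustrative."""
    return (
        f"Object: {obj_name}\n"
        f"Current pose: {current_pose_desc}\n"
        f"Task: {task}\n"
        "Question: describe the target 6D pose (position and orientation) "
        "the object should reach, and how it relates to the current pose."
    )
```

The returned string would then be sent to the chat model, whose answer constrains the target pose passed to the grasp and motion planners.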

Real-world Experiment 

 Store the red chip can in the drawer

Transfer from ManiPose to real-robot experiments. After successful testing in simulation, we apply the same pose-aware object manipulation approach to transfer the task to real-world scenarios in real-robot experiments.

Upright the mug lying on the shelf

To upright a mug, the relationship between the grasping pose and the object's current pose must be considered. Using ChatGPT-4 and the object's estimated pose, suitable grasp poses are selected to accomplish the task of uprighting a mug that is lying on its side.
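Selecting pose-compatible grasps can be framed as a geometric filter: keep only candidates whose approach direction, expressed in the world frame, stays close to a feasible direction for the current object pose. A minimal sketch, assuming (per the dataset's alignment convention) that the grasp frame's Y-axis is the approach direction; the function name and the default threshold are illustrative:

```python
import numpy as np

def filter_grasps_by_approach(grasps_world,
                              preferred=np.array([0.0, 0.0, -1.0]),
                              max_angle=np.pi / 4):
    """Keep grasp candidates whose approach axis (assumed to be the grasp
    frame's Y-axis) is within max_angle of a preferred world direction
    (default: straight down). Threshold is an illustrative choice."""
    kept = []
    for T in grasps_world:
        approach = T[:3, 1]  # Y-axis of the grasp frame in world coordinates
        cos = np.clip(np.dot(approach, preferred), -1.0, 1.0)
        if np.arccos(cos) <= max_angle:
            kept.append(T)
    return kept
```

In the mug example, this kind of filter would discard handle grasps that become unreachable once the mug lies on its side, leaving rim grasps whose approach remains feasible.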

Manipulate objects via grasping