Multi-Stage Reinforcement Learning for Non-Prehensile Manipulation
School of Control Science and Engineering, Shandong University
Abstract
Manipulating objects without grasping them enables complex tasks and is known as non-prehensile manipulation. Most previous methods are limited to learning a single skill for manipulating objects with primitive shapes, and are unsuitable for flexible object manipulation that requires a combination of multiple skills. We explore skill-unconstrained non-prehensile manipulation and propose Multi-stage Reinforcement Learning for Non-prehensile Manipulation (MRLNM), which computes a set of intermediate states between the initial and goal states and divides the task into multiple stages for sequential learning. At each stage, the policy takes the desired 6-DOF object pose as the goal and outputs a spatially-continuous action, allowing the robot to explore arbitrary skills to accomplish the task. To handle objects with different shapes, we propose a State-Goal Fusion Representation (SGF-Representation) that represents observations and goals as point clouds with motion, improving the policy's perception of the scene layout and the task goal. To improve sample efficiency, we propose a Spatially-Reachable Distance Metric (SR-Distance) that measures the shortest distance between two points without intersecting the scene. We evaluate MRLNM on an occluded grasping task, which aims to grasp objects in initially occluded configurations. MRLNM demonstrates strong generalization to unseen objects with shapes outside the training distribution and transfers to the real world zero-shot, achieving a 95% success rate.
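The SR-Distance mentioned above can be approximated by a shortest-path search over an occupancy grid of the scene. The sketch below is a minimal illustration of this idea, assuming a voxelized scene and a 6-connected neighborhood; the function name sr_distance, the grid resolution, and the input format are illustrative assumptions, not the paper's implementation.

import heapq
import numpy as np

def sr_distance(occupancy, start, goal, voxel_size=0.01):
    # occupancy: boolean (X, Y, Z) array, True where the scene is occupied.
    # start, goal: integer voxel indices (ix, iy, iz).
    neighbors = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                 (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    start, goal = tuple(start), tuple(goal)
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == goal:
            return d * voxel_size              # convert voxel steps to meters
        if d > dist.get(v, np.inf):
            continue                           # stale heap entry
        for dx, dy, dz in neighbors:
            n = (v[0] + dx, v[1] + dy, v[2] + dz)
            if any(c < 0 or c >= s for c, s in zip(n, occupancy.shape)):
                continue                       # outside the workspace
            if occupancy[n]:
                continue                       # path would intersect the scene
            nd = d + 1.0
            if nd < dist.get(n, np.inf):
                dist[n] = nd
                heapq.heappush(heap, (nd, n))
    return np.inf                              # goal unreachable without collision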
1. Supplement
Due to the limited space of the paper, only the environmental parameters for the far grasp are shown in Table 1. The complete environment parameters are listed below. We set different object size ranges for the two grasping poses because a large width-to-height ratio makes objects difficult to flip; the largest width-to-height ratio is capped at about 3:1. The maximum opening width of the gripper in simulation is 6 cm.
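As an illustration of how these constraints interact, the following sketch shows one way to sample object sizes under the 3:1 ratio cap. The size ranges and the helper name are placeholders (the actual ranges are given in Table 1); only the ratio cap and the 6 cm gripper limit come from the text above.

import numpy as np

MAX_RATIO = 3.0            # largest allowed width-to-height ratio
GRIPPER_MAX_WIDTH = 0.06   # maximum gripper opening width in simulation (m)

def sample_object_size(width_range, height_range, rng=np.random):
    # Rejection-sample (width, height) until the ratio constraint is satisfied.
    while True:
        w = rng.uniform(*width_range)
        h = rng.uniform(*height_range)
        if w / h <= MAX_RATIO:
            return w, h

# Placeholder ranges in meters (the paper's actual ranges are in Table 1).
width, height = sample_object_size((0.04, 0.12), (0.04, 0.10))
graspable = min(width, height) <= GRIPPER_MAX_WIDTH  # grasped dimension must fit the gripper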
2. Real-world experiments
We fine-tuned MRLNM to facilitate sim-to-real transfer as follows:
(1) The action element that adjusts the opening width of the gripper is set to 0, because the Robotiq gripper we use moves its fingers vertically when adjusting the grasping width, whereas the parallel gripper used in simulation does not.
(2) The opening width of the gripper is set to a fixed value of 8 cm, because the objects used in the real world are larger, and the 6 cm maximum opening width used in simulation cannot be transferred to the real objects.
(3) If the force at the robot end-effector along a certain direction exceeds the threshold and the product of the action and the force is less than 0 (i.e., the action would further increase the force on the end-effector), the action in the corresponding dimension is set to 0. This measure prevents the robot arm from forcibly braking due to excessive torque. A combined code sketch of steps (1)-(3) is given after this list.
(4) The ranges of environmental parameters used by ADR are shown in the figure below; the main change is an increase in object size:
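The following sketch combines steps (1)-(3) into a single action post-processing routine, assuming the last action element controls the gripper width and that a per-axis end-effector force reading is available. The force threshold and function names are placeholder assumptions, not the exact values used in our experiments.

import numpy as np

FORCE_THRESHOLD = 20.0       # N, placeholder threshold
FIXED_GRIPPER_WIDTH = 0.08   # fixed 8 cm opening used on the real gripper

def postprocess_action(action, ee_force):
    # action: policy output; the last element adjusts the gripper opening width.
    # ee_force: measured end-effector force along the translational axes (N).
    action = np.array(action, dtype=float)
    # Step (1): disable gripper-width adjustment, since the Robotiq gripper
    # moves its fingers vertically when the width changes.
    action[-1] = 0.0
    # Step (3): if the force on an axis exceeds the threshold and the commanded
    # motion would further increase it (action * force < 0), zero that axis.
    for i, f in enumerate(ee_force):
        if abs(f) > FORCE_THRESHOLD and action[i] * f < 0:
            action[i] = 0.0
    return action

# Step (2): the opening width itself is held constant at 8 cm during execution.
gripper_width_command = FIXED_GRIPPER_WIDTH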
Supplementary notes on the experiment:
The middle part of all objects is recessed because the real gripper protrudes further on the side than the simulated gripper; without the recess, the object would be blocked during the flipping process and could not stand upright.
3. Video
3.1 Close grasp
3.1.1 Cases of successful manipulation
3.1.2 Cases of recovery from mistakes and human interference
3.1.3 Cases of failed manipulation
3.2 Far grasp
3.2.1 Cases of successful manipulation
3.2.2 Cases of failed manipulation
Contact
Dexin Wang: dexinwang@mail.sdu.edu.cn