16-831 | Kartik Sharma, Kshitiz, Soumojit Bhattacharya | CMU Robotics Institute
Articulated object manipulation involves interacting with objects that have constrained joints, such as drawers (prismatic) and doors (revolute), and is a challenging problem in robotic manipulation. Unlike rigid-body grasping, these tasks require coordinated multi-phase behaviors: navigation, reaching, grasping a handle, and then sustained force application along a constrained joint axis.
For our project, we use ManiSkill3 (a robotics simulation framework built on SAPIEN) because it provides diverse articulated objects from the PartNet-Mobility dataset. We focus on the OpenCabinetDrawer-v1 task, in which a 13-DoF Fetch mobile manipulator must open a designated drawer on a cabinet. Each episode randomizes the cabinet geometry, handle position, and friction, thereby requiring the policy to generalize across object geometries and physical parameters.
We benchmark three RL paradigms (PPO, SAC, Model-Based RL) and propose three modifications:
(1) Intrinsic Curiosity Module (ICM) with PPO for exploration,
(2) Demonstration-Augmented SAC (DA-SAC), combining imitation learning with reinforcement learning by leveraging trajectories generated from ManiSkill3’s motion-planning oracle, and
(3) Behavior Cloning (BC) Warm-start with RL Fine-tuning.
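To make the exploration bonus in modification (1) concrete, here is a minimal sketch of the intrinsic reward in the standard ICM formulation: the bonus is proportional to the squared error of a forward model that predicts the next state's features. This is an illustration under assumptions, not the project's implementation. The function names, the scaling factor `eta`, and the plain-list feature vectors are all hypothetical.

```python
def icm_intrinsic_reward(pred_next_feat, next_feat, eta=0.01):
    """Curiosity bonus: (eta / 2) * squared L2 error of the forward model.

    pred_next_feat: features the forward model predicted for the next state
    next_feat:      features actually observed for the next state
    eta:            scaling factor (hypothetical default, not from the source)
    """
    sq_err = sum((p - t) ** 2 for p, t in zip(pred_next_feat, next_feat))
    return 0.5 * eta * sq_err


def combined_reward(extrinsic, intrinsic):
    """Reward passed to the policy optimizer: task reward plus curiosity bonus."""
    return extrinsic + intrinsic


# A badly predicted transition earns a larger bonus than a well predicted one.
surprising = icm_intrinsic_reward([1.0, 2.0], [0.0, 0.0], eta=0.1)
familiar = icm_intrinsic_reward([0.1, 0.0], [0.0, 0.0], eta=0.1)
```

In a full agent the feature vectors would come from a learned encoder trained jointly with an inverse-dynamics model; that part is omitted here.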
Task: OpenCabinetDrawer-v1 (ManiSkill3)
Robot: Fetch 13-DoF
Episode length: 1000 steps at 20 Hz.
The random agent samples actions using env.action_space.sample() with no learning. Three trials were run with distinct seeds.
Results: Mean Return: 0.1287 +/- 0.1422 (undiscounted episodic return). Success Rate: 0%.
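The random-agent baseline can be sketched as a short rollout loop. This is a minimal sketch, assuming a single (non-vectorized) environment that follows the Gymnasium `reset`/`step` interface and reports a `"success"` flag in `info`. The function name `run_random_episode` and the seed values in the usage comment are hypothetical.

```python
def run_random_episode(env, seed, max_steps=1000):
    """Roll out one episode with uniformly sampled actions.

    Returns (undiscounted episodic return, whether success was ever reported).
    """
    env.action_space.seed(seed)      # make action sampling reproducible
    obs, info = env.reset(seed=seed)
    episode_return = 0.0
    success = False
    for _ in range(max_steps):
        action = env.action_space.sample()   # no learning: uniform random action
        obs, reward, terminated, truncated, info = env.step(action)
        episode_return += float(reward)
        # Assumption: the environment exposes a "success" entry in info.
        success = success or bool(info.get("success", False))
        if terminated or truncated:
            break
    return episode_return, success


# Hypothetical usage (requires ManiSkill3; seeds are illustrative):
# import gymnasium as gym
# import mani_skill.envs  # registers ManiSkill environments with Gymnasium
# env = gym.make("OpenCabinetDrawer-v1")
# trials = [run_random_episode(env, seed) for seed in (0, 1, 2)]
```

Seeding both the environment and the action space is what makes the three trials distinct but repeatable.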
PPO (On-Policy)
SAC (Off-Policy)
Model-Based RL
Modification 1: ICM + PPO
Modification 2: DA-SAC (Demonstration-Augmented SAC)
Modification 3: Behavior Cloning (BC) Warm-start with RL Fine-tuning
Final Mean Return: 51.39 +/- 19.38
Success Rate: 18.3%
(3 seeds, 2.5M steps)
Final Mean Return: 124.73 +/- 21.76
Success Rate: 58.0%
(3 seeds, 1M steps each)
Final Mean Return: 117.34 +/- 13.9
Success Rate: 70.1%
(3 seeds, 20M steps each)
Final Mean Return: 88.52 +/- 20.02
Success Rate: 63.3%
(3 seeds, 1M steps each)
Final Mean Return: 126.69 +/- 29.22
Success Rate: 69.2%
(3 seeds, 20M steps each)
Final Mean Return: 190.99 +/- 29.40
Success Rate: 48.6%
(3 seeds, 5.6-6M steps each)
Episode return: 0.2906 | Actions: env.action_space.sample() | Success: No
Episode return: 0.0717 | Actions: env.action_space.sample() | Success: No
Episode return: 0.0238 | Actions: env.action_space.sample() | Success: No
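The three per-trial returns above reproduce the reported random-agent summary when aggregated with the mean and the sample standard deviation (n - 1 in the denominator). The short check below uses only the numbers listed here; the variable names are illustrative.

```python
import statistics

# Per-trial undiscounted returns of the random agent, as listed above.
returns = [0.2906, 0.0717, 0.0238]
successes = [False, False, False]

mean_return = statistics.mean(returns)
std_return = statistics.stdev(returns)        # sample std (divides by n - 1)
success_rate = sum(successes) / len(successes)

print(f"Mean Return: {mean_return:.4f} +/- {std_return:.4f}")  # 0.1287 +/- 0.1422
print(f"Success Rate: {success_rate:.0%}")                     # 0%
```

The population standard deviation (`statistics.pstdev`) would give about 0.1161 instead, so the reported spread corresponds to the sample estimate.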