RL for Articulated Object Manipulation in ManiSkill3

16-831 | Kartik Sharma, Kshitiz, Soumojit Bhattacharya | CMU Robotics Institute

Project Overview

Articulated object manipulation involves interacting with objects whose parts move along constrained joints, such as drawers (prismatic) and doors (revolute), and it remains a challenging problem in robotic manipulation. Unlike rigid-body grasping, these tasks require coordinated multi-phase behavior: navigating to the object, reaching, grasping a handle, and then applying sustained force along the constrained joint axis.

For our project, we use ManiSkill3 (a robotics simulation framework built on SAPIEN). We chose ManiSkill3 because it provides diverse articulated objects from the PartNet-Mobility dataset. We focus on one task, OpenCabinetDrawer-v1, in which a 13-DoF Fetch mobile manipulator must open a designated drawer on a cabinet. Each episode randomizes the cabinet geometry, handle position, and friction, so the policy must generalize across object geometries and physical parameters.

We benchmark three RL paradigms (PPO, SAC, Model-Based RL) and propose three modifications:

(1) Intrinsic Curiosity Module (ICM) with PPO for exploration,

(2) Demonstration-Augmented SAC (DA-SAC) combining imitation learning with reinforcement learning by leveraging trajectories generated from ManiSkill3’s motion-planning oracle, and

(3) Behavior Cloning (BC) Warm-start with RL Fine-tuning.
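
To make the exploration bonus in modification (1) concrete, here is a minimal sketch of the standard ICM intrinsic reward: the scaled squared error between the forward model's predicted next-state features and the actual next-state features. The function names, the scaling factor eta, and the mixing weight beta are illustrative assumptions, not values taken from our implementation.

```python
def icm_intrinsic_reward(pred_next_feat, next_feat, eta=0.01):
    """Curiosity bonus: (eta / 2) * squared L2 error of the forward model.

    pred_next_feat: features the forward model predicted for the next state
    next_feat:      features actually observed for the next state
    eta:            scaling factor (hypothetical default)
    """
    sq_err = sum((p - q) ** 2 for p, q in zip(pred_next_feat, next_feat))
    return 0.5 * eta * sq_err


def combined_reward(extrinsic, intrinsic, beta=1.0):
    """Reward passed to PPO: task reward plus weighted curiosity bonus.

    beta is a hypothetical mixing weight.
    """
    return extrinsic + beta * intrinsic


# Toy example: a poorly predicted transition earns a larger bonus.
bonus = icm_intrinsic_reward([1.0, 2.0], [1.0, 0.0], eta=0.5)
reward = combined_reward(extrinsic=0.1, intrinsic=bonus, beta=0.2)
```

Transitions the forward model predicts poorly receive a larger bonus, which nudges the policy toward states it has not yet learned to model.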

Random Agent Baseline

Task: OpenCabinetDrawer-v1 (ManiSkill3)
Robot: Fetch 13-DoF
Episode length: 1000 steps at 20 Hz.

The random agent samples actions using env.action_space.sample() with no learning. Three trials were run with distinct seeds.
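
The rollout loop for such a baseline can be sketched as follows. This is a minimal illustration that assumes a Gymnasium-style interface (reset returning (obs, info), step returning a 5-tuple, and a "success" entry in info); ToyEnv and its action space are hypothetical stand-ins so the sketch runs without the simulator, and they are not part of ManiSkill3.

```python
import random


def run_random_episode(env, max_steps=1000, seed=None):
    """Roll out one episode with uniformly sampled actions.

    Returns (undiscounted episodic return, whether success was ever reported).
    """
    obs, info = env.reset(seed=seed)
    total_return = 0.0
    success = False
    for _ in range(max_steps):
        action = env.action_space.sample()  # no learning, no observation use
        obs, reward, terminated, truncated, info = env.step(action)
        total_return += float(reward)
        success = success or bool(info.get("success", False))
        if terminated or truncated:
            break
    return total_return, success


class _ToySpace:
    """Hypothetical 1-D action space with a sample() method."""

    def __init__(self):
        self.rng = random.Random(0)

    def sample(self):
        return self.rng.uniform(-1.0, 1.0)


class ToyEnv:
    """Hypothetical stand-in environment: constant small reward, never succeeds."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.action_space = _ToySpace()
        self.t = 0

    def reset(self, seed=None):
        if seed is not None:
            self.action_space.rng.seed(seed)
        self.t = 0
        return 0.0, {}

    def step(self, action):
        self.t += 1
        truncated = self.t >= self.horizon
        return 0.0, 0.01, False, truncated, {"success": False}


# Three trials with distinct seeds, mirroring the protocol described above.
trials = [run_random_episode(ToyEnv(horizon=5), seed=s) for s in (0, 1, 2)]
```

With ManiSkill3 installed, the stand-in would presumably be replaced by an environment created through Gymnasium, for example gym.make("OpenCabinetDrawer-v1") after importing mani_skill.envs; the exact constructor arguments depend on the setup and are not shown here.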

Results: Mean Return: 0.1287 +/- 0.1422 (undiscounted episodic return). Success Rate: 0%.
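
The summary statistics can be reproduced from the three per-trial returns listed under Random Policy Demonstrations. The short check below uses only the Python standard library; the +/- value matches the sample standard deviation (n - 1 denominator) of those three returns.

```python
from statistics import mean, stdev

# Per-trial undiscounted episodic returns of the random agent (Trials 1-3).
trial_returns = [0.2906, 0.0717, 0.0238]

mean_return = mean(trial_returns)
std_return = stdev(trial_returns)  # sample standard deviation (n - 1)

print(f"Mean Return: {mean_return:.4f} +/- {std_return:.4f}")
# prints "Mean Return: 0.1287 +/- 0.1422"
```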

Experiments

PPO (On-Policy)
SAC (Off-Policy)
Model-Based RL

Modification 1: ICM + PPO
Modification 2: DA-SAC (Demonstration-Augmented SAC)
Modification 3: Behavior Cloning (BC) Warm-start with RL Fine-tuning

Model-Based RL

seed_1.mp4

Success

seed_0.mp4

Success

seed_2.mp4

Success

Final Mean Return: 51.39 +/- 19.38

Success Rate: 18.3%

(3 seeds, 2.5M steps)

seed_0_failure_1.mp4

Failure

seed_2_failure_1.mp4

Failure

Behavior Cloning (BC) Warm-start with RL Fine-tuning

seed_1.mp4

seed_2.mp4

seed_0.mp4

Final Mean Return: 124.73 +/- 21.76

Success Rate: 58.0%

(3 seeds, 1M steps each)

seed_0_failure_1.mp4

Failure

seed_0_failure_2.mp4

Failure

ICM + PPO Demonstrations

seed_2.mp4

Success

seed_1.mp4

Success

seed_0.mp4

Success

Final Mean Return: 117.34 +/- 13.9

Success Rate: 70.1%

(3 seeds, 20M steps each)

seed_0_failure_1.mp4

Failure

seed_0_failure_2.mp4

Failure

DA-SAC (Demonstration-Augmented SAC) Demonstrations

seed_0.mp4

Success

seed_1.mp4

Success

seed_2.mp4

Success

Final Mean Return: 88.52 +/- 20.02

Success Rate: 63.3%

(3 seeds, 1M steps each)

seed_0_failure_1.mp4

Failure

seed_0_failure_2.mp4

Failure

PPO (On-Policy) Demonstrations

0.mp4

Success

5.mp4

Success

2.mp4

Success

Final Mean Return: 126.69 +/- 29.22

Success Rate: 69.2%

(3 seeds, 20M steps each)

3.mp4

Failure

9.mp4

Failure

SAC (Off-Policy) Demonstrations

2.mp4

Success

4.mp4

Success

6.mp4

Success

Final Mean Return: 190.99 +/- 29.40

Success Rate: 48.6%

(3 seeds, 5.6-6M steps each)

3.mp4

Failure

1.mp4

Failure

Random Policy Demonstrations

random_agent_1.mp4

Random Agent - Trial 1

Episode return: 0.2906
Actions: env.action_space.sample()
Success: No

random_agent_2.mp4

Random Agent - Trial 2

Episode return: 0.0717
Actions: env.action_space.sample()
Success: No

random_agent3.mp4

Random Agent - Trial 3

Episode return: 0.0238
Actions: env.action_space.sample()
Success: No
