Enhancing Task Performance of Learned Simplified Models via Reinforcement Learning


Hien Bui¹ and Michael Posa¹

¹ The authors are with the GRASP Laboratory, University of Pennsylvania, Philadelphia, PA 19104, USA {xuanhien, posa}@seas.upenn.edu

Code

GitHub repository with code for LCS-RL

Available soon!

Abstract

In contact-rich tasks, the hybrid, multi-modal nature of contact dynamics poses great challenges in model representation, planning, and control. Recent efforts have attempted to address these challenges via data-driven methods, learning dynamical models in combination with model predictive control. Those methods, while effective, rely solely on minimizing forward prediction errors in the hope of achieving better task performance with MPC controllers. This weak correlation can result in data inefficiency as well as limitations on overall performance. In response, we propose a novel strategy: using a policy gradient algorithm to find a simplified dynamics model that explicitly maximizes task performance. Specifically, we parameterize the stochastic policy as the perturbed output of the MPC controller, so the learned model representation is directly tied to the policy and task performance. We apply the proposed method to contact-rich tasks in which a three-fingered robotic hand manipulates previously unknown objects. Our method improves the task success rate by up to 15% over the existing method across diverse objects while sustaining data efficiency, and it solves some tasks with success rates of 70% or higher using under 30 minutes of data.

Method

We present LCS-RL, a novel framework that combines reinforcement learning (RL) with simple multi-contact models to solve contact-rich tasks.

Our framework applies a policy gradient algorithm, here Proximal Policy Optimization (PPO), to directly maximize the task performance of the simplified model in combination with the MPC planner. In other words, we formulate a fully differentiable policy (at its core, an MPC problem) and backpropagate through the parameters of the dynamics model.
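To make the idea concrete, here is a minimal sketch of a PPO-style update of the model parameters; it is our own illustration, not the paper's implementation. The policy is a Gaussian centered at the controller's output, so the policy gradient flows through the controller into the model parameters θ. The function `mpc_action` is a hypothetical placeholder (a linear feedback law stands in for the real MPC solve over the learned simplified dynamics), and all dimensions are illustrative.

```python
import torch

# Hypothetical differentiable "MPC" layer: given model parameters theta and a
# batch of states, return the first planned action. A stand-in linear feedback
# law keeps this sketch runnable; the real controller would solve an MPC
# problem over the learned simplified dynamics.
def mpc_action(theta, states):
    return -states @ theta                                # (N, action_dim)


def ppo_model_update(theta, states, actions, advantages, old_log_probs,
                     sigma=0.1, clip_eps=0.2, lr=1e-3):
    """One PPO-style update of the model parameters theta.

    The stochastic policy is Gaussian, centered at the MPC output:
    pi(u | x) = N(u; mpc_action(theta, x), sigma^2 I), so the policy-gradient
    signal flows through the controller into the model parameters.
    """
    mean = mpc_action(theta, states)
    dist = torch.distributions.Normal(mean, sigma)
    log_probs = dist.log_prob(actions).sum(dim=-1)        # log pi(u | x)

    ratio = torch.exp(log_probs - old_log_probs)          # importance ratio
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    loss = -torch.min(ratio * advantages, clipped * advantages).mean()

    loss.backward()
    with torch.no_grad():
        theta -= lr * theta.grad                          # gradient step on the model
        theta.grad.zero_()
    return loss.item()


# Toy usage with random data (dimensions are illustrative only).
theta = torch.randn(9, 9, requires_grad=True)
states, actions = torch.randn(64, 9), torch.randn(64, 9)
advantages, old_log_probs = torch.randn(64), torch.randn(64)
ppo_model_update(theta, states, actions, advantages, old_log_probs)
```

The key point of the sketch is that task reward, not prediction error, drives the update of the model parameters; the controller is treated as part of the policy rather than as a downstream consumer of the learned model.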

In this work, we represent the dynamics model as a Linear Complementarity System (LCS), a piecewise-affine model. This class of simplified models is known to capture the essential multi-modal structure of contact dynamics.
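For reference, one common discrete-time LCS parameterization (the exact form and parameterization used in our implementation may differ) couples affine dynamics with a linear complementarity condition on the contact forces:

```latex
% One common discrete-time LCS form (assumed here for illustration):
% affine dynamics driven by contact forces \lambda_k subject to a
% linear complementarity condition.
\begin{aligned}
  x_{k+1} &= A x_k + B u_k + C \lambda_k + d, \\
  0 \le \lambda_k &\;\perp\; D x_k + E u_k + F \lambda_k + c \ge 0 .
\end{aligned}
```

Here x_k, u_k, and λ_k denote the state, input, and contact-force variables. The complementarity condition requires each contact to be either inactive (zero force) or active (zero gap), which is exactly the piecewise-affine, multi-modal structure mentioned above.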

TriFinger Moving Cube Task

We validate our proposed framework on a three-fingered robotic hand manipulation task, the TriFinger Moving Cube task.

In the first experiment, we compare the task performance (with an MPC planner) of the dynamics models trained by our method against those trained by prior methods. The table below reports success rates on the TriFinger Moving Cube task.

Method          | Early Training | After 6 min of data | After 30 min of data
LCS-RL (Ours)   | 2.5%           | 55%                 | 66%
Jin et al. 2022 | 2.5%           | 55%                 | 20%
Only PPO        | 2.5%           | 2.5%                | 10%
PDDM            | 2.5%           | 10%                 | 20%

Transfer Learning

To illustrate the transfer learning capabilities of our LCS-RL framework, we employ the LCS model initially trained on the TriFinger Moving Cube task as the starting point for training on other objects. The results below show that our LCS-RL framework is highly suitable for transfer learning. In particular, transfer learning significantly accelerates training and yields even higher final task success rates for all objects (except the sugar box) compared to training from scratch.
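A minimal sketch of this warm start, reusing the hypothetical `theta` and `ppo_model_update` names from the sketch in the Method section: training on a new object begins from the parameters learned on the cube rather than from a fresh random initialization. The checkpoint path in the comment is illustrative, not an actual artifact of this project.

```python
import torch

state_dim, action_dim = 9, 9  # illustrative dimensions, not the paper's

# Training from scratch: random initialization of the model parameters.
theta_scratch = torch.randn(state_dim, action_dim, requires_grad=True)

# Transfer learning: start from the parameters learned on the Moving Cube task
# (in practice loaded from a checkpoint, e.g. torch.load("lcs_cube.pt")).
theta_cube = torch.randn(state_dim, action_dim)   # stand-in for the trained cube model
theta_transfer = theta_cube.clone().requires_grad_(True)

# Both models are then optimized with the same PPO-over-MPC loop
# (ppo_model_update above); only the initialization differs.
```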

We evaluate transfer learning on six objects: Sugar Box, Mug, Clamp, Fish Can, Wrench, and Banana. The tables below report success rates for each object at three stages of training.

Sugar Box

Method                   | Early Training | After 6 min of data | After 30 min of data
LCS-RL (Ours)            | 2.5%           | 85.0%               | 95.0%
Transfer Learning (Ours) | 75.9%          | 78.7%               | 88.8%
Jin et al. 2022          | 2.5%           | 85.0%               | 55.0%

Fish Can

Method                   | Early Training | After 6 min of data | After 30 min of data
LCS-RL (Ours)            | 2.5%           | 57.5%               | 62%
Transfer Learning (Ours) | 71.2%          | 75.1%               | 80.7%
Jin et al. 2022          | 2.5%           | 57.5%               | 45%

Mug

Method                   | Early Training | After 6 min of data | After 30 min of data
LCS-RL (Ours)            | 2.5%           | 30.0%               | 39.0%
Transfer Learning (Ours) | 37.1%          | 37.2%               | 39.4%
Jin et al. 2022          | 2.5%           | 30.0%               | 12.5%

Wrench

Method                   | Early Training | After 6 min of data | After 30 min of data
LCS-RL (Ours)            | 2.5%           | 50.0%               | 56.0%
Transfer Learning (Ours) | 61.0%          | 62.4%               | 69.5%
Jin et al. 2022          | 2.5%           | 45.0%               | 12.5%

Clamp

Method                   | Early Training | After 6 min of data | After 30 min of data
LCS-RL (Ours)            | 2.5%           | 25.0%               | 35.0%
Transfer Learning (Ours) | 38.9%          | 43.5%               | 47.6%
Jin et al. 2022          | 2.5%           | 25.0%               | 5.0%

Banana

Method                   | Early Training | After 6 min of data | After 30 min of data
LCS-RL (Ours)            | 2.5%           | 26.0%               | 31.0%
Transfer Learning (Ours) | 34.2%          | 35.6%               | 38.5%
Jin et al. 2022          | 2.5%           | 26.0%               | 6.0%

Acknowledgment

Toyota Research Institute provided funds to support this work.

Citation

@article{Bui2024,
  title   = {Enhancing Task Performance of Learned Simplified Models via Reinforcement Learning},
  author  = {Bui, Hien and Posa, Michael},
  year    = {2023},
  month   = oct,
  journal = {arXiv preprint arXiv:2310.09714},
  arxiv   = {2310.09714},
  website = {https://sites.google.com/view/lcs-rl}
}