Accelerated Policy Learning with Parallel Differentiable Simulation
Jie Xu, Viktor Makoviychuk, Yashraj Narang, Fabio Ramos, Wojciech Matusik, Animesh Garg, Miles Macklin
The Tenth International Conference on Learning Representations (ICLR 2022)
We evaluate our algorithm on six representative robotic control tasks:
CartPole Swing Up
Hopper
Ant
Humanoid
HalfCheetah
Muscle-Actuated Humanoid
The policy learned by our method for the high-dimensional Muscle-Actuated Humanoid task (152-dimensional action space).
Visualization of policies from different training episodes:
Breakdown of Policy Training (episode number and elapsed wall-clock training time):
CartPole Swing Up + Balance
Episode 0
Episode 50 (36 seconds of training)
Episode 100 (72 seconds of training)
Episode 200 (2.5 minutes of training)
Episode 500 (6 minutes of training)
Ant
Episode 0
Episode 200 (4 minutes of training)
Episode 400 (8 minutes of training)
Episode 800 (16 minutes of training)
Episode 2000 (40 minutes of training)
Humanoid
Episode 0
Episode 200 (10.5 minutes of training)
Episode 400 (21 minutes of training)
Episode 800 (42 minutes of training)
Episode 2000 (105 minutes of training)
Humanoid MTU (Muscle-Actuated Humanoid)
Episode 0
Episode 200 (8.5 minutes of training)
Episode 400 (17 minutes of training)
Episode 800 (34 minutes of training)
Episode 2000 (85 minutes of training)
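The checkpoints in the breakdown above imply a roughly constant wall-clock cost per training episode for each task. The short sketch below recomputes that rate from the listed times as a sanity check. It is only an illustration: the dictionary simply transcribes the times reported on this page (36 and 72 seconds converted to 0.6 and 1.2 minutes), the helper names are hypothetical, and because the reported times are rounded, the derived rates are approximate.

```python
"""Per-episode training cost implied by the wall-clock times reported above."""

# (episode, minutes of training) checkpoints transcribed from the breakdown.
CHECKPOINTS = {
    "CartPole Swing Up + Balance": [(50, 0.6), (100, 1.2), (200, 2.5), (500, 6.0)],
    "Ant": [(200, 4.0), (400, 8.0), (800, 16.0), (2000, 40.0)],
    "Humanoid": [(200, 10.5), (400, 21.0), (800, 42.0), (2000, 105.0)],
    "Humanoid MTU": [(200, 8.5), (400, 17.0), (800, 34.0), (2000, 85.0)],
}


def seconds_per_episode(checkpoints):
    """Average wall-clock seconds per episode, taken from the last checkpoint."""
    episode, minutes = checkpoints[-1]
    return minutes * 60.0 / episode


def max_relative_deviation(checkpoints):
    """Largest relative gap between any checkpoint's rate and the overall rate.

    A small value means training time grows roughly linearly with episode count.
    """
    rate = seconds_per_episode(checkpoints)
    return max(abs(minutes * 60.0 / episode - rate) / rate
               for episode, minutes in checkpoints)


if __name__ == "__main__":
    for task, points in CHECKPOINTS.items():
        print(f"{task}: {seconds_per_episode(points):.2f} s/episode, "
              f"max deviation {max_relative_deviation(points):.1%}")
```

With these values the rates come out to about 0.72 s/episode for CartPole Swing Up + Balance, 1.2 for Ant, 3.15 for Humanoid, and 2.55 for Humanoid MTU. Only the CartPole 200-episode checkpoint deviates from its task's rate (by about 4%), which is consistent with "2.5 minutes" being a rounded figure.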