Accelerated Policy Learning with Parallel Differentiable Simulation


Jie Xu Viktor Makoviychuk Yashraj Narang Fabio Ramos Wojciech Matusik Animesh Garg Miles Macklin


The Tenth International Conference on Learning Representations (ICLR 2022)

We evaluate our algorithm on six representative robotic control tasks:

CartPole Swing Up

Hopper

Ant

Humanoid

HalfCheetah

Muscle-Actuated Humanoid
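All six tasks are trained with the same gradient-based pipeline: rewards are backpropagated through a parallel differentiable simulator into the policy parameters. As a minimal illustration of that idea (a 1-D point-mass with a linear policy and a hand-written reverse pass; this is an assumed toy sketch, not the paper's SHAC implementation):

```python
# Toy sketch of policy learning through a differentiable simulator.
# Dynamics, policy, and cost here are illustrative assumptions:
# state update s_{t+1} = s_t + dt * (theta * s_t), cost = sum of s_t^2.

def rollout(theta, s0=1.0, dt=0.1, horizon=20):
    """Run the rollout and return (total_cost, d total_cost / d theta).

    The gradient is computed analytically with a reverse (adjoint) pass,
    i.e. backpropagation through time over the simulated trajectory.
    """
    # Forward pass: simulate and record every state.
    states = [s0]
    s = s0
    for _ in range(horizon):
        s = s + dt * theta * s          # differentiable dynamics step
        states.append(s)
    cost = sum(x * x for x in states)

    # Reverse pass: propagate the state adjoint backward through time,
    # accumulating the policy-parameter gradient at each step.
    grad_theta = 0.0
    adj = 2.0 * states[-1]              # d cost / d s_H
    for t in range(horizon - 1, -1, -1):
        s_t = states[t]
        grad_theta += adj * dt * s_t    # d s_{t+1} / d theta = dt * s_t
        # Chain to the previous state and add its local cost term:
        adj = adj * (1.0 + dt * theta) + 2.0 * s_t
    return cost, grad_theta


if __name__ == "__main__":
    theta = 0.5
    cost, grad = rollout(theta)
    # One gradient-descent step on the policy parameter lowers the cost.
    new_cost, _ = rollout(theta - 0.001 * grad)
    print(cost, grad, new_cost)
```

Because the whole rollout is differentiable, a single backward pass yields an exact policy gradient, which can be checked against finite differences; the full method additionally runs many such rollouts in parallel and truncates them to short horizons.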

The final learned policy for the high-dimensional muscle-actuated humanoid task (152-dimensional action space):

[Video: snu_track_dark.mp4]

Visualization of policies from different training episodes:

[Video: ant_all.mp4]
[Video: humanoid_spread_far.mp4]
[Video: cheetah_short.mp4]
[Video: Hopper_short.mp4]
[Video: snu_all.mp4]

Breakdown of Policy Training:

CartPole Swing Up + Balance

[Video: cartpole_0.mp4] Episode 0

[Video: cartpole_1.mp4] Episode 50 (36 seconds of training)

[Video: cartpole_2.mp4] Episode 100 (72 seconds of training)

[Video: cartpole_3.mp4] Episode 200 (2.5 minutes of training)

[Video: cartpole_4.mp4] Episode 500 (6 minutes of training)

Ant

[Video: ant_0.mp4] Episode 0

[Video: ant_1.mp4] Episode 200 (4 minutes of training)

[Video: ant_2.mp4] Episode 400 (8 minutes of training)

[Video: ant_3.mp4] Episode 800 (16 minutes of training)

[Video: ant_4.mp4] Episode 2000 (40 minutes of training)

Humanoid

[Video: humanoid_0.mp4] Episode 0

[Video: humanoid_1.mp4] Episode 200 (10.5 minutes of training)

[Video: humanoid_2.mp4] Episode 400 (21 minutes of training)

[Video: humanoid_3.mp4] Episode 800 (42 minutes of training)

[Video: humanoid_4.mp4] Episode 2000 (105 minutes of training)

Humanoid MTU

[Video: snu_0.mp4] Episode 0

[Video: snu_1.mp4] Episode 200 (8.5 minutes of training)

[Video: snu_2.mp4] Episode 400 (17 minutes of training)

[Video: snu_3.mp4] Episode 800 (34 minutes of training)

[Video: snu_4.mp4] Episode 2000 (85 minutes of training)