Robust Reinforcement Learning for Continuous Control with Model Misspecification
Site overview
The videos on this site show the performance of the Entropy-regularized MPO (E-MPO) agent versus the Robust Entropy-regularized MPO (RE-MPO) agent for four tasks:
1. Cartpole Balance
2. Walker Walk
3. Cheetah
4. Shadowhand: Dexterous robotic hand
* Similar performance can be seen for the non-entropy-regularized versions (i.e., MPO and R-MPO respectively), so those videos were omitted.
** In addition, the Soft-Robust versions performed similarly to the robust versions and were therefore omitted as well.
Entropy-regularized MPO (E-MPO)
Cartpole Balance Task Evaluation Environment Env2
The non-robust agent is unable to balance the pole.
Robust Entropy-regularized MPO (RE-MPO)
Cartpole Balance Task Evaluation Environment Env2
The robust agent successfully balances a pole of a length it has never seen before.
Entropy-regularized MPO (E-MPO)
Walker Walk Task Evaluation Environment Env2
This agent is unstable: although it succeeds in achieving a walking gait, it very quickly falls to the ground.
Robust Entropy-regularized MPO (RE-MPO)
Walker Walk Task Evaluation Environment Env2
Note that the agent learns to drag its leg in response to the change in quadriceps length. This gait is very stable, prevents the agent from falling, and results in improved performance compared to the non-robust agent.
Domain Randomization E-MPO
Here, the agent struggles to stand up and displays behaviour similar to that of E-MPO, albeit slightly more robust. However, it learns a significantly different policy from that of RE-MPO.
Entropy-regularized MPO (E-MPO)
Cheetah Task Evaluation Environment Env2
The Cheetah learns an aggressive and unstable running policy, which causes it to fall to the ground.
Robust Entropy-regularized MPO (RE-MPO)
Cheetah Task Evaluation Environment Env2
The Cheetah learns a running policy that prevents it from falling over.
Entropy-regularized MPO (E-MPO)
Shadowhand Orientation Task Evaluation Environment Env2
The Shadowhand attempts to orient the cube, but it cannot manipulate a cube that is smaller than the one on which it was trained.
Robust Entropy-regularized MPO (RE-MPO)
Shadowhand Orientation Task Evaluation Environment Env2
The Shadowhand manages to orient the cube into the correct position.