Variance Reduction for Reinforcement Learning in Input-Driven Environments

Walker2d with random wind. Panels: TRPO with meta baseline; TRPO with 10 value networks; TRPO with standard value network.

HalfCheetah on floating tiles. Panels: TRPO with meta baseline; TRPO with 10 value networks; TRPO with standard value network.

7-DoF arm tracking moving object. Panels: TRPO with meta baseline; TRPO with 10 value networks; TRPO with standard value network.