Sumo agent trained as part of an ensemble of 3 policies
Right: Humanoid trained on walking
Left: Humanoid trained on Sumo
The length of each arrow indicates the magnitude of the applied force, which varies from 400 to 800.
Left: Kick and Defend agents trained without curriculum (no annealing of the dense exploration reward)
Right: Humanoid Sumo agent trained without curriculum (no annealing of the dense exploration reward)