Emergent Complexity via Multi-agent Competition

Task 1: Run to Goal

ants-run-to-goal.mp4
humans-run-to-goal.mp4

Task 2: You Shall Not Pass

humans-you-shall-not-pass.mp4

Task 3: Sumo

Ant-Sumo.mp4
sumo-fights.mp4

Task 4: Kick and Defend

kick_defend_compilation_old.mp4
kick-and-defend-robust.mp4

Training against Ensemble of Policies

Sumo agent trained in an ensemble of 3 policies

sumo-ensemble.mp4

Robustness of learnt policy to wind-attack

Left: Humanoid trained on Sumo

Right: Humanoid trained on walking

The length of the arrow indicates the magnitude of the applied force, which varies from 400 to 800.

wind-attack-sumo.mp4
wind-attack-classic.mp4

Effect of Exploration Curriculum

Left: Kick and Defend agents trained without curriculum (no annealing of the dense exploration reward)

Right: Humanoid Sumo agent trained without curriculum (no annealing of the dense exploration reward)

nocurri-football.mp4
nocurri-sumo.mp4
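
For readers unfamiliar with the term, annealing the dense exploration reward can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: it assumes a linear decay schedule, and the names `annealing_coefficient`, `shaped_reward`, and `anneal_iterations` are hypothetical. Under this sketch, the "without curriculum" setting in the videos above would correspond to keeping `alpha` constant instead of decaying it.

```python
def annealing_coefficient(iteration: int, anneal_iterations: int) -> float:
    """Weight on the dense exploration reward (assumed linear schedule).

    Decays from 1.0 at iteration 0 to 0.0 at `anneal_iterations`,
    and stays at 0.0 afterwards.
    """
    if anneal_iterations <= 0:
        # Degenerate schedule: no exploration phase at all.
        return 0.0
    return max(0.0, 1.0 - iteration / anneal_iterations)


def shaped_reward(dense_reward: float, competition_reward: float, alpha: float) -> float:
    """Blend the dense exploration reward with the sparse competition reward.

    alpha = 1.0 uses only the dense reward; alpha = 0.0 uses only
    the competition (win/lose) reward.
    """
    return alpha * dense_reward + (1.0 - alpha) * competition_reward


if __name__ == "__main__":
    # Hypothetical values, chosen only to show how the blend shifts over training.
    for it in (0, 250, 500, 1000):
        alpha = annealing_coefficient(it, anneal_iterations=500)
        print(it, alpha, shaped_reward(dense_reward=2.0, competition_reward=10.0, alpha=alpha))
```

With a schedule like this, early training is driven by the dense reward (which encourages basic skills such as moving or staying upright), and later training is driven only by the outcome of the competition.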