Emergent Complexity via Multi-agent Competition
Code for Environments and Trained Policies: https://github.com/openai/multiagent-competition
Task 1: Run to Goal
Task 2: You Shall Not Pass
Task 3: Sumo
Task 4: Kick and Defend
Training against Ensemble of Policies
Sumo agent trained against an ensemble of 3 policies
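A minimal sketch of what training against an ensemble of opponent policies can look like: at each rollout, the learner faces an opponent sampled uniformly from a pool of policies, which discourages overfitting to any single opponent's quirks. The class and method names here are hypothetical illustrations, not the repo's actual training code.

```python
import random


class EnsembleOpponentSampler:
    """Sketch of ensemble opponent sampling (hypothetical API): the
    learning agent plays against an opponent drawn uniformly at random
    from a fixed pool of independently trained policies."""

    def __init__(self, opponent_pool):
        if not opponent_pool:
            raise ValueError("opponent pool must be non-empty")
        self.opponent_pool = opponent_pool

    def sample_opponent(self):
        # Uniform sampling over the pool; each rollout may face a
        # different opponent, so the learner must stay robust to all.
        return random.choice(self.opponent_pool)
```

Usage: with a pool of 3 policies, as in the Sumo video above, each training episode would call `sample_opponent()` once before the rollout begins.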
Robustness of learnt policy to wind-attack
Left: Humanoid trained on Sumo
Right: Humanoid trained on walking
The length of the arrow indicates the applied force, which varies from 400 to 800
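The wind-attack test above can be sketched as a random lateral force perturbation applied to the agent during a rollout. The magnitude range follows the caption (400 to 800); the function name and the choice of a single lateral axis are assumptions for illustration, not the paper's actual test harness.

```python
import random


def sample_wind_force(min_force=400.0, max_force=800.0):
    """Sketch of the wind perturbation (hypothetical helper): returns a
    lateral force of random magnitude in [min_force, max_force], with
    the push direction chosen at random. In simulation this force would
    be applied to the humanoid's torso while the policy keeps acting."""
    magnitude = random.uniform(min_force, max_force)
    direction = random.choice([-1.0, 1.0])
    return direction * magnitude
```

A Sumo-trained humanoid that stays upright under such pushes, while a walking-trained one falls, is the robustness contrast the two videos illustrate.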
Effect of Exploration Curriculum
Left: Kick and Defend agents trained without curriculum (no annealing of the dense exploration reward)
Right: Humanoid Sumo agent trained without curriculum (no annealing of the dense exploration reward)
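The exploration curriculum the captions refer to anneals a dense shaping reward (e.g. for standing or moving) toward zero, so that learning is eventually driven only by the sparse competition reward. A minimal sketch, assuming a linear schedule (the schedule shape and length are assumptions for illustration):

```python
def anneal_coefficient(step, anneal_steps=500):
    """Linear annealing of the dense-reward coefficient: 1.0 at the
    start of training, decaying to 0.0 by `anneal_steps`, after which
    only the sparse competition reward remains."""
    return max(0.0, 1.0 - step / anneal_steps)


def curriculum_reward(dense_reward, sparse_reward, step, anneal_steps=500):
    """Combine dense exploration reward and sparse competition reward
    under the curriculum. Without annealing (coefficient fixed at 1.0),
    agents can get stuck exploiting the shaping signal, as the
    no-curriculum videos above show."""
    alpha = anneal_coefficient(step, anneal_steps)
    return alpha * dense_reward + sparse_reward
```

Early in training the dense term dominates and teaches basic motor skills; once it is annealed away, the sparse win/loss signal shapes the competitive behavior.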