Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning


OP3 Soccer Team, DeepMind


Soccer players can tackle, get up, kick and chase a ball in one seamless motion. How could robots master these agile motor skills?

Movie 1: Project overview

We investigated the application of Deep Reinforcement Learning (Deep RL) to low-cost, miniature humanoid hardware in a dynamic environment, showing that the method can synthesize sophisticated and safe movement skills that compose into complex behavioral strategies in a simplified one-versus-one (1v1) soccer game.

Our agents, with 20 actuated joints, were trained in simulation using the MuJoCo physics engine, and transferred zero-shot to real robots. The agents use proprioception and game state features as observations. The trained soccer players exhibit robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more.  They transition between these emergent skills automatically in a smooth, stable, and efficient manner, going beyond what might intuitively be expected from the platform. The agents also developed a basic strategic understanding of the game, learning to anticipate ball movements and to block opponent shots.
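As a concrete illustration of the observation design described above, the sketch below concatenates proprioceptive signals with game state features into a single vector. The exact feature set, ordering, and dimensions are illustrative assumptions, not the authors' specification; only the 20-joint count comes from the text.

```python
import numpy as np

NUM_JOINTS = 20  # the OP3 has 20 actuated joints (from the text)

def build_observation(joint_angles, gyro, accel, ball_xy, opponent_xy, goal_xy):
    """Concatenate proprioception and game state features into one vector.

    The specific features and their layout here are assumptions for
    illustration; the trained agent's true observation may differ.
    """
    parts = [
        np.asarray(joint_angles, dtype=np.float32),  # 20 joint positions
        np.asarray(gyro, dtype=np.float32),          # 3-axis angular velocity
        np.asarray(accel, dtype=np.float32),         # 3-axis linear acceleration
        np.asarray(ball_xy, dtype=np.float32),       # ball position on the pitch
        np.asarray(opponent_xy, dtype=np.float32),   # opponent position
        np.asarray(goal_xy, dtype=np.float32),       # target goal position
    ]
    return np.concatenate(parts)

obs = build_observation(
    joint_angles=np.zeros(NUM_JOINTS),
    gyro=[0.0, 0.0, 0.1],
    accel=[0.0, 0.0, 9.81],
    ball_xy=[1.5, 0.2],
    opponent_xy=[2.0, -0.5],
    goal_xy=[4.5, 0.0],
)
print(obs.shape)  # (32,)
```

A flat vector like this is the typical input to the policy network; in practice the game state features would be expressed in the robot's own frame.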

Movie 2: Behavior and skill highlights

Recurring skills and strategies selected from typical one-versus-one play. The agent demonstrates agile skills, including getting up and turning; reactive behavior, including kicking a moving ball; object interaction, including ball control; dynamic defensive blocking; and strategic play, including defensive positioning. The agent also transitions quickly between skills (for example, turning, chasing, controlling, then kicking) and combines them (for example, frequently turning and kicking together).

Movie 3: Comparison to scripted baseline controllers

Scripted controllers for certain key locomotion behaviors, including getting up, kicking, walking, and turning, are available for the OP3 robot. This movie illustrates these baselines and a side-by-side comparison with the corresponding behaviors from the deep RL agent.

Movie 4: Turning and kicking behaviors in simulation and in the real environment

One of the agile behaviors we see during soccer play is the turning skill discovered by the agent, shown here in slow motion. It pivots on the corner of one foot and takes 2-3 steps to turn 180 degrees. Although learned entirely in simulation, this behavior is successful on the OP3 after zero-shot transfer to the real robot, with a perhaps surprisingly small sim-to-real gap given the highly optimized nature of the behavior. The agent's kicking behavior is also shown here in slow motion.

Movie S1: Training in simulation

We first trained individual skills in isolation, in simulation, and then composed those skills end-to-end in a self-play setting. We found that a combination of sufficiently high-frequency control and targeted dynamics randomization and perturbations during training in simulation enabled good-quality transfer to the robot.
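The dynamics randomization and perturbations mentioned above can be sketched as follows. The parameter names, randomization ranges, and push magnitudes are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dynamics(nominal):
    """Resample dynamics parameters at each episode start.

    Scales each nominal parameter by a random factor; the +/-20% range
    here is an assumption for illustration.
    """
    return {name: value * rng.uniform(0.8, 1.2) for name, value in nominal.items()}

def sample_push(prob=0.05, max_force=10.0):
    """Occasionally apply a random horizontal push (in newtons).

    Both the probability and the force magnitude are assumptions.
    """
    if rng.random() < prob:
        angle = rng.uniform(0.0, 2.0 * np.pi)
        return max_force * np.array([np.cos(angle), np.sin(angle)])
    return np.zeros(2)

# Hypothetical nominal parameters; real training would perturb the
# simulator's physical model (masses, friction, actuator gains, etc.).
nominal = {"torso_mass": 0.8, "foot_friction": 1.0, "motor_gain": 1.0}
episode_dynamics = sample_dynamics(nominal)
push = sample_push()
```

Randomizing dynamics per episode forces the policy to be robust to a family of simulators rather than a single one, which is what makes zero-shot transfer to the physical robot plausible.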

Movie S2: 1v1 matches

Five one-versus-one matches, representative of the typical behavior and gameplay of the fully trained soccer agent.

Movie S3: Set pieces in simulation and in the real environment

We analysed the agent's performance in two set pieces, to gauge the reliability of the getting-up and shooting behaviors and to measure the performance gap between the simulation and the real environment. We also compared the learned behaviors with the scripted baseline skills. In these experiments the learned agent walked 156% faster, took 63% less time to get up, and kicked 24% faster than the scripted baseline.

Movie S4: Robustness and recovery from pushes

Although the robots are inherently fragile, minor hardware modifications together with basic regularization of the behavior during training lead to safe and effective movements while still being able to perform in a dynamic and agile way.

Preliminary results: Learning from vision

We conducted a preliminary investigation of whether deep RL agents can learn directly from raw egocentric vision. In this setting the agent must learn to control its camera and integrate information over a window of egocentric viewpoints to infer the relevant aspects of the game state. Our initial analysis indicates that deep RL is a promising approach to this challenging problem. In a simpler set piece with fixed robot and ball positions, the vision-based agent scored 10 goals out of 10 trials in simulation and 6 out of 10 on the real robot.
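One simple way to integrate information over a window of egocentric viewpoints is to stack the most recent frames into a single observation. The sketch below illustrates this; the window length and image size are assumptions, and the actual agent may instead use a recurrent memory.

```python
from collections import deque
import numpy as np

WINDOW = 4               # number of recent frames kept (assumption)
FRAME_SHAPE = (40, 30)   # downsampled egocentric image (assumption)

class FrameStack:
    """Keep a sliding window of recent frames and stack them."""

    def __init__(self, window=WINDOW):
        self.frames = deque(maxlen=window)

    def reset(self, first_frame):
        # Fill the window with copies of the first frame at episode start.
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)

    def step(self, frame):
        # Oldest frame is dropped automatically by the deque.
        self.frames.append(frame)
        return np.stack(self.frames)  # shape: (window, H, W)

stack = FrameStack()
stack.reset(np.zeros(FRAME_SHAPE, dtype=np.float32))
obs = stack.step(np.ones(FRAME_SHAPE, dtype=np.float32))
print(obs.shape)  # (4, 40, 30)
```

The stacked tensor gives the policy a short history, from which quantities such as ball velocity can in principle be inferred even though each single frame shows only one viewpoint.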

We hope the challenge of integrating the get-up skill and learning vision-guided exploration and multi-agent strategies will be tackled by future work.

Movie S5: Preliminary vision-based agents


More information

Please read the paper for more information about this research.