Achieving Human Level Competitive Robot Table Tennis

David B. D'Ambrosio¹*, Saminda Abeyruwan¹*, Laura Graesser¹*, Atil Iscen¹, Heni Ben Amor², Alex Bewley², Barney J. Reed²^, Krista Reymann², Leila Takayama²+, Yuval Tassa², Krzysztof Choromanski, Erwin Coumans, Deepali Jain, Navdeep Jaitly, Natasha Jaques, Satoshi Kataoka, Yuheng Kuang, Nevena Lazic, Reza Mahjourian, Sherry Moore, Kenneth Oslund, Anish Shankar, Vikas Sindhwani, Vincent Vanhoucke,
Grace Vesom, Peng Xu, and Pannag R. Sanketi¹

Google DeepMind
*: Corresponding authors (equal contribution, order randomized), ¹: Primary contributors, ²: Core contributors (alphabetized)
^: work done at Google DeepMind via Stickman Studios LLC, +: work done at Google DeepMind via Hoku Labs

Paper | Highlights | Full Length Match Videos

Achieving human-level speed and performance on real-world tasks is a north star for the robotics research community. This work takes a step towards that goal and presents the first learned robot agent that reaches amateur human-level performance in competitive table tennis. Table tennis is a physically demanding sport that requires human players to undergo years of training to achieve an advanced level of proficiency. In this paper, we contribute (1) a hierarchical and modular policy architecture consisting of (i) low-level controllers with their detailed skill descriptors, which model the agent's capabilities and help to bridge the sim-to-real gap, and (ii) a high-level controller that chooses the low-level skills, (2) techniques for enabling zero-shot sim-to-real transfer, including an iterative approach to defining a task distribution that is grounded in the real world and defines an automatic curriculum, and (3) real-time adaptation to unseen opponents. Policy performance was assessed through 29 robot vs. human matches, of which the robot won 45% (13/29). All humans were unseen players, and their skill levels varied from beginner to tournament level. While the robot lost all matches against the most advanced players, it won 100% of matches against beginners and 55% of matches against intermediate players, demonstrating solidly amateur human-level performance.


Coach Barney Demonstrates Capabilities

Match Highlights

Truly awesome to watch the robot play players of all levels and styles. Going in, our aim was to have the robot be at an intermediate level. Amazingly, it did just that; all the hard work paid off.

I feel the robot exceeded even my expectations. It was a true honor and pleasure to be a part of this research. I have learned so much and am very thankful for everyone I had the pleasure of working with on this.

- Barney J. Reed, Professional Table Tennis Coach

Motivation

Contributions

Our approach led to competitive play at a human level and a robot agent that humans genuinely enjoy playing with. To achieve this, we made four technical contributions.

Method

The agent consists of a library of low-level skills and a high-level controller that selects the most effective skill. Each low-level skill policy specializes in a specific aspect of table tennis, such as forehand topspin, backhand targeting, or forehand serve. In addition to training the policy itself, we collect and store information, both offline and online, about the strengths, weaknesses, and limitations of each low-level skill. The resulting skill descriptors provide the robot with important information about its abilities and shortcomings. In turn, a high-level controller, responsible for orchestrating the low-level skills, selects the optimal skill given the current game statistics, the skill descriptors, and the opponent's capabilities.
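To make the selection step concrete, here is a minimal Python sketch of skill descriptors and a high-level controller, assuming illustrative names (SkillDescriptor, select_skill) and a toy scoring heuristic; it is not the system's actual implementation.

    from dataclasses import dataclass

    @dataclass
    class SkillDescriptor:
        """Offline/online statistics describing one low-level skill's competence."""
        name: str            # e.g. "forehand_topspin"
        hit_rate: float      # fraction of incoming balls this skill returns successfully
        preferred_spin: str  # incoming spin this skill handles best: "topspin" or "underspin"

    def score(skill, ball_spin, opponent_weakness):
        """Toy heuristic value of choosing this skill for the current incoming ball."""
        spin_match = 1.0 if skill.preferred_spin == ball_spin else 0.5
        return skill.hit_rate * spin_match + opponent_weakness

    def select_skill(skills, ball_spin, opponent_stats):
        """High-level controller: pick the skill with the highest estimated value."""
        return max(skills, key=lambda s: score(s, ball_spin, opponent_stats.get(s.name, 0.0)))

    skills = [
        SkillDescriptor("forehand_topspin", hit_rate=0.8, preferred_spin="topspin"),
        SkillDescriptor("backhand_push", hit_rate=0.7, preferred_spin="underspin"),
    ]
    print(select_skill(skills, "underspin", {"backhand_push": 0.1}).name)  # backhand_push

The key design point the sketch captures is that the high-level controller does not plan motor commands itself; it only ranks pre-trained skills using their descriptors plus live statistics about the opponent.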

We collect a small amount of human-human play data to seed the initial task conditions. We then train an agent in simulation using RL and employ a number of techniques (known and novel) to deploy the policy zero-shot to real hardware. This agent plays with humans to generate more training task conditions, and the training-deployment cycle is repeated. As the robot improves, the standard of play becomes progressively more complex while remaining grounded in real-world task conditions. This hybrid sim-real cycle creates an automatic task curriculum and enables the robot's skills to improve over time.
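The cycle itself can be summarized as a short loop. The sketch below is a toy rendering of it: train_in_sim and play_humans are assumed stand-ins for the real RL training and hardware deployment, and the "ball states" are placeholder scalars.

    import random

    def train_in_sim(task_conditions):
        # Stand-in for RL training in simulation; the returned dict is a toy "policy".
        return {"trained_on": len(task_conditions)}

    def play_humans(policy):
        # Stand-in for zero-shot deployment on hardware; returns logged ball states
        # (here, toy scalar values) observed while playing people.
        return [random.gauss(0.0, 1.0) for _ in range(20)]

    def sim_to_real_cycle(seed_conditions, iterations=3):
        # Task conditions are seeded from a small human-human play dataset.
        task_conditions = list(seed_conditions)
        policy = None
        for _ in range(iterations):
            policy = train_in_sim(task_conditions)   # train entirely in simulation
            real_states = play_humans(policy)        # deploy zero-shot, play humans
            task_conditions.extend(real_states)      # curriculum stays real-grounded
        return policy

    print(sim_to_real_cycle(seed_conditions=[0.0] * 10))

Because every new training condition comes from states actually reached in real rallies, the task distribution hardens automatically as the robot improves, without drifting into simulation-only scenarios.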

Hierarchical Control


Results

To evaluate the skill level of our agent, we ran competitive matches against 29 table tennis players of varying skill levels (beginner, intermediate, advanced, and advanced+), as determined by a professional table tennis coach. Each human played three games against the robot following standard table tennis rules, with some modifications because the robot is physically unable to serve the ball. Against all opponents, the robot won 45% of matches and 46% of games. Broken down by skill level, the robot won all matches against beginners, lost all matches against advanced and advanced+ players, and won 55% of matches against intermediate players. This strongly suggests our agent achieved intermediate-level human play on rallies.

Qualitative Assessment

Study participants enjoyed playing with the robot, rating it highly on being "fun" and "engaging". This held true across skill levels and regardless of whether the participant won or lost. Participants also overwhelmingly responded "definitely yes" when asked if they would want to play with the robot again. When given free play time with the robot, they played for an average of 4:06 out of 5 minutes.

Advanced players were able to exploit weaknesses in the robot's policies, but they still had fun playing with it. In post-match interviews, they saw its potential as a more dynamic practice partner than a ball thrower.

Human Strategies and Policy Weaknesses

The most skilled players mentioned that the robot was not good at handling underspin. To test this observation, we plotted the robot's landing rate against the estimated spin of the incoming ball and indeed saw a large dropoff as the amount of underspin increased. This deficiency stems partly from the difficulty of handling low balls while avoiding collision with the table, and partly from the difficulty of determining ball spin in real time. Fortunately, this provides clear feedback for additional training in our sim-to-real flywheel.
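As a toy illustration of that analysis, the sketch below buckets logged returns by estimated incoming spin and computes a landing rate per bucket; the data format and units (signed revolutions per second, negative for underspin) are assumptions, not the actual logging pipeline.

    from collections import defaultdict

    def landing_rate_by_spin(returns, bin_width=20.0):
        """returns: (spin_rps, landed) pairs; negative spin_rps means underspin."""
        hits, totals = defaultdict(int), defaultdict(int)
        for spin, landed in returns:
            edge = int(spin // bin_width) * bin_width  # left edge of the spin bin
            totals[edge] += 1
            hits[edge] += int(landed)
        return {edge: hits[edge] / totals[edge] for edge in sorted(totals)}

    # Toy data: heavy underspin returns miss more often, matching the observed dropoff.
    print(landing_rate_by_spin([(-60, False), (-55, False), (-10, True), (30, True)]))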