Robotic Table Tennis: A Case Study into a High Speed Learning System
David B. D'Ambrosio, Jonathan Abelian, Saminda Abeyruwan, Michael Ahn, Alex Bewley, Justin Boyd, Krzysztof Choromanski, Omar Cortes, Erwin Coumans, Tianli Ding, Wenbo Gao, Laura Graesser, Atil Iscen, Navdeep Jaitly, Deepali Jain, Juhana Kangaspunta, Satoshi Kataoka, Gus Kouretas, Yuheng Kuang, Nevena Lazic, Corey Lynch, Reza Mahjourian, Sherry Q. Moore, Thinh Nguyen, Ken Oslund, Barney J Reed, Krista Reymann, Pannag R. Sanketi, Anish Shankar, Pierre Sermanet, Vikas Sindhwani, Avi Singh, Vincent Vanhoucke, Grace Vesom, Peng Xu
Abstract
We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, sensitivity to policy hyper-parameters, and choice of action space. A video demonstrating the components of the system and details of experimental results can be viewed below.
Summary Video
Lessons Learned
Choosing the right robots is important. The system started with a scaled down version of the current setup as a proof of concept and then graduated to full-scale, industrial robots. Industrial robots have many benefits such as low latency and high repeatability, but they can come with "closed-box" issues that must be worked through.
A safety simulator is a dynamic and customizable way to constrain operations under high-frequency control, in contrast to relying on high-level trajectory planners.
Accurate environmental perception is also a key factor in transfer performance. Many factors were non-obvious to non-vision experts: camera placement, special calibration techniques, lens locks, etc. In this system, all optical factors were considered to improve 3D ball detection.
A purpose-built learnable perception module with high-speed, high-precision inference was designed to track the ball in play at 125 Hz. This level of performance required a custom architecture with several efficiency considerations, including GPU data buffering, raw Bayer pattern input, and optimization with hard patch mining.
Interpolating and smoothing inputs solves the problem of different devices running at different frequencies. It also guards against zero-mean noise and system latency variability, but is less effective against other types of noise.
Automatic resets and remote control increase system utilization and research velocity. The system originally required a human to manually collect balls and control the thrower. Now that the system can be run remotely and "indefinitely", significantly more data collection and training can occur.
Evolutionary strategies (ES) algorithms like Blackbox Gradient Sensing (BGS) are a good starting point to explore the capabilities of a system, but they may also be a good option in general. BGS is still the most successful and reliable method applied in this system. Despite poor sample efficiency, ES methods are simple to implement, scalable, and robust optimizers that can even fine-tune real world performance.
Latency modeling is critical for real world transfer performance as indicated by our experimental results. Other environmental factors may have varying effects that change based on the task. For example, ball spin is not accurately modeled in the ball return task, but can be critical when more nuanced actions are required.
A configurable, modular, and multi-language (e.g. C++ and Python) system improves research and development velocity by making experimentation and testing easy for the researcher.
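The evolutionary strategies lesson above can be illustrated with a minimal sketch of one antithetic ES update. This is a generic textbook-style estimator applied to a toy objective, not the BGS variant used in the system. The function name `es_step`, the toy `objective`, the `TARGET` vector, and all hyperparameter values are assumptions chosen only for illustration.

```python
import random


def es_step(theta, objective, rng, num_pairs=16, sigma=0.1, lr=0.05):
    """One antithetic ES ascent step on `objective` (higher is better)."""
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(num_pairs):
        # Sample one Gaussian perturbation and evaluate it in both directions.
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        f_plus = objective([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = objective([t - sigma * e for t, e in zip(theta, eps)])
        # Finite-difference estimate of the directional derivative along eps.
        scale = (f_plus - f_minus) / (2.0 * sigma * num_pairs)
        for i in range(dim):
            grad[i] += scale * eps[i]
    # Move the parameters along the estimated gradient.
    return [t + lr * g for t, g in zip(theta, grad)]


# Toy stand-in for an episode return: the best score is reached at TARGET.
TARGET = [1.0, -2.0, 0.5]  # hypothetical values


def objective(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


rng = random.Random(0)
theta = [0.0, 0.0, 0.0]
initial_score = objective(theta)
for _ in range(200):
    theta = es_step(theta, objective, rng)
final_score = objective(theta)
print(f"score improved from {initial_score:.3f} to {final_score:.6f}")
```

Each update needs only `2 * num_pairs` black-box evaluations and no gradients from the objective. That keeps the method simple to implement and easy to parallelize, at the cost of many evaluations per step.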
System Overview
The diagram on the left shows how the various software components fit together to form the environment: in simulation, everything runs in a single process, but the real environment splits the work among several processes. The diagram on the right shows the components of the real hardware system. A custom MPI manages communication between the parts and logging of all data.
Robots
The player in this system consists of two industrial robots that work together: an ABB 6DOF arm and a Festo 2DOF linear actuator, creating an 8DOF system. The two robots complement each other: the gantry is able to cover large distances quickly, maneuvering the arm into an appropriate position where it can make fine adjustments and hit the ball in a controlled manner. The choice of industrial robots was deliberate: it provides high reliability and keeps the focus on the machine learning challenges of the problem. An off-the-shelf ball thrower was customized to make it more robust and allow for automation and feedback.
ABB Arm
Festo Gantry
Custom Ball Thrower
Perception System
Table tennis is a highly dynamic sport (an amateur-speed ball crosses the table in 0.4 seconds), requiring extremely fast reaction times and precise motor control when hitting the ball. A vision system with low latency and high precision is therefore required. It is also not possible to instrument the ball (e.g. with LEDs) or paint it for active tracking, because balls are very sensitive to variations in weight or texture, so a passive vision system must be employed.
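To put the timing requirement in numbers, the short calculation below combines the 0.4 second crossing time with the 125 Hz tracking rate quoted for the perception module. The table length of 2.74 m is the standard table dimension and is an assumption here, since the text does not state it. The resulting speed is an average over the crossing, not a measured value.

```python
TABLE_LENGTH_M = 2.74    # standard table length (assumption, not stated in the text)
CROSSING_TIME_S = 0.4    # amateur-speed crossing time from the text
CAMERA_RATE_HZ = 125.0   # ball tracking rate quoted for the perception module

# Time between consecutive ball observations.
frame_period_ms = 1000.0 / CAMERA_RATE_HZ
# Number of observations available while the ball crosses the table.
frames_per_crossing = round(CROSSING_TIME_S * CAMERA_RATE_HZ)
# Average ball speed over the crossing and distance covered between frames.
mean_speed_mps = TABLE_LENGTH_M / CROSSING_TIME_S
travel_per_frame_cm = 100.0 * mean_speed_mps / CAMERA_RATE_HZ

print(f"frame period: {frame_period_ms:.1f} ms")
print(f"frames per crossing: {frames_per_crossing}")
print(f"mean speed: {mean_speed_mps:.2f} m/s")
print(f"travel per frame: {travel_per_frame_cm:.2f} cm")
```

Under these assumptions the whole crossing yields only a few dozen observations, so every millisecond of perception or control latency uses up a noticeable share of the reaction budget.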
A custom vision pipeline that is fast, accurate, and passive was designed to provide 3D ball positions, and it is contained within a single model. It consists of three main components: 1) 2D ball detection across two stereo cameras, 2) triangulation to recover the 3D ball position, and 3) a sequential decision-making process that manages trajectory creation, filtering, and termination. All stages are jointly tuned to maximize the Average Local Tracking Accuracy metric. This system employs several novel techniques to optimize speed and performance, including inference directly from Bayer images and novel temporal convolutions (both demonstrated below).
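The triangulation step can be sketched as follows. The text does not say which triangulation method the system uses, so this example swaps in the simple midpoint method: each camera contributes a ray through its 2D detection, and the ball is placed halfway between the closest points of the two rays. The function name `triangulate_midpoint`, the camera positions, and the assumption that rays are already expressed in a shared world frame are all hypothetical.

```python
def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _point_on_ray(origin, direction, t):
    return [o + t * d for o, d in zip(origin, direction)]


def triangulate_midpoint(origin1, dir1, origin2, dir2):
    """Midpoint of the shortest segment between rays origin + t * dir."""
    w = _sub(origin1, origin2)
    a = _dot(dir1, dir1)
    b = _dot(dir1, dir2)
    c = _dot(dir2, dir2)
    d = _dot(dir1, w)
    e = _dot(dir2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; cannot triangulate")
    # Ray parameters of the mutually closest points.
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    p1 = _point_on_ray(origin1, dir1, t1)
    p2 = _point_on_ray(origin2, dir2, t2)
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]


# Two hypothetical camera centers one meter apart, both seeing the same ball.
ball = [0.5, 2.0, 0.3]
cam_left, cam_right = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
estimate = triangulate_midpoint(
    cam_left, _sub(ball, cam_left), cam_right, _sub(ball, cam_right)
)
print(estimate)
```

With noisy 2D detections the two rays no longer intersect, and the length of the segment between `p1` and `p2` gives a rough consistency check on the pair of detections.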
Simulation
The table tennis environment is simulated to facilitate sim-to-real training and prototyping for real robot training. PyBullet is the physics engine and the environment interface conforms to the Gym API. There are five conceptual components: (1) the physics simulation and ball dynamics model, which together model the dynamics of the robot and ball, (2) the StateMachine, which uses ball contact information from the physics simulation and tracks the semantic state of the game (e.g. the ball just bounced on the opponent's side of the table, the player hit the ball), (3) the RewardManager, which loads a configurable set of rewards and outputs the reward at each step, (4) the DoneManager, which loads a configurable set of done conditions (e.g. ball leaves play area, robot collision with non-ball object) and outputs whether the episode is done at each step, and (5) the Observation class, which configurably formats the environment observation at each step.
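The component structure described above can be sketched with a toy, one-dimensional environment. Only the component names (StateMachine, RewardManager, DoneManager) and the Gym-style `reset`/`step` convention come from the text. The class `ToyTableTennisEnv`, the method names, the simplified physics (no PyBullet), and every numeric value are hypothetical stand-ins, and the observation formatting is reduced to a single helper method.

```python
class StateMachine:
    """Tracks a semantic game state from contact events (toy version)."""

    def __init__(self):
        self.state = "in_flight"

    def update(self, paddle_contact):
        if paddle_contact and self.state == "in_flight":
            self.state = "player_hit"
        return self.state


class RewardManager:
    """Sums a configurable list of reward functions at each step."""

    def __init__(self, reward_fns):
        self.reward_fns = reward_fns

    def compute(self, info):
        return sum(fn(info) for fn in self.reward_fns)


class DoneManager:
    """Ends the episode when any configured done condition is true."""

    def __init__(self, done_fns):
        self.done_fns = done_fns

    def compute(self, info):
        return any(fn(info) for fn in self.done_fns)


class ToyTableTennisEnv:
    def __init__(self, reward_manager, done_manager, dt=0.01):
        self.reward_manager = reward_manager
        self.done_manager = done_manager
        self.dt = dt

    def reset(self):
        self.ball_x, self.ball_vx, self.paddle_x = 0.0, 5.0, 2.0
        self.state_machine = StateMachine()
        return self._observation()

    def _observation(self):
        # Stand-in for the configurable Observation class.
        return (self.ball_x, self.ball_vx, self.paddle_x)

    def step(self, action):
        # Toy physics: move the paddle, then advance the ball in a straight line.
        self.paddle_x += action
        self.ball_x += self.ball_vx * self.dt
        contact = abs(self.ball_x - self.paddle_x) < 0.02
        previous = self.state_machine.state
        current = self.state_machine.update(contact)
        just_hit = previous != current and current == "player_hit"
        if just_hit:
            self.ball_vx = -self.ball_vx  # send the ball back
        info = {"ball_x": self.ball_x, "just_hit": just_hit, "state": current}
        reward = self.reward_manager.compute(info)
        done = self.done_manager.compute(info)
        return self._observation(), reward, done, info


rewards = RewardManager([lambda info: 1.0 if info["just_hit"] else 0.0])
dones = DoneManager([lambda info: not -0.5 <= info["ball_x"] <= 3.0])
env = ToyTableTennisEnv(rewards, dones)

obs, total_reward, steps, done = env.reset(), 0.0, 0, False
while not done:
    obs, reward, done, info = env.step(0.0)  # keep the paddle still
    total_reward += reward
    steps += 1
print(steps, total_reward, info["state"])
```

Because rewards and done conditions are plain lists of functions, a different task can be configured without touching the environment class, which is the kind of configurability the description points to.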
Ball Retrieval System
An important aspect of a real world robotic system is environment reset. If each episode requires a lengthy reset process or human intervention, then progress will be slow. In addition to the customized thrower, a system to automate the refill process was designed that exploits the light weight of table tennis balls by blowing air to return them to the hopper. A ceiling-mounted fan blows down to remove balls stuck on the table, which is surrounded by foamcore to direct the balls into carpeted pathways. At each corner of the path is a blower fan (typically meant for drying out carpet) that directs air across the floor. The balls circulate around the table until they reach a ramp that directs them to a tube that also uses air to transport them back into the hopper.
System Studies
Perception Resilience
For these experiments we modulate vision performance in the following ways: (1) reduce the frame rate (FPS) of the cameras, (2) increase latency by queuing observations and sending them to the policy at fixed intervals, and (3) reduce accuracy by injecting zero-mean and non-zero-mean noise into the ball position (over and above inherent noise in the system).
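The three kinds of degradation can be sketched as a small wrapper around a stream of ball positions. This is an illustrative approximation, not the experimental code: the class name `DegradedVision` and its parameters are hypothetical, latency is modeled as a fixed delay measured in frames, and the noise is assumed to be Gaussian.

```python
import random
from collections import deque


class DegradedVision:
    """Degrades a stream of ball positions before the policy sees them."""

    def __init__(self, keep_every=1, delay_frames=0,
                 noise_mean=0.0, noise_std=0.0, seed=0):
        self.keep_every = keep_every
        self.delay_frames = delay_frames
        self.noise_mean = noise_mean
        self.noise_std = noise_std
        self._rng = random.Random(seed)
        self._queue = deque()
        self._last_sent = None
        self._frame_index = 0

    def step(self, position):
        """Feed one true position; return what the policy would receive."""
        # (1) Lower frame rate: keep only every `keep_every`-th camera frame.
        if self._frame_index % self.keep_every == 0:
            # (3) Lower accuracy: add Gaussian noise to each coordinate.
            noisy = tuple(
                p + self._rng.gauss(self.noise_mean, self.noise_std)
                for p in position
            )
            self._queue.append(noisy)
        self._frame_index += 1
        # (2) Extra latency: release a frame only once `delay_frames`
        # newer frames are waiting behind it in the queue.
        if len(self._queue) > self.delay_frames:
            self._last_sent = self._queue.popleft()
        return self._last_sent


frames = [(float(k),) for k in range(4)]

delayed = DegradedVision(delay_frames=2)
delayed_out = [delayed.step(f) for f in frames]

slow = DegradedVision(keep_every=2)
slow_out = [slow.step(f) for f in frames]

biased = DegradedVision(noise_mean=0.05, noise_std=0.0)
biased_out = biased.step((1.0,))

print(delayed_out)
print(slow_out)
print(biased_out)
```

In this sketch a dropped or delayed frame means the policy keeps receiving the last released observation, which is one simple way to model a stale input.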
Reducing FPS and increasing latency both show threshold behavior: performance is stable until a point at which the robot can no longer react to the ball in time. Additional noise causes graceful degradation in performance, which is made worse by non-zero-mean distributions (common in vision triangulation). The interpolation of observations described in Section II-E likely serves as a buffer against low levels of zero-mean noise.
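A small numerical sketch shows why smoothing and interpolation help against zero-mean noise but not against biased noise. The text does not specify the filter used in the system, so a causal moving average stands in for it here. The ball is held at a constant position to isolate the effect of noise, and the 100 Hz policy rate, the noise magnitudes, and all function names are assumptions.

```python
import random
from bisect import bisect_right


def interpolate(times, values, t):
    """Linearly interpolate samples (times, values) at query time t."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)
    return (1.0 - w) * values[i - 1] + w * values[i]


def moving_average(values, window):
    """Causal moving average over the most recent `window` samples."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def mean_abs_error(estimates, truth):
    return sum(abs(e - truth) for e in estimates) / len(estimates)


rng = random.Random(0)
TRUE_X = 1.0  # constant ball position, a deliberate simplification
camera_times = [k / 125.0 for k in range(250)]  # 125 Hz vision rate

zero_mean = [TRUE_X + rng.gauss(0.0, 0.02) for _ in camera_times]
biased = [TRUE_X + rng.gauss(0.05, 0.02) for _ in camera_times]

raw_zero_err = mean_abs_error(zero_mean, TRUE_X)
smooth_zero_err = mean_abs_error(moving_average(zero_mean, 10), TRUE_X)
raw_bias_err = mean_abs_error(biased, TRUE_X)
smooth_bias_err = mean_abs_error(moving_average(biased, 10), TRUE_X)

# Resample the smoothed 125 Hz signal onto a hypothetical 100 Hz policy clock.
policy_times = [k / 100.0 for k in range(199)]
smoothed = moving_average(zero_mean, 10)
policy_inputs = [interpolate(camera_times, smoothed, t) for t in policy_times]

print(f"zero-mean noise: raw {raw_zero_err:.4f}, smoothed {smooth_zero_err:.4f}")
print(f"biased noise:    raw {raw_bias_err:.4f}, smoothed {smooth_bias_err:.4f}")
```

Averaging cancels noise that is centered on the true value, while a constant offset passes through the filter unchanged. This matches the steeper degradation reported for non-zero-mean noise. On a moving ball a causal average would also add lag, which this constant-position example leaves out.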
Simulated Parameters
In these experiments we assess the sensitivity of zero-shot real world policy performance to select simulated environment parameters. We highlight some key findings below.
Modeling latency is crucial for good performance: The figure on the left shows that policies are sensitive to latency. The baseline model (i.e. the model that uses latency values as measured on hardware) had significantly higher zero-shot transfer performance than models trained with any of the other latency values tested.
Anchoring ball distributions to the real world matters, but precision is not essential: The figure on the right indicates that policies are robust to variations in ball distributions provided the real world distribution (thrower) is contained within the training distribution. For example, the medium and wide distributions were derived from the baseline distribution but are 25% and 100% larger respectively (see Appendix H). In contrast, very small training distributions (tiny) or distributions which are disjoint from the baseline distribution in one or more components (velocity offset) result in performance degradation.
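The containment idea can be made concrete with a small sketch. The parameterization below is hypothetical: each distribution is treated as a set of uniform ranges over initial ball velocity components, "larger" is interpreted as scaling each range's width about its center, and the numeric ranges and the scale factor for the tiny distribution are invented for illustration (the actual definitions are in Appendix H).

```python
def widen(ranges, factor):
    """Scale each (low, high) range about its center by `factor`."""
    out = {}
    for name, (low, high) in ranges.items():
        center = (low + high) / 2.0
        half_width = (high - low) / 2.0 * factor
        out[name] = (center - half_width, center + half_width)
    return out


def shift(ranges, name, offset):
    """Move one component of the distribution by a constant offset."""
    out = dict(ranges)
    low, high = ranges[name]
    out[name] = (low + offset, high + offset)
    return out


def contains(train, real):
    """True if every real-world range lies inside the training range."""
    return all(
        train[k][0] <= real[k][0] and real[k][1] <= train[k][1]
        for k in real
    )


# Hypothetical thrower distribution: velocity ranges in m/s.
baseline = {"vx": (-6.0, -4.0), "vz": (0.0, 2.0)}

candidates = {
    "baseline": baseline,
    "medium": widen(baseline, 1.25),          # 25% larger
    "wide": widen(baseline, 2.0),             # 100% larger
    "tiny": widen(baseline, 0.25),            # hypothetical shrink factor
    "velocity offset": shift(baseline, "vx", 3.0),
}

coverage = {name: contains(dist, baseline) for name, dist in candidates.items()}
for name, covered in coverage.items():
    print(f"{name}: covers real distribution = {covered}")
```

In this toy setup the medium and wide distributions still cover the thrower's range, while the tiny and offset distributions do not, mirroring the pattern of which training distributions transferred well.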
Debugging Interface
Due to the high speed of the robot and the ball, a robust debugging interface was necessary to diagnose any issues with the system and to test out new features and optimizations.
Catching Application
The table tennis system was also applied to an agile catching task.