VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play
Anonymous Authors
Robot sports, characterized by well-defined objectives, explicit rules, and dynamic interactions, present ideal scenarios for demonstrating embodied intelligence. In this paper, we present VolleyBots, a novel robot sports testbed where multiple drones cooperate and compete in the sport of volleyball under physical dynamics. VolleyBots integrates three features within a unified platform: competitive and cooperative gameplay, turn-based interaction structure, and agile 3D maneuvering.
Competitive and cooperative gameplay challenges each drone to coordinate with its teammates while anticipating and countering the opposing team's tactics. Turn-based interaction demands precise timing, accurate state prediction, and management of long-horizon temporal dependencies. Agile 3D maneuvering requires rapid accelerations, sharp turns, and precise 3D positioning despite the quadrotor's underactuated dynamics. These intertwined features yield a complex problem that couples motion control and strategic play, with no expert demonstrations available.
We provide a comprehensive suite of tasks ranging from single-drone drills to multi-drone cooperative and competitive tasks, accompanied by baseline evaluations of representative multi-agent reinforcement learning (MARL) and game-theoretic algorithms. Simulation results show that on-policy reinforcement learning (RL) methods outperform off-policy methods in single-agent tasks, but both approaches struggle in complex tasks that combine motion control and strategic play. We additionally design a hierarchical policy that achieves a 69.5% win rate against the strongest baseline in the 3 vs 3 task, underscoring its potential as an effective solution for tackling the complex interplay between low-level control and high-level strategy.
Using our framework, we achieve 100+ consecutive bumps in the real-world Solo Bump task (120 and 154 bumps in two trials)!
We use the Solo Bump task to demonstrate the policy's ability to zero-shot transfer to the real world. At the beginning of each experiment, a remote-controlled claw releases the ball directly above the drone. To simulate real-world noise and imperfect action execution, small randomizations are introduced in the ball's initial position and coefficient of restitution.
The policy outputs collective thrust and body rates (CTBR) commands and is deployed on the onboard NVIDIA Jetson Orin NX processor. Experimental results show that the drone successfully performs multiple consecutive bumps, providing initial evidence of sim-to-real transfer capability. More experiments are coming soon.
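As a reference for how such per-episode randomization can be implemented, here is a minimal Python (NumPy) sketch. The function name, drop height, noise range, and restitution range are illustrative placeholders, not the testbed's actual parameters.

```python
import numpy as np

def sample_ball_reset(rng: np.random.Generator,
                      drone_pos: np.ndarray,
                      drop_height: float = 1.0,
                      pos_noise: float = 0.05,
                      restitution_range: tuple = (0.75, 0.85)):
    """Sample a randomized ball drop for the Solo Bump task (illustrative values only).

    The ball starts roughly `drop_height` metres above the drone with small positional
    noise, and the contact restitution coefficient is re-sampled each episode.
    """
    offset = rng.uniform(-pos_noise, pos_noise, size=3)
    ball_pos = drone_pos + np.array([0.0, 0.0, drop_height]) + offset
    restitution = rng.uniform(*restitution_range)
    return ball_pos, restitution

# Example: draw a fresh sample at every episode reset.
rng = np.random.default_rng(0)
ball_pos, restitution = sample_ball_reset(rng, drone_pos=np.zeros(3))
```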
We use a drone with a rigidly mounted badminton racket and a ball wrapped in retro-reflective tape, as shown in Fig. 1. The racket has a radius of 11 cm, and its center lies 8 cm from the drone's center of gravity (COG) along the normal vector of the racket plane.
The detailed configuration of the drone is given in Table 1. The states of both the drone and the ball are captured by a motion capture system.
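For illustration, the snippet below computes the racket center in the world frame and a rough ball-racket contact check from the drone pose. It assumes the racket normal coincides with the drone's body z-axis and that the motion capture system provides position plus an (x, y, z, w) quaternion; these assumptions and the contact tolerance are ours, not stated above.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

RACKET_OFFSET_B = np.array([0.0, 0.0, 0.08])  # 8 cm from the COG along the racket normal (body frame, assumed body z)
RACKET_RADIUS = 0.11                          # 11 cm racket radius

def racket_center_world(drone_pos_w: np.ndarray, drone_quat_xyzw: np.ndarray) -> np.ndarray:
    """Rotate the body-frame racket offset into the world frame and add the drone position."""
    return drone_pos_w + R.from_quat(drone_quat_xyzw).apply(RACKET_OFFSET_B)

def ball_on_racket(ball_pos_w, drone_pos_w, drone_quat_xyzw, tol: float = 0.02) -> bool:
    """Rough contact check: ball close to the racket plane and within the racket radius."""
    rot = R.from_quat(drone_quat_xyzw)
    center = drone_pos_w + rot.apply(RACKET_OFFSET_B)
    normal = rot.apply(np.array([0.0, 0.0, 1.0]))
    rel = ball_pos_w - center
    dist_along_normal = rel @ normal
    radial = np.linalg.norm(rel - dist_along_normal * normal)
    return abs(dist_along_normal) < tol and radial < RACKET_RADIUS
```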
Fig. 1 The drone and ball in the VolleyBots testbed.
Table 1 Configuration and parameters of the drone and ball in the testbed.
Fig. 2 Overview of the VolleyBots Testbed. VolleyBots comprises three key components: (1) Environment, supported by Isaac Sim and PyTorch, which defines entities, observations, actions, and reward functions; (2) Tasks, including 3 single-agent tasks, 3 multi-agent cooperative tasks, and 2 multi-agent competitive tasks; (3) Algorithms, encompassing RL, MARL, and game-theoretic algorithms.
The overview of the VolleyBots testbed is shown in Fig. 2, while the main contributions of this work are summarized as follows:
We introduce VolleyBots, a novel robot sports environment centered on drone volleyball, featuring mixed competitive and cooperative game dynamics, turn-based interactions, and agile 3D maneuvering while demanding both motion control and strategic play.
We release a curriculum of tasks, ranging from single-drone drills to multi-drone cooperative plays and competitive matchups, and baseline evaluations of representative MARL and game-theoretic algorithms, facilitating reproducible research and comparative assessments.
We design a hierarchical policy that achieves a 69.5% win rate against the strongest baseline in the 3 vs 3 task, offering a promising solution for tackling the complex interplay between low-level control and high-level strategy.
Inspired by the way humans progressively learn to play volleyball, we introduce a series of tasks that systematically assess both low-level motion control and high-level strategic play, as shown in Fig. 3.
Fig. 3 Proposed tasks in the VolleyBots testbed, inspired by the process of human learning in volleyball. Single-agent tasks evaluate low-level control, while multi-agent cooperative and competitive tasks integrate high-level decision-making with low-level control.
Single-Agent Tasks:
Back and Forth: The drone sprints between two designated points to complete as many round trips as possible within the time limit.
Hit the Ball: The ball is initialized directly above the drone, and the drone hits the ball once to make it land as far as possible.
Solo Bump: The ball is initialized directly above the drone, and the drone bumps the ball in place to a specific height as many times as possible within the time limit.
Multi-Agent Cooperation:
Bump and Pass: Two drones work together to bump and pass the ball to each other back and forth as many times as possible within the time limit.
Set and Spike (Easy): Two drones take on the role of a setter and an attacker. The setter passes the ball to the attacker, and the attacker then spikes the ball downward to the target region on the opposing side.
Set and Spike (Hard): Similar to the Set and Spike (Easy) task, two drones act as a setter and an attacker to set and spike the ball to the opposing side. The difference is that a rule-based defense board on the opposing side attempts to intercept the attacker's spike.
Multi-Agent Competition:
1 vs 1: One drone on each side competes against the other in a volleyball match and wins by landing the ball in the opponent's court. When the ball is on its own side, a drone is allowed only one hit to return it to the opponent's court.
3 vs 3: Three drones on each side form a team and compete against the other team in a volleyball match. Teammates cooperate to serve, pass, spike, and defend under the standard rule of three hits per side (a minimal sketch of this rule check follows the task list).
6 vs 6: Six drones per side form teams on a full-size court under the standard three-hits-per-side rule of real-world volleyball.
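To make the competitive rules concrete, below is a minimal sketch of the three-hits-per-side bookkeeping used in the 3 vs 3 and 6 vs 6 tasks. The data structure and function names are hypothetical and not part of the testbed's API.

```python
from dataclasses import dataclass

@dataclass
class RallyState:
    """Minimal bookkeeping for the three-hits-per-side rule (illustrative, not the testbed API)."""
    hits_this_side: int = 0   # touches since the ball last crossed the net

def on_hit(state: RallyState) -> bool:
    """Register a touch on the current side; return True if that side exceeded three hits."""
    state.hits_this_side += 1
    return state.hits_this_side > 3   # a fourth consecutive touch is a fault

def on_net_crossing(state: RallyState) -> None:
    """Reset the touch counter once the ball crosses to the other side."""
    state.hits_this_side = 0
```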
Single-Agent Tasks:
We evaluate two RL algorithms, Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO), on the three single-agent tasks. We also compare their performance under two action spaces: collective thrust and body rates (CTBR) and per-rotor thrust (PRT). Results averaged over three seeds are shown in Table 2.
Table 2 Benchmark results of single-agent tasks under different action spaces, Collective Thrust and Body Rates (CTBR) and Per-Rotor Thrust (PRT). Back and Forth is evaluated by the number of round trips, Hit the Ball by the hitting distance, and Solo Bump by the number of bumps that reach the target height.
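The two action spaces compared in Table 2 differ in abstraction level: PRT commands each rotor's thrust directly, while CTBR specifies a collective thrust and desired body rates that a low-level controller must track. The sketch below shows one standard way a CTBR command can be converted to per-rotor thrusts (a proportional body-rate loop followed by a control-allocation matrix for an X-configuration quadrotor). The gains, inertia, and geometry values are placeholders, not the testbed's actual controller.

```python
import numpy as np

# Geometry and constants for an X-configuration quadrotor (illustrative values only).
ARM = 0.06                                   # arm length projected onto body x/y [m]
KM = 0.005                                   # rotor drag-torque to thrust ratio [m]
J = np.diag([1.4e-5, 1.4e-5, 2.2e-5])        # body inertia [kg m^2] (placeholder)
KP_RATE = np.array([20.0, 20.0, 10.0])       # body-rate P gains (placeholder)

# Allocation matrix: [T, tau_x, tau_y, tau_z]^T = ALLOC @ [f1, f2, f3, f4]^T
# Rotor order: front-left, front-right, rear-right, rear-left; diagonal pairs spin together.
ALLOC = np.array([
    [ 1.0,   1.0,   1.0,   1.0],
    [ ARM,  -ARM,  -ARM,   ARM],   # roll torque
    [-ARM,  -ARM,   ARM,   ARM],   # pitch torque
    [-KM,    KM,   -KM,    KM ],   # yaw torque from rotor drag
])
ALLOC_INV = np.linalg.inv(ALLOC)

def ctbr_to_prt(thrust_cmd: float, rate_cmd: np.ndarray, rate_meas: np.ndarray) -> np.ndarray:
    """Track a CTBR command with a P body-rate loop, then allocate per-rotor thrusts."""
    torque = J @ (KP_RATE * (rate_cmd - rate_meas))   # desired body torques
    wrench = np.concatenate(([thrust_cmd], torque))
    rotor_thrusts = ALLOC_INV @ wrench
    return np.clip(rotor_thrusts, 0.0, None)          # rotors can only push
```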
Multi-Agent Cooperation:
We evaluate four MARL algorithms, Multi-Agent DDPG (MADDPG), Multi-Agent PPO (MAPPO), Heterogeneous-Agent PPO (HAPPO), and Multi-Agent Transformer (MAT), on the three multi-agent cooperative tasks. We also compare their performance with and without reward shaping. Results averaged over three seeds are shown in Table 3.
Table 3 Benchmark results of multi-agent cooperative tasks with and without reward shaping. Bump and Pass is evaluated by the number of bumps; Set and Spike (Easy) and Set and Spike (Hard) are evaluated by the success rate.
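As an illustration of what reward shaping means in this comparison, the sketch below contrasts a sparse task reward with a shaped variant that adds dense positioning and contact hints. The specific terms and coefficients are hypothetical, not the shaping actually used for Table 3.

```python
import numpy as np

def sparse_reward(success: bool) -> float:
    """Task reward only: 1 when the cooperative objective is met (e.g., a valid bump or pass), else 0."""
    return 1.0 if success else 0.0

def shaped_reward(success: bool,
                  drone_pos: np.ndarray,
                  ball_pos: np.ndarray,
                  ball_vel_z: float,
                  w_dist: float = 0.1,
                  w_upward: float = 0.05) -> float:
    """Sparse reward plus dense hints (terms and coefficients are illustrative only).

    - a small penalty on horizontal drone-ball distance encourages positioning under the ball;
    - a small bonus on upward ball velocity encourages clean upward contacts.
    """
    dist_xy = np.linalg.norm((ball_pos - drone_pos)[:2])
    return sparse_reward(success) - w_dist * dist_xy + w_upward * max(ball_vel_z, 0.0)
```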
Multi-Agent Competition:
We evaluate four game-theoretic algorithms, Self-Play (SP), Fictitious Self-Play (FSP), and Policy-Space Response Oracles (PSRO) with a uniform meta-solver (PSRO_Uniform) and with a Nash meta-solver (PSRO_Nash), on the multi-agent competitive tasks. Performance is measured by approximate exploitability, average win rate against the other learned policies, and Elo rating. Results averaged over three seeds are shown in Table 4.
Table 4 Benchmark results of the multi-agent competitive tasks (1 vs 1 and 3 vs 3) under different evaluation metrics.
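For reference, the Elo rating metric follows the standard pairwise update rule, sketched below; the K-factor and initial ratings used for Table 4 are not specified here, so the values in the example are illustrative.

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """One Elo update after a match between policies A and B.

    `score_a` is 1.0 for an A win, 0.0 for a loss, and 0.5 for a draw.
    """
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# Example: two policies starting at 1200; A wins one game and gains 16 points.
print(elo_update(1200.0, 1200.0, score_a=1.0))
```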
In the 3 vs 3 task, policies learned from scratch exhibit only minimal progress, such as learning to serve the ball, but fail to produce other strategic behaviors. We therefore investigate hierarchical policies as a promising alternative.
We first employ PPO to train a set of low-level skill policies, including Hover, Serve, Pass, Set, and Attack. We then design a rule-based high-level strategy that assigns a low-level skill to each drone. Evaluated over 1,000 episodes against the SP policy, the hierarchical policy achieves a significantly higher win rate of 86%.
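For concreteness, a toy version of such a rule-based skill assignment is sketched below. The state features, skill names, and rules mirror the description above but are deliberately simplified; the thresholds and logic are illustrative, not the strategy actually used.

```python
from enum import Enum, auto

class Skill(Enum):
    HOVER = auto()
    SERVE = auto()
    PASS = auto()
    SET = auto()
    ATTACK = auto()

def assign_skills(serving: bool, ball_on_our_side: bool, touches: int,
                  closest_to_ball: int, num_drones: int = 3) -> list:
    """Toy rule-based high-level policy mapping game state to per-drone skills.

    `touches` counts our side's contacts in the current possession;
    `closest_to_ball` is the index of the teammate nearest the predicted ball landing point.
    """
    skills = [Skill.HOVER] * num_drones
    if serving:
        skills[closest_to_ball] = Skill.SERVE
    elif ball_on_our_side:
        # First touch: pass; second touch: set; third touch: attack.
        order = [Skill.PASS, Skill.SET, Skill.ATTACK]
        skills[closest_to_ball] = order[min(touches, 2)]
    return skills
```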
Demonstrations: Serve Scenario and Rally Scenario.