Real-Time Algorithms for Game-Theoretic Motion Planning and Control in Autonomous Racing using Near-Potential Function
Dvij Kalaria* Chinmay Maheshwari* Shankar Sastry
* Equal contribution
UC Berkeley
Learning for Dynamics and Control (L4DC) 2025
Abstract
Autonomous racing extends beyond the challenge of controlling a racecar at its physical limits. Professional racers employ strategic maneuvers to outwit competing opponents and secure victory. While modern control algorithms can achieve human-level performance by computing offline racing lines for single-car scenarios, research on real-time algorithms for multi-car autonomous racing is limited. To bridge this gap, we develop a game-theoretic modeling framework that incorporates competitive aspects of autonomous racing, such as overtaking and blocking, through a novel policy parametrization, while operating the car at its limits. Furthermore, we propose an algorithmic approach to compute the (approximate) Nash equilibrium strategy, which represents the optimal strategy in the presence of competing agents. Specifically, we introduce an algorithm inspired by the recently introduced framework of dynamic near-potential functions, enabling real-time computation of the Nash equilibrium. Our approach comprises two phases: offline and online. During the offline phase, we use simulated racing data to learn a near-potential function that approximates utility changes for the agents. This function enables online computation of approximate Nash equilibria by maximizing its value. We evaluate our method in head-to-head 3-car racing scenarios, demonstrating superior performance compared to several existing baselines.
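As an illustration of the online phase, the following is a minimal sketch (in Python) of computing an approximate Nash equilibrium by maximizing a learned near-potential function over the agents' policy parameters. The function name phi, the candidate-parameter grid, and the exhaustive search are assumptions for illustration only; the actual model and search procedure used in the paper may differ.

# Minimal sketch of the online phase: given a learned near-potential function
# phi(joint_state, joint_policy_params) (assumed interface), approximate a
# Nash equilibrium at the current joint state by maximizing phi over the
# joint policy parameters of all agents.
import itertools
import numpy as np

def approximate_nash(phi, joint_state, candidate_params, n_agents):
    """Search candidate policy parameters for each agent and return the joint
    assignment that maximizes the learned near-potential function."""
    best_value, best_joint = -np.inf, None
    for joint in itertools.product(candidate_params, repeat=n_agents):
        value = phi(joint_state, joint)
        if value > best_value:
            best_value, best_joint = value, joint
    return best_joint, best_value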
Multi-car racing results (3 agents)
Race track
Table: Outcome of 99 races in which starting positions are selected randomly from regions 1, 2, and 3. The ego agent uses our approach, and the opponent agents O1 and O2 use the baseline approach indicated in each row. The starting order of the agents is Ego > O1 > O2, O1 > Ego > O2, and O1 > O2 > Ego for 33 races each.
Note: The numbers differ slightly from those reported in the submitted paper because the policy parameterization was improved after submission. The results will be updated accordingly in the final version. This does not affect the conclusions in any way.
Some videos of races
Note: The videos are recorded in Foxglove Studio. The transforms are published through ROS 2, based on which Foxglove renders a separate car object. Because the 3D object can take time to load, the car positions may not be perfectly synced, so positions in the Foxglove videos may occasionally be inconsistent. Please also refer to the Unity third-person videos provided for each race, which are synced and accurate.
Note: The Unity game engine is used only for rendering; Unity's physics engine is not used. The physics model followed by the vehicles is the same as described by Eqn. (5) in the supplementary material. Sound effects and skid marks are added based on the throttle commands and the tire slip angles computed as per Appendix D.
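For reference, below is a minimal sketch of a tire slip-angle computation of the kind used for these rendering effects, assuming a standard dynamic bicycle model with body-frame velocities vx, vy, yaw rate omega, steering angle delta, and CoG-to-axle distances lf, lr. Sign conventions and the exact formulas in Appendix D may differ.

# Illustrative slip-angle computation (standard bicycle-model form; assumed,
# not taken verbatim from Appendix D).
import math

def tire_slip_angles(vx, vy, omega, delta, lf, lr):
    vx_safe = max(vx, 1e-3)  # guard against near-zero longitudinal speed
    alpha_f = math.atan2(vy + lf * omega, vx_safe) - delta  # front tire slip angle
    alpha_r = math.atan2(vy - lr * omega, vx_safe)          # rear tire slip angle
    return alpha_f, alpha_r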
Note: All Foxglove videos are played at 2X speed, while all Unity videos are at 1X (real-time).
Ours vs MPC (with random params) vs MPC (with random params)
Note: As can be seen in this race, opponents that fix randomly chosen policy parameters at the start are no match for our approach, which continuously optimizes for the near-Nash equilibrium parameters. We win nearly all races against opponent agents that use fixed random policy parameters throughout the race.
Ours vs MPC (with random params) vs MPC (with random params) rendered in Unity
Ours vs Ours with high discount factor vs Ours with high discount factor
Note: As can be seen in this race, with a very high discount factor the opponent agents optimize for long-term gains, which prevents them from taking the risks needed to win the race. The ego agent with a lower discount factor takes a clear advantage here, as it is willing to take risks. In addition, the learned potential function for a higher discount factor may be less accurate due to the greater complexity associated with a longer horizon.
Ours vs Ours with high discount factor vs Ours with high discount factor rendered in Unity
Ours vs Ours with low discount factor vs Ours with low discount factor
Note: As can be seen in this race, with a very low discount factor the opponent agents greedily pursue short-term gains and tend to be myopic. As a result, they take higher risks to gain progress on the straights but later suffer at the turns. The ego agent with a higher discount factor takes a clear advantage here by playing safe and planning with longer foresight.
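Schematically, each agent optimizes a discounted sum of stage rewards, so the discount factor sets the effective planning horizon; the exact utility definition is given in the paper, and the symbols below (stage reward r_i, joint state x_t, joint action a_t) are used only for illustration.

\[
    V_i = \sum_{t=0}^{\infty} \gamma^{t}\, r_i(x_t, a_t),
    \qquad
    \sum_{t=0}^{\infty} \gamma^{t} = \frac{1}{1-\gamma},
\]

so a discount factor closer to 1 weights future rewards more heavily (effective horizon roughly 1/(1-\gamma)), while a small discount factor makes the objective nearly greedy.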
Ours vs Ours with low discount factor vs Ours with low discount factor rendered in Unity
Ours vs Ours with low training data vs Ours with low training data
Note: As can be seen in this race, with less training data the opponents have a less accurate model of the potential function. As a result, they make less rational policy-parameter choices than the ego agent trained with more data. The ego agent with more training takes a clear advantage here by using a more accurate potential function model.
Ours vs Ours with low training data vs Ours with low training data rendered in Unity
Ours vs Self-play RL vs Self-play RL
Note: As can be seen in this race, self-play RL fails to learn a near-optimal policy and is easily beaten by the ego agent using our approach. However, it is important to note that we do not claim superiority over self-play RL here; rather, we propose a complementary approach that can be used to improve the efficacy and convergence of self-play RL.
Ours vs Self-play RL vs Self-play RL rendered in Unity
Ours vs IBR vs IBR
Note: As can be seen in this race, opponents using Iterated Best Response (IBR) tend to be very myopic, as they search for a Nash equilibrium over a limited horizon (which is necessary for real-time deployment). As a result, the agents pursue short-term gains by speeding on the straights but later suffer at the turns. However, it is important to note that our IBR implementation may not be the strongest possible one, although the overall idea and design follow the representative works.
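For intuition, the following is a minimal sketch (in Python) of an IBR loop: each agent repeatedly best-responds to the others' current short-horizon plans until the plans stop changing. The best_response interface and the fixed number of rounds are assumptions for illustration; our baseline implementation and the representative works may differ in details.

# Minimal sketch of Iterated Best Response over a short horizon.
def iterated_best_response(best_response, initial_plans, n_rounds=5):
    """best_response(i, plans) returns agent i's optimal short-horizon plan
    given the other agents' current plans."""
    plans = list(initial_plans)
    for _ in range(n_rounds):
        updated = False
        for i in range(len(plans)):
            new_plan = best_response(i, plans)
            if new_plan != plans[i]:
                plans[i], updated = new_plan, True
        if not updated:  # fixed point: plans are mutual best responses
            break
    return plans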
Ours vs IBR vs IBR rendered in Unity
Figure: Potential values and the trajectories at a given joint state when varying (a) q, (b) α, (c) s1, (d) s2, and (e) s3 of only the ego agent. We show only 2 players here (only 1 player for (a) and (b)); the 3rd player is far enough away from this position that it does not affect either player. Additionally, for ease of readability, we show the variation in the other player's trajectory in response to the ego agent only in (e), as such deviations are not significant in (c) and (d).
Head-to-Head racing results (2 agents)
Table: Outcome of 100 races in which starting positions are selected randomly from regions 1 and 2. The ego agent uses our approach, and the opponent agent uses the baseline approach indicated in each row. The starting order of the agents is Ego > Opp and Opp > Ego for 50 races each.