Real-Time Algorithms for Game-Theoretic Motion Planning and Control in Autonomous Racing using Near-Potential Function
Dvij Kalaria* Chinmay Maheshwari* Shankar Sastry
* Equal contribution
UC Berkeley
Learning for Dynamics and Control (L4DC) 2025
Abstract
Autonomous racing extends beyond the challenge of controlling a racecar at its physical limits. Professional racers employ strategic maneuvers to outwit competing opponents and secure victory. While modern control algorithms can achieve human-level performance by computing offline racing lines for single-car scenarios, research on real-time algorithms for multi-car autonomous racing is limited. To bridge this gap, we develop a game-theoretic modeling framework that incorporates competitive aspects of autonomous racing, such as overtaking and blocking, through a novel policy parametrization, while operating the car at its limits. Furthermore, we propose an algorithmic approach to compute the (approximate) Nash equilibrium strategy, which represents the optimal strategy in the presence of competing agents. Specifically, we introduce an algorithm inspired by the recently introduced framework of dynamic near-potential functions, enabling real-time computation of the Nash equilibrium. Our approach comprises two phases: offline and online. During the offline phase, we use simulated racing data to learn a near-potential function that approximates utility changes for the agents. This function enables online computation of approximate Nash equilibria by maximizing its value. We evaluate our method in head-to-head 3-car racing scenarios, demonstrating superior performance compared to several existing baselines.
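As an illustration of the online phase, the following is a minimal sketch (in Python) of computing an approximate Nash equilibrium by maximizing a learned near-potential function over the agents' policy parameters. The function name phi, the candidate-parameter grid, and the exhaustive search are assumptions for illustration only; the actual model and search procedure used in the paper may differ.

# Minimal sketch of the online phase: given a learned near-potential function
# phi(joint_state, joint_policy_params) (assumed interface), approximate a
# Nash equilibrium at the current joint state by maximizing phi over the
# joint policy parameters of all agents.
import itertools
import numpy as np

def approximate_nash(phi, joint_state, candidate_params, n_agents):
    """Search candidate policy parameters for each agent and return the joint
    assignment that maximizes the learned near-potential function."""
    best_value, best_joint = -np.inf, None
    for joint in itertools.product(candidate_params, repeat=n_agents):
        value = phi(joint_state, joint)
        if value > best_value:
            best_value, best_joint = value, joint
    return best_joint, best_value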
Multi-car racing results (3 agents)
Race track
Table: Outcome of 99 races in which starting positions are selected randomly from regions 1, 2, and 3. The ego agent uses our approach, and the opponent agents O1 and O2 use the baseline approach indicated in each row. The starting order of the agents is Ego > O1 > O2, O1 > Ego > O2, and O1 > O2 > Ego for 33 races each.
Note: The numbers differ slightly from those reported in the submitted paper because the policy parameterization was improved after submission. The results will be updated accordingly in the final version. This does not affect the conclusions in any way.
Some videos of races
Note: The videos are recorded in Foxglove Studio. The transforms are published through ROS 2, based on which Foxglove renders a separate car object. Because the 3D object can take time to load, the car positions may not be perfectly synced, so positions in the Foxglove videos may occasionally be inconsistent. Please also refer to the Unity third-person videos provided for each race, which are synced and accurate.
Note: The Unity game engine is used only for rendering; Unity's physics engine is not used. The physics model followed by the vehicles is the same as described by Eqn. (5) in the supplementary material. Sound effects and skid marks are added based on the throttle commands and the tire slip angles computed as per Appendix D.
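For reference, below is a minimal sketch of a tire slip-angle computation of the kind used for these rendering effects, assuming a standard dynamic bicycle model with body-frame velocities vx, vy, yaw rate omega, steering angle delta, and CoG-to-axle distances lf, lr. Sign conventions and the exact formulas in Appendix D may differ.

# Illustrative slip-angle computation (standard bicycle-model form; assumed,
# not taken verbatim from Appendix D).
import math

def tire_slip_angles(vx, vy, omega, delta, lf, lr):
    vx_safe = max(vx, 1e-3)  # guard against near-zero longitudinal speed
    alpha_f = math.atan2(vy + lf * omega, vx_safe) - delta  # front tire slip angle
    alpha_r = math.atan2(vy - lr * omega, vx_safe)          # rear tire slip angle
    return alpha_f, alpha_r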
Note: All Foxglove videos are played at 2X speed, while all Unity videos are at 1X (real-time).
Ours vs MPC (with random params) vs MPC (with random params)
Note: As can be seen in this race, opponents that fix randomly chosen policy parameters at the start are no match for our approach, which continuously optimizes for the near-Nash equilibrium parameters. We win nearly all races against opponent agents that use fixed random policy parameters throughout the race.
Ours vs MPC (with random params) vs MPC (with random params) rendered in Unity
Ours vs Ours with high discount factor vs Ours with high discount factor
Note: As can be seen in this race, with a very high discount factor the opponent agents optimize for long-term gains, which prevents them from taking the risks needed to win the race. The ego agent with a lower discount factor takes a clear advantage here, as it is willing to take risks. In addition, the learned potential function for a higher discount factor may be less accurate due to the greater complexity associated with a longer horizon.
Ours vs Ours with high discount factor vs Ours with high discount factor rendered in Unity
Ours vs Ours with low discount factor vs Ours with low discount factor
Note: As can be seen in this race, with a very low discount factor the opponent agents greedily pursue short-term gains and tend to be myopic. As a result, they take higher risks to gain progress on the straights but later suffer at the turns. The ego agent with a higher discount factor takes a clear advantage here by playing safe and planning with longer foresight.
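Schematically, each agent optimizes a discounted sum of stage rewards, so the discount factor sets the effective planning horizon; the exact utility definition is given in the paper, and the symbols below (stage reward r_i, joint state x_t, joint action a_t) are used only for illustration.

\[
    V_i = \sum_{t=0}^{\infty} \gamma^{t}\, r_i(x_t, a_t),
    \qquad
    \sum_{t=0}^{\infty} \gamma^{t} = \frac{1}{1-\gamma},
\]

so a discount factor closer to 1 weights future rewards more heavily (effective horizon roughly 1/(1-\gamma)), while a small discount factor makes the objective nearly greedy.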
Ours vs Ours with low discount factor vs Ours with low discount factor rendered in Unity
Ours vs Ours with low training data vs Ours with low training data
Note: As can be seen in this race, with less training data the opponents have a less accurate model of the potential function. As a result, they make less rational policy-parameter choices than the ego agent trained with more data. The ego agent with more training takes a clear advantage here by using a more accurate potential function model.
Ours vs Ours with low training data vs Ours with low training data rendered in Unity
Ours vs Self-play RL vs Self-play RL
Note: As can be seen in this race, self-play RL fails to learn a near-optimal policy and is easily beaten by the ego agent using our approach. However, it is important to note that we do not claim superiority over self-play RL here; rather, we propose a complementary approach that can be used to improve the efficacy and convergence of self-play RL.
Ours vs Self-play RL vs Self-play RL rendered in Unity
Ours vs IBR vs IBR
Note: As can be seen in this race, opponents using Iterated Best Response (IBR) tend to be very myopic, as they search for a Nash equilibrium over a limited horizon (which is necessary for real-time deployment). As a result, the agents pursue short-term gains by speeding on the straights but later suffer at the turns. However, it is important to note that our IBR implementation may not be the strongest possible one, although the overall idea and design follow the representative works.
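For intuition, the following is a minimal sketch (in Python) of an IBR loop: each agent repeatedly best-responds to the others' current short-horizon plans until the plans stop changing. The best_response interface and the fixed number of rounds are assumptions for illustration; our baseline implementation and the representative works may differ in details.

# Minimal sketch of Iterated Best Response over a short horizon.
def iterated_best_response(best_response, initial_plans, n_rounds=5):
    """best_response(i, plans) returns agent i's optimal short-horizon plan
    given the other agents' current plans."""
    plans = list(initial_plans)
    for _ in range(n_rounds):
        updated = False
        for i in range(len(plans)):
            new_plan = best_response(i, plans)
            if new_plan != plans[i]:
                plans[i], updated = new_plan, True
        if not updated:  # fixed point: plans are mutual best responses
            break
    return plans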
Ours vs IBR vs IBR rendered in Unity
Figure: Potential values and the trajectories at a given joint state when varying (a) q, (b) α, (c) s1, (d) s2, and (e) s3 of only the ego agent. We show only 2 players here (only 1 player for (a) and (b)); the 3rd player is far enough away from this position that it does not affect either player. Additionally, for ease of readability, we show the variation in the other player's trajectory in response to the ego agent only in (e), as such deviations are not significant in (c) and (d).
Head-to-Head racing results (2 agents)
Table: Outcome of 100 races in which starting positions are selected randomly from regions 1 and 2. The ego agent uses our approach, and the opponent agent uses the baseline approach indicated in each row. The starting order of the agents is Ego > Opp and Opp > Ego for 50 races each.