Autonomous Driving Behavior through Reinforcement Learning

by Arjun Srinivasan Ambalam, Prasheel Renkuntla and Zhiyuan Hua

Overview of the project

We learn a policy for an autonomous vehicle to keep its lane at the maximum attainable speed while avoiding collisions with surrounding vehicles. We address this problem by considering cooperative behavior among autonomous vehicles rather than purely ego-vehicle-based behavior. We build our simulation environment with a simulator based on OpenAI Gym and train the agent using Deep Q-learning.


In a multi-agent problem where multiple players interact with each other, each agent has only partial information about a partially observed map. The reinforcement learning method we need has to form long-term strategies over thousands of steps in a large action space, working solely from raw input features.

With Q-learning, we define an action-value function Q(s, a) that measures how good it is to take action a in state s. For example, a Q-value function in chess can measure how beneficial it is to move a pawn forward given the current position of the game. The value it assigns to such a move is called the action value, or Q-value.

For our project, Q measures how good it is to accelerate, decelerate, or change lanes at any given time and in any given state of the environment. As the agent observes the current state of the environment and chooses an action, the environment transitions to a new state and returns a reward that indicates the consequence of the chosen action. Through Q-learning, the agent builds a cheat sheet of Q-values for every scenario it has encountered, together with the matching actions to take. The drawback is that a Q-learning agent cannot estimate values for unseen states. If it has not seen a state before, it has no clue which action to take.
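
The "cheat sheet" idea can be sketched as a small tabular Q-learning loop. This is only an illustrative sketch, not the project's code: the action names, the string state labels, and the values of alpha (learning rate), gamma (discount factor), and epsilon (exploration rate) are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Illustrative action names (assumed, not taken from the simulator).
ACTIONS = ["accelerate", "decelerate", "change_lane"]


def make_q_table():
    # Every state not yet visited starts with a Q-value of 0.0 for each action,
    # so the table carries no information about unseen states.
    return defaultdict(lambda: [0.0] * len(ACTIONS))


def choose_action(q_table, state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = q_table[state]
    return values.index(max(values))


def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


if __name__ == "__main__":
    q = make_q_table()
    # Hypothetical transition: accelerating on a clear lane earns reward 1.0.
    q_update(q, "lane_clear", 0, reward=1.0, next_state="car_ahead")
    print(q["lane_clear"])   # only the "accelerate" entry has changed
    print(q["never_seen"])   # all zeros: nothing is known about this state
```

The lookup for "never_seen" returns all zeros, which illustrates the limitation described above: the table holds values only for states the agent has actually visited.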


Although Q-learning is very powerful, it lacks generality, as discussed above. A Deep Q-Network (DQN) tries to solve this problem by replacing the two-dimensional table of Q-values used in Q-learning with a neural network.
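
To make the table-versus-network contrast concrete, here is a minimal sketch of a Q-network forward pass in plain Python. It is not the network used in this project: the class name TinyQNetwork, the four-feature state layout, the hidden size, and the weight initialization are hypothetical, and the network is untrained, so its outputs are meaningless as driving decisions. The point is only that a function of the state features produces a Q-value per action for any input, including states never seen before.

```python
import random

# Illustrative action names (assumed, not taken from the simulator).
ACTIONS = ["accelerate", "decelerate", "change_lane"]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


class TinyQNetwork:
    """One hidden layer mapping a state feature vector to one Q-value per action."""

    def __init__(self, state_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.l1 = init_layer(state_dim, hidden_dim, rng)
        self.l2 = init_layer(hidden_dim, len(ACTIONS), rng)

    def q_values(self, state):
        hidden = relu(dense(state, *self.l1))
        return dense(hidden, *self.l2)

    def greedy_action(self, state):
        q = self.q_values(state)
        return ACTIONS[q.index(max(q))]


if __name__ == "__main__":
    net = TinyQNetwork(state_dim=4)
    # Hypothetical features: own speed, lane index, gap to front car, relative speed.
    state = [0.8, 1.0, 0.3, -0.2]
    print(net.q_values(state))
    print(net.greedy_action(state))
```

In a real DQN the weights would be trained by gradient descent on the temporal-difference error, typically with a deep learning library rather than hand-written layers.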

We investigated the use of DQN to control a simulated car on a highway via reinforcement learning. We started by implementing the approach provided by the simulation framework and then experimented with various possible alterations to build baselines and seek performance improvements on our selected task. Specifically, we implemented, reproduced, and experimented with various reward functions, gradient backpropagation rules, double Q-learning, and other hyperparameters to induce different driving behaviors, accumulate experience, and build baselines toward our final project.
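
As a rough illustration of what double Q-learning changes, the sketch below compares the standard DQN training target with the commonly used Double DQN target. Which exact variant the project used is not stated here, so this is an assumption; the function names, the gamma value, and the example Q-values are made up for illustration.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN target: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = next_q_online.index(max(next_q_online))
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Made-up Q-values for the next state, one entry per action.
    next_q_online = [1.0, 2.0, 0.5]
    next_q_target = [3.0, 1.5, 0.2]
    print(dqn_target(1.0, next_q_target, gamma=0.9))
    print(double_dqn_target(1.0, next_q_online, next_q_target, gamma=0.9))
```

With these numbers the standard target uses the largest target-network value (3.0), while the double variant uses the target-network value of the action preferred by the online network (1.5). Decoupling selection from evaluation in this way is intended to reduce overestimation of Q-values.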