Project Status Done
Project Type University / Solo
Project Duration ~ 6 months
Software Used Unity
Languages Used Python, C#
About:
This project formed an integral component of my thesis course at the University of Peloponnese, focusing on training an agent within a video game environment to address single-source shortest path challenges.
Objectives:
The primary aim was to develop a specialized reward function (RF) tailored to the task while ensuring the reinforcement learning (RL) agent was aware of the rules governing the environment. For this research, two different RFs were used: the Simple RF and the Advanced RF.
Methodology:
The project integrated a graph system with Dijkstra's algorithm to construct the required data and environment, producing a dynamic and versatile platform for training the RL agent. Training is divided into two categories (S for Simple and A for Advanced), each consisting of four phases, labeled A, B, and C, and culminating in the final phase, D. Finally, I compare the two training categories by evaluating them in the same stable environment, so that the resulting data are directly comparable.
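The single-source shortest path computation at the core of the environment can be sketched in Python with a standard priority-queue Dijkstra (the thesis environment itself is built in Unity/C#; names here are illustrative):

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest distances over a weighted adjacency dict.

    graph: {node: [(neighbor, weight), ...]}
    Returns {node: minimal distance from source} for reachable nodes.
    """
    dist = {source: 0}
    heap = [(0, source)]  # (distance, node) priority queue
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

The distances returned by such a routine serve as the ground-truth "estimated" path lengths against which the agent's route can be compared during training.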
The complete project is available at: https://github.com/ChristosKrilisDev/ml-agents-thesis-project
Mark: 10/10
Simple Training, Cumulative Reward
Simple Training, Episode Length
Simple Training, Shortest Path Success Rate
Advanced Training, Cumulative Reward
Advanced Training, Episode Length
Advanced Training, Shortest Path Success Rate
SHAPED REWARD FUNCTION - PSEUDOCODE
START
Initially:
    e ∈ [0.1, 1], the gradient (exponent) of the shaped reward
    s ∈ [1, 0], a step factor that decays as the agent executes steps
    L ∈ [0, 1], the normalized distance between agent and target
While the episode has not finished AND no terminal state has been activated:
    Update the number of steps executed by the agent.
    If the episode has NOT finished:
        Calculate the distance reward: reward = (1 - current distance / L)^e * s
        Return the reward.
    Else (terminal state):
        If the First Target is found:
            reward = 0.25
            If the Final Target is found:
                reward = 0.5
                If the vector distance is equal to the estimated (shortest path) distance:
                    reward = 1
                    Return reward
                Else:
                    Return reward = -0.25
            Else:
                Return reward = -0.5
        Else:
            Return reward = -1
End While
END
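The pseudocode above can be expressed as a single Python function. This is a minimal sketch, assuming hypothetical parameter names (`steps_factor`, `path_matches_estimate`, etc.); the thesis implementation lives in the Unity/C# project:

```python
def shaped_reward(done, steps_factor, current_distance, initial_distance,
                  first_target_found, final_target_found,
                  path_matches_estimate, e=0.5):
    """Shaped reward function sketch.

    done: whether a terminal state has been activated
    steps_factor: s, decays from 1 toward 0 as the agent executes steps
    initial_distance: L, the normalized agent-to-target distance
    e: reward-gradient exponent in [0.1, 1]
    """
    if not done:
        # Dense shaping signal: larger as the agent closes the distance
        return (1 - current_distance / initial_distance) ** e * steps_factor
    # Terminal state: sparse reward based on what was achieved
    if first_target_found:
        reward = 0.25
        if final_target_found:
            reward = 0.5
            if path_matches_estimate:
                return 1.0       # shortest path reproduced exactly
            return -0.25         # both targets found, but path suboptimal
        return -0.5              # only the first target found
    return -1.0                  # episode ended with no target found
```

Note that the intermediate values 0.25 and 0.5 mirror the pseudocode's partial assignments; the returned terminal reward is always one of {1, -0.25, -0.5, -1}.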
Results
The Advanced models exhibit an approximately 45% higher success rate, making them roughly twice as effective as models trained with the Simple RF.
Other Notable Contributions:
Implemented a state machine training controller responsible for determining the training and reward signals received by the RL agent.
Developed an algorithm to randomize the environment and graph by shuffling nodes and automatically establishing connections with neighboring nodes.
Created supportive tools for training purposes:
A visualization tool depicting the graph and the shortest route suggested by Dijkstra's algorithm.
A logging tool providing comprehensive training information, including steps, target nodes, path length, rewards, and more.
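The environment-randomization idea described above can be sketched as follows. This is a hypothetical Python illustration (the names `randomize_graph`, `radius`, etc. are assumptions, not the thesis code): nodes are scattered at random positions and edges are created automatically between nodes that fall within a connection radius.

```python
import math
import random

def randomize_graph(num_nodes, width, height, radius, seed=None):
    """Scatter nodes in a width x height area and connect nearby pairs.

    Returns (positions, edges) where edges is an adjacency dict
    {i: [(j, euclidean_weight), ...]} suitable for a Dijkstra routine.
    """
    rng = random.Random(seed)
    positions = [(rng.uniform(0, width), rng.uniform(0, height))
                 for _ in range(num_nodes)]
    edges = {i: [] for i in range(num_nodes)}
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            d = math.dist(positions[i], positions[j])
            if d <= radius:  # connect only neighboring nodes
                edges[i].append((j, d))
                edges[j].append((i, d))
    return positions, edges
```

Regenerating the graph each episode this way prevents the agent from memorizing a fixed layout and forces it to generalize the shortest-path behavior.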
Dijkstra Visualization Tool
Agent Finding Shortest Path
Multi-environment training