Project Status Done
Project Type University / Solo
Project Duration ~ 6 months
Software Used Unity
Languages Used Python, C#
About:
This project formed an integral component of my thesis course at the University of Peloponnese, focusing on training an agent within a video game environment to address single-source shortest path challenges.
Objectives:
The primary aim was to develop a specialized reward function (RF) tailored to the task while ensuring the reinforcement learning (RL) agent was aware of the rules governing the environment. For this research, two different RFs were used: the Simple RF and the Advanced RF.
Methodology:
The project integrated a graph system with Dijkstra's algorithm to construct the required data and environment, producing a dynamic and versatile platform for training the RL agent. Training is divided into two categories (S for Simple and A for Advanced), each consisting of four phases, labeled A, B, and C, and culminating in the final phase, D. Finally, I compare the two training categories by evaluating them in the same stable environment, so that the resulting data are directly comparable.
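The single-source shortest path computation at the core of the environment can be sketched in Python with a standard priority-queue Dijkstra (the thesis environment itself is built in Unity/C#; names here are illustrative):

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest distances over a weighted adjacency dict.

    graph: {node: [(neighbor, weight), ...]}
    Returns {node: minimal distance from source} for reachable nodes.
    """
    dist = {source: 0}
    heap = [(0, source)]  # (distance, node) priority queue
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

The distances returned by such a routine serve as the ground-truth "estimated" path lengths against which the agent's route can be compared during training.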
The complete project is available at: https://github.com/ChristosKrilisDev/ml-agents-thesis-project
Mark: 10/10
Simple Training, Cumulative Reward
Simple Training, Episode Length
Simple Training, Shortest Path Success Rate
Advanced Training, Cumulative Reward
Advanced Training, Episode Length
Advanced Training, Shortest Path Success Rate
SHAPED REWARD FUNCTION - PSEUDOCODE
START
Initially:
    e ∈ [0.1, 1], the gradient (exponent) of the shaped reward
    s ∈ [1, 0], a step factor that decays as the agent executes steps
    L ∈ [0, 1], the normalized distance between agent and target
While the episode has not finished AND no terminal state has been activated:
    Update the number of steps executed by the agent.
    If the episode has NOT finished:
        Calculate the distance reward: reward = (1 - current distance / L)^e * s
        Return the reward.
    Else (terminal state):
        If the First Target is found:
            reward = 0.25
            If the Final Target is found:
                reward = 0.5
                If the vector distance is equal to the estimated (shortest path) distance:
                    reward = 1
                    Return reward
                Else:
                    Return reward = -0.25
            Else:
                Return reward = -0.5
        Else:
            Return reward = -1
End While
END
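The pseudocode above can be expressed as a single Python function. This is a minimal sketch, assuming hypothetical parameter names (`steps_factor`, `path_matches_estimate`, etc.); the thesis implementation lives in the Unity/C# project:

```python
def shaped_reward(done, steps_factor, current_distance, initial_distance,
                  first_target_found, final_target_found,
                  path_matches_estimate, e=0.5):
    """Shaped reward function sketch.

    done: whether a terminal state has been activated
    steps_factor: s, decays from 1 toward 0 as the agent executes steps
    initial_distance: L, the normalized agent-to-target distance
    e: reward-gradient exponent in [0.1, 1]
    """
    if not done:
        # Dense shaping signal: larger as the agent closes the distance
        return (1 - current_distance / initial_distance) ** e * steps_factor
    # Terminal state: sparse reward based on what was achieved
    if first_target_found:
        reward = 0.25
        if final_target_found:
            reward = 0.5
            if path_matches_estimate:
                return 1.0       # shortest path reproduced exactly
            return -0.25         # both targets found, but path suboptimal
        return -0.5              # only the first target found
    return -1.0                  # episode ended with no target found
```

Note that the intermediate values 0.25 and 0.5 mirror the pseudocode's partial assignments; the returned terminal reward is always one of {1, -0.25, -0.5, -1}.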
Results
The Advanced models exhibit an approximately 45% higher success rate, making them roughly twice as effective as models trained with the Simple RF.
Other Notable Contributions:
Implemented a state machine training controller responsible for determining the training and reward signals received by the RL agent.
Developed an algorithm to randomize the environment and graph by shuffling nodes and automatically establishing connections with neighboring nodes.
Created supportive tools for training purposes:
A visualization tool depicting the graph and the shortest route suggested by Dijkstra's algorithm.
A logging tool providing comprehensive training information, including steps, target nodes, path length, rewards, and more.
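The environment-randomization idea described above can be sketched as follows. This is a hypothetical Python illustration (the names `randomize_graph`, `radius`, etc. are assumptions, not the thesis code): nodes are scattered at random positions and edges are created automatically between nodes that fall within a connection radius.

```python
import math
import random

def randomize_graph(num_nodes, width, height, radius, seed=None):
    """Scatter nodes in a width x height area and connect nearby pairs.

    Returns (positions, edges) where edges is an adjacency dict
    {i: [(j, euclidean_weight), ...]} suitable for a Dijkstra routine.
    """
    rng = random.Random(seed)
    positions = [(rng.uniform(0, width), rng.uniform(0, height))
                 for _ in range(num_nodes)]
    edges = {i: [] for i in range(num_nodes)}
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            d = math.dist(positions[i], positions[j])
            if d <= radius:  # connect only neighboring nodes
                edges[i].append((j, d))
                edges[j].append((i, d))
    return positions, edges
```

Regenerating the graph each episode this way prevents the agent from memorizing a fixed layout and forces it to generalize the shortest-path behavior.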
Dijkstra Visualization Tool
Agent Finding Shortest Path
Multi-environment training