Results
The models were simulated in MATLAB using Simulink and the Reinforcement Learning Toolbox on a 12x2 grid. The model was trained for 1000 epochs. A sample output is displayed on the right side.
Reward Values
+1 for moving to a previously unexplored cell (white).
+2 for moving closer to the destination.
+0.01 for getting closer to the aerial base station.
-2 for moving farther from the destination.
-0.01 for moving farther from the monitoring drones.
-2 for an illegal action.
-0.05 for an action that results in movement (movement cost).
-0.1 for an action that results in no motion (lazy penalty).
-0.1 multiplied by the number of previous visits for moving to a previously explored cell.
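
The reward values above can be read as a single per-step reward function. The following is a minimal Python sketch of one possible reading, not the MATLAB implementation used for the results. Several details are assumptions not stated in the source: the individual terms are summed, distances are Manhattan distances on the grid, an illegal action yields only the -2 penalty, and the distance to the monitoring drones is taken to the nearest drone. The names step_reward, manhattan, and all parameters are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two (row, col) cells (assumed metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def step_reward(prev, curr, dest, base, drones, visits, illegal=False):
    """Reward for one step from cell `prev` to cell `curr`.

    visits maps a cell to the number of earlier visits to that cell.
    drones is a non-empty list of monitoring-drone cells.
    """
    if illegal:
        return -2.0  # assumption: illegal actions receive only this penalty
    if curr == prev:
        return -0.1  # lazy penalty: the action produced no motion

    reward = -0.05  # movement cost

    # Exploration bonus, or a revisit penalty that grows with visit count.
    n = visits.get(curr, 0)
    reward += 1.0 if n == 0 else -0.1 * n

    # Progress toward (or away from) the destination.
    d_prev, d_curr = manhattan(prev, dest), manhattan(curr, dest)
    if d_curr < d_prev:
        reward += 2.0
    elif d_curr > d_prev:
        reward -= 2.0

    # Small bonus for approaching the aerial base station.
    if manhattan(curr, base) < manhattan(prev, base):
        reward += 0.01

    # Small penalty for moving away from the nearest monitoring drone.
    def nearest(cell):
        return min(manhattan(cell, d) for d in drones)

    if nearest(curr) > nearest(prev):
        reward -= 0.01

    return reward
```

Under these assumptions, a first visit to a cell that moves the agent one step closer to the destination dominates the small connectivity terms, while repeated visits to the same cell are penalized progressively.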