Formulated as an MDP, the task is to guard danger zones using cameras (O) so that whenever an intruder (▶) moves into a danger zone, at least one camera is pointing at that location. An episode ends after 1000 steps. The initial grid cells of the cameras and intruders are highlighted with the same color code. [code]
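The guarding condition can be sketched as a small check. This is only an illustration: the cell representation (row, column tuples), the function name `is_guarded`, and the idea that each camera watches exactly one cell at a time are assumptions, not part of the original domain code.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]  # hypothetical (row, column) grid coordinate

EPISODE_LENGTH = 1000  # episode length stated in the task description


def is_guarded(
    intruders: Iterable[Cell],
    camera_targets: Iterable[Cell],
    danger_zones: Set[Cell],
) -> bool:
    """Return True if every intruder standing in a danger zone is watched.

    Assumes each camera points at exactly one cell (camera_targets).
    """
    watched = set(camera_targets)
    # Only intruders inside a danger zone need to be covered by a camera.
    return all(pos in watched for pos in intruders if pos in danger_zones)
```

For example, `is_guarded([(1, 1)], [(1, 1)], {(1, 1)})` holds because the only intruder in a danger zone is watched, while moving the camera target elsewhere would violate the condition.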

RLPy is a framework for conducting sequential decision-making experiments, with a focus on value-function-based reinforcement learning using basic algorithms. The project is distributed under the 3-Clause BSD License. [code]

I modified the so the total time of the program is shown as a node in your output file. This only works when pstats loads the file produced by cProfile. [code]
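As a rough illustration of the pstats and cProfile round trip this relies on, the sketch below profiles a toy function, dumps the result to a file, reloads it with `pstats.Stats`, and reads the program's total time. The helper names `busy_work` and `profile_total_time` are hypothetical, and `total_tt` is the attribute that CPython's `pstats.Stats` uses for the total time; it is not the modified tool itself.

```python
import cProfile
import os
import pstats
import tempfile


def busy_work(n: int) -> int:
    """Toy workload so the profile has something to record."""
    return sum(i * i for i in range(n))


def profile_total_time(n: int = 100_000) -> float:
    """Profile busy_work, save cProfile output, and reload it with pstats."""
    profiler = cProfile.Profile()
    profiler.enable()
    busy_work(n)
    profiler.disable()

    # Write the raw cProfile data to a temporary file.
    fd, path = tempfile.mkstemp(suffix=".pstats")
    os.close(fd)
    try:
        profiler.dump_stats(path)
        # Load the file through pstats, as the text requires.
        stats = pstats.Stats(path)
        # total_tt accumulates the internal time of all profiled functions.
        return stats.total_tt
    finally:
        os.remove(path)
```

The returned value is the figure a visualizer could attach to a top-level "total time" node.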