Formulated as an MDP, the task is to guard danger zones using cameras (O) so that if an intruder (▶) moves to a danger zone, at least one camera is pointing at that location. The episode is finished after 1000 steps. The initial grids of cameras and intruders are highlighted with the same color code [code]
RLPy is a framework to conduct sequential decision making experiments with the focus on value-function-based reinforcement learning on basic alorithms. The project is distributed under the 3-Clause BSD License. [code]