Your agent will play PacMan in two phases. In the first phase, training, your agent will learn the values of various positions and actions. Because it takes a very long time to learn accurate Q-values even for tiny grids, the training games run in "quiet mode", with no display. Your agent will play 2000 training games.
Once the training phase is complete, your agent will enter testing mode. During testing, your PacMan's self.epsilon and self.alpha will be set to 0.0, disabling exploration and stopping learning so that PacMan can exploit its learned policy. There will be 5 test games. Test games are displayed so you can watch your PacMan play!
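To see why setting these to 0.0 has that effect, here is a minimal sketch of epsilon-greedy action selection and the standard Q-learning update (illustrative only; choose_action, q_update, and q_values are not the project's actual names):

    import random

    def choose_action(q_values, legal_actions, epsilon):
        # Epsilon-greedy: with probability epsilon take a random legal action,
        # otherwise take the action with the highest Q-value.
        # With epsilon = 0.0, the agent always exploits its learned policy.
        if random.random() < epsilon:
            return random.choice(legal_actions)
        return max(legal_actions, key=lambda a: q_values[a])

    def q_update(q, reward, max_next_q, alpha, gamma):
        # Standard Q-learning update rule.
        # With alpha = 0.0, the returned value equals the old q, so no learning occurs.
        return q + alpha * (reward + gamma * max_next_q - q)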
Without any changes to your code, you should be able to run Q-learning for PacMan using:
python3 pacman.py -p PacmanQAgent -x 2000 -n 2005 -l smallGrid
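Here, -p PacmanQAgent selects your Q-learning PacMan agent, -x 2000 is the number of training games, -n 2005 is the total number of games (the 2000 training games plus the 5 test games), and -l smallGrid is the layout.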
*******
If you have correctly implemented Q-learning, your PacMan should win nearly every time on smallGrid.
If your agent is NOT winning regularly on smallGrid, go back and debug your algorithm!
*******
Once your Q-learning implementation is working:
Different layouts
If you want to try your PacMan agent on a more difficult layout, change smallGrid to smallClassic in the run command above (see the example command below).
Training longer
If you want to change the number of training games, change the 2000 after -x in the command above. You will also need to change the number after -n to [number of training games] + 5, since -n is the total number of games (the training games plus the 5 test games).
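For example, to train for 10000 games on smallClassic, you would run:
python3 pacman.py -p PacmanQAgent -x 10000 -n 10005 -l smallClassic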
Your PacMan may not perform as well on smallClassic. We will work on improving your PacMan next...