You might be wondering why it took so long for Pacman to learn when doing q-learning! Well, it turns out that the way we represented our state made it extra difficult. To the side is an example of a state:
What you can see here is that a state represents the entire map, including the locations of the ghosts (indicated by "G"). So the number of states is very large, and it grows quickly as the map gets bigger!
There are simply too many states for the agent to visit each of them enough times while training.
To improve this, we will now implement something called approximate q-learning, which is like a mix of a perceptron and q-learning!
In approximate q-learning, we extract features from the state (i.e., given the state, as represented above, we extract things like the number of ghosts in play, the distance to the closest ghost, the distance to the nearest power pellet, etc.), and we use weights to learn which features are important when deciding which action to take. So each feature extracted from the environment has an associated weight, and your agent will learn these weights by using the Bellman Q-learning equation.
We won't actually have a Q-table any more. Instead, we have a perceptron that is learning to approximate a Q-value.
To do this, you will reuse most of your q-learning algorithm and modify just two functions. All of this will be done in the pacmanQAgents.py file in the Exercises folder.
Inside of the file pacmanQAgents.py, you'll see two classes: PacmanQAgent and ApproximateQAgent. You will only need to implement the code for getQValue() and update() in the ApproximateQAgent class.
To get the list of features from a state and action, you can call self.featExtractor.getFeatures(state, action). This will return a dictionary of all the features and the values associated with them. It will return something like this:
{'#-of-ghosts-1-step-away': 0.0, 'closest-food': 0.001428571}
The first function, getQValue(), will calculate the q-value of the given state-action pair.
IMPORTANT: Notice that self.weights is now a dictionary (not a list)! It is initialized to an empty dictionary. You will store one weight per feature, and access it from the dictionary by feature name.
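For example, here is a quick illustration (not from the starter code) of reading and writing a weight by feature name, using .get() so that a weight that has not been stored yet reads as 0:

weights = {}                              # starts out empty, like self.weights
w = weights.get('closest-food', 0.0)      # read: 0.0 until this feature has a stored weight
weights['closest-food'] = w + 0.5         # write: store the updated weight under the feature name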
The q-value will be calculated by the following formula:
Q(s, a) = f_1(s, a) * w_1 + f_2(s, a) * w_2 + ... + f_n(s, a) * w_n
Here, the sum runs over all of the extracted features:
f_i(s, a) represents the value of feature i given the state and action.
w_i represents the weight of that respective feature.
So, in the end, we are finding the sum of all the feature values multiplied by their respective weights.
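To make the calculation concrete, here is a minimal sketch of the weighted sum that getQValue() performs, using plain dictionaries in place of self.weights and the output of self.featExtractor.getFeatures(state, action) (the weight values below are made up for illustration):

def approximate_q_value(weights, features):
    # Q(s, a) = sum over all features of w_i * f_i(s, a);
    # a missing weight defaults to 0.0, matching an initially empty weight dictionary.
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

features = {'#-of-ghosts-1-step-away': 0.0, 'closest-food': 0.001428571}
weights = {'closest-food': -2.0}
print(approximate_q_value(weights, features))  # roughly -0.002857142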
The second function, update(), is where you will update the weights based on the rewards and punishments received.
This is where we use a modified Bellman equation, applied to each feature's weight:
w ← w + alpha * [ (r + gamma * max_a' Q(s', a')) - Q(s, a) ] * f(s, a)
Here's a breakdown of the terms in this equation:
w is the weight of the feature
f(s,a) is the value associated with the feature
Q(s, a) is the current Q-value for the state-action pair.
alpha is the learning rate: use self.alpha
r is the reward
gamma is the discount factor: use self.discount
max_a' Q(s', a') is the maximum Q-value for the nextState (taken over all possible actions a' from nextState).
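Here is a hedged sketch of that per-feature update, written as a standalone function: weights and features stand in for self.weights and self.featExtractor.getFeatures(state, action), alpha and discount stand in for self.alpha and self.discount, and current_q and max_next_q stand in for the current Q(s, a) and the best Q-value of nextState:

def update_weights(weights, features, reward, alpha, discount, current_q, max_next_q):
    # difference = (r + gamma * max_a' Q(s', a')) - Q(s, a)
    difference = (reward + discount * max_next_q) - current_q
    # w <- w + alpha * difference * f(s, a), applied to each feature's weight
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + alpha * difference * value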
Note that the PacmanQAgent class extends the Q-learning agent class from qlearningAgents.py, which means that in pacmanQAgents.py you can call methods that are implemented in qlearningAgents.py (for example, you could call self.computeValueFromQValues()).
Run command
With this, we should be done! You can now run the following command and see how your Pacman does:
python3 pacman.py -p ApproximateQAgent -a extractor=SimpleExtractor -x 2000 -n 2005 -l smallClassic
If you have correctly implemented the ApproximateQAgent, your Pacman should win nearly every time on smallClassic.
Different layouts
If you want to try your Pacman agent on some more difficult layouts, you can change smallClassic in the command above to any of the following layouts. Note that the more difficult the layout, the longer each game takes to finish, and therefore the longer training takes. You may also need to increase the number of training games in order to see better results, as shown in the example command after this list.
These are listed (roughly) in order of difficulty:
smallClassic
capsuleClassic
mediumClassic
trickyClassic
originalClassic
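For example, to train on mediumClassic with more training games, you could run something like the following (the -x/-n values here are just an illustration; -x is the number of training games and -n is the total number of games, so this trains for 3000 games and then shows 5):

python3 pacman.py -p ApproximateQAgent -a extractor=SimpleExtractor -x 3000 -n 3005 -l mediumClassic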