We will now use a perceptron classifier to perform imitation learning on the classic '80s video game, PacMan!
In the game of PacMan, the agent has 5 options of what to do at any point: move up, move down, move left, move right, or do not move.
We will call these 5 options North, South, West, East, and Stop.
Just as we had to manipulate the images of handwritten digits into features (a list of 0's and 1's representing pixel values, in that case), we also need a way to represent the state of the PacMan game as features for our perceptron. We can use features such as: how close the nearest ghost is; where the remaining food (dots) is located; how many power pellets are left; etc.
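For concreteness, a single game state could be summarized as a small dictionary of named feature values. The feature names below are purely illustrative (your actual feature extractor will differ):

```python
# A hypothetical feature representation of one PacMan game state.
# Feature names and values are illustrative, not the real extractor's output.
state_features = {
    "nearest_ghost_distance": 3,  # maze distance to the closest ghost
    "nearest_food_distance": 1,   # maze distance to the closest food dot
    "power_pellets_left": 2,      # number of power pellets remaining
    "bias": 1,                    # constant bias feature
}
```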
For the training data, we want our ML agent to learn from examples, so our training data comes from recording games that humans have played.
Each training record contains the state of the game at a snapshot in time - so it contains all of the information mentioned above (food, power pellets, ghosts, etc.), along with a label, which is what action the human player chose in that situation - i.e. North, South, West, East, or Stop.
The objective of your perceptron agent is, given some game state, to learn to predict the correct action to take. This is a multi-class classification problem: the input is the game state (food, pellets, ghosts, etc.) and the output is the best action to take (North, South, West, East, or Stop).
However, because of the way the training data has been recorded, we are once again going to need to change the structure of our perceptron to suit the nature of this problem & data.
Instead of having numerous perceptrons, one per class label, as we did for the previous multi-class problem, we are going to go back to a single perceptron, but each of the potential labels will have different input feature values.
We will once again not use a step function on our perceptron.
What we will do is: pass the feature values for 'North' through the perceptron and compute the output value. Then we will pass the feature values for 'South' through the perceptron and compute the output value. Then 'East', 'West', and 'Stop'. Whichever potential action gives us the highest output value is the one we will choose.
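The scheme above can be sketched as follows. The function name and dict-based representation here are illustrative only (not the actual perceptron_pacman.py API): one shared weight vector scores each candidate action's feature values, and we pick the action with the highest score.

```python
def choose_action(weights, features_per_action):
    """Score each candidate action with the single shared weight vector;
    return the action with the highest score.

    weights: dict mapping feature name -> weight
    features_per_action: dict mapping action -> {feature name: value}
    """
    best_action, best_score = None, float("-inf")
    for action, feats in features_per_action.items():
        # Dot product of the shared weights with this action's features.
        score = sum(weights.get(f, 0.0) * v for f, v in feats.items())
        if score > best_score:
            best_action, best_score = action, score
    return best_action

# Hypothetical weights: prefer closer food, prefer farther ghosts.
weights = {"food_dist": -1.0, "ghost_dist": 0.5}
feats = {
    "East": {"food_dist": 1, "ghost_dist": 4},
    "West": {"food_dist": 5, "ghost_dist": 4},
    "Stop": {"food_dist": 3, "ghost_dist": 4},
}
# choose_action(weights, feats) -> 'East' (highest score: -1 + 2 = 1)
```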
Pause here. Make sure you understand this setup. Turn your cups to red if there are any questions!
As always, if the predicted action matches the correct action, then we do not adjust any weights. But if the prediction does not match the correct action, then the predicted action gave us a higher output value than the correct action did. We therefore want to decrease the weights based on the predicted action's feature values, and increase the weights based on the correct action's feature values. Since we are using only one perceptron, with one set of weights, we adjust that single set of weights using both sets of feature values:
wi = wi + [feature values of the correct action]i
wi = wi − [feature values of the predicted action]i
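The two update equations above can be sketched in code like this (illustrative names, using the same dict-based feature representation as a plain Python sketch, not the real exercise API):

```python
def update_weights(weights, correct_feats, predicted_feats):
    """Perceptron update applied only when the prediction was wrong:
    add the correct action's feature values to the shared weights,
    and subtract the predicted action's feature values.
    """
    for f, v in correct_feats.items():
        weights[f] = weights.get(f, 0.0) + v  # w_i = w_i + correct_i
    for f, v in predicted_feats.items():
        weights[f] = weights.get(f, 0.0) - v  # w_i = w_i - predicted_i
    return weights

# Example: one weight, correct action's feature = 2, predicted's = 1.
w = update_weights({"a": 1.0}, {"a": 2}, {"a": 1})
# w["a"] is now 1.0 + 2 - 1 = 2.0
```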
You will code your pacman-playing perceptron in the file perceptron_pacman.py in the Exercises folder.
Just as before, there are two functions to implement here: classify() and train().
The training data is structured a bit differently for this task than it is for the others, so let's take a minute to understand it.
You'll notice in the classify() function, there is a call to self.convert_data(data) which returns 2 things: features and legal_moves. (Do not edit nor remove the call to this function - it is necessary for things to work correctly.)
legal_moves is a list of moves that are legal from the current game state. For example, if PacMan is in the position shown below, then moving North or South are not legal moves because of the walls. So the list of legal_moves in this game state would be [East, West, Stop].
features holds the feature values for each legal move. So, if legal_moves is [East, West, Stop], then features maps each of those moves to its list of feature values - three lists of feature values, in this case - and you can pull out each one individually by doing:
features['East']
This will get you the list of feature values for the 'East' option.
Important: Inside the train() function, you will need to call self.convert_data() on each training record individually, as you loop over them. Then you can get the feature values you need, just as above, by doing: features['East']
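Putting these pieces together, one pass of training might look roughly like the sketch below. This is a standalone illustration only: it assumes the training data has already been converted into (features-per-action, correct-action) pairs, whereas your real train() will call self.convert_data() on each record and use self.weights.

```python
def train_epoch(weights, training_data):
    """One training pass (a sketch, not the real perceptron_pacman.py code).

    weights: dict mapping feature name -> weight (the single shared vector)
    training_data: list of (features_per_action, correct_action) pairs,
    where features_per_action maps each legal move to {feature: value}.
    """
    for features_per_action, correct_action in training_data:
        # Score every legal action with the single shared weight vector.
        scores = {
            action: sum(weights.get(f, 0.0) * v for f, v in feats.items())
            for action, feats in features_per_action.items()
        }
        predicted = max(scores, key=scores.get)
        if predicted != correct_action:
            # Wrong prediction: boost the correct action's features,
            # penalize the predicted action's features.
            for f, v in features_per_action[correct_action].items():
                weights[f] = weights.get(f, 0.0) + v
            for f, v in features_per_action[predicted].items():
                weights[f] = weights.get(f, 0.0) - v
    return weights
```

For example, with all-zero starting weights and a single record whose correct action is 'West', one pass nudges the weights so that 'West' scores higher than the initially-predicted action on a re-scoring of that same record.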
In perceptron_pacman.py in the Exercises folder, implement: classify() and train().
Run your code and watch it play PacMan with the following command:
python3 pacman.py -p ClassifierAgent
P.S.
If you want to play PacMan yourself with the arrow keys, run:
python3 pacman.py