Homework 5 (with answers)

Note: the SARSA version of Q-learning compares two consecutive state/action pairs, so Q(s',a') refers to the second state and its action. If that second state is the terminal state, Q(s',a') is always 100.

R(s) is correct for this question; do not replace it with R(s') as in some other formulations of Q-learning.
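
As a rough illustration of the two notes above, one update step could be sketched in Python as follows. The function name, the dictionary-based Q table, the learning rate alpha, the discount gamma, and the state names are all assumptions made for this sketch and are not given in the question; only the use of R(s), the meaning of Q(s',a'), and the terminal value of 100 come from the notes.

```python
TERMINAL_VALUE = 100.0  # from the note: Q(s',a') is always 100 at the terminal state


def sarsa_update(q, s, a, s_next, a_next, reward_s,
                 alpha=0.5, gamma=1.0, terminal_states=frozenset()):
    """Apply one SARSA-style update to q[(s, a)] and return the new value.

    q is a dict mapping (state, action) to a value; missing entries count as 0.
    reward_s is R(s), the reward of the first state, not R(s').
    alpha and gamma are hypothetical defaults chosen for illustration.
    """
    if s_next in terminal_states:
        q_next = TERMINAL_VALUE
    else:
        q_next = q.get((s_next, a_next), 0.0)
    q_sa = q.get((s, a), 0.0)
    # Q(s,a) <- Q(s,a) + alpha * (R(s) + gamma * Q(s',a') - Q(s,a))
    q[(s, a)] = q_sa + alpha * (reward_s + gamma * q_next - q_sa)
    return q[(s, a)]


q_table = {}
first = sarsa_update(q_table, "A", "right", "GOAL", None, reward_s=0.0,
                     terminal_states={"GOAL"})
second = sarsa_update(q_table, "A", "right", "GOAL", None, reward_s=0.0,
                      terminal_states={"GOAL"})
print(first, second)  # 50.0 75.0
```

With these assumed values, each update moves Q("A", "right") halfway toward the terminal value of 100.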

Discuss this question on aiqus. When posting, use the tag 'hw5-1'.

Homework 5 2 Function Generalization.mp4

In this question, wherever "= <" appears, the "<" marks the opening of a set or list. You can read it as "= {".

Discuss this question on aiqus. When posting, use the tag 'hw5-2'.

Homework 5 3 Passive RL Agent.mp4

The actions are move North, West, East, and South. All actions are stochastic: 80% of the time the agent moves as intended, and 10% of the time it might move 90 degrees right or left. In policy c), "moving back immediately" means that on the next turn the agent takes an action that (if it goes in the intended direction) brings it back to the grey square closest to its position; if there are several such squares, it picks the one closest to the goal. If two road squares are equally close to the goal, head North.
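
The stochastic movement described above can be sketched in Python as follows. This is only an illustration under one assumption that the text does not spell out: that the slip probability is 10% to the left and 10% to the right, so the three outcomes sum to 100%. The names outcome_distribution and sample_outcome and the single-letter direction codes are hypothetical.

```python
import random

# Direction that lies 90 degrees to the left / right of each intended heading.
LEFT_OF = {"N": "W", "W": "S", "S": "E", "E": "N"}
RIGHT_OF = {"N": "E", "E": "S", "S": "W", "W": "N"}

P_INTENDED = 0.8  # from the text
P_SLIP = 0.1      # assumption: 10% for each side


def outcome_distribution(action):
    """Return a dict mapping each actual move direction to its probability."""
    return {
        action: P_INTENDED,
        LEFT_OF[action]: P_SLIP,
        RIGHT_OF[action]: P_SLIP,
    }


def sample_outcome(action, rng):
    """Draw the direction the agent actually moves, given the intended action."""
    dist = outcome_distribution(action)
    directions = list(dist)
    weights = [dist[d] for d in directions]
    return rng.choices(directions, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only so that repeated runs match
print(outcome_distribution("N"))
print([sample_outcome("N", rng) for _ in range(5)])
```

Under this assumption, an agent that intends to go North never moves South; it can only drift West or East.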

Discuss this question on aiqus. When posting, use the tag 'hw5-3'.

Answers:

Homework 5 1 Q Learning ANSWER.mp4

Homework 5 2 Function Generalization ANSWER.mp4

Homework 5 3 Passive RL Agent ANSWER.mp4
