Assignment 2: Reinforcement Learning

Reinforcement learning studies "how an agent ought to take actions in an environment so as to maximize some notion of cumulative reward" (Wikipedia).
Here is a very cool example from GRITSLAB, in which a robot tries to learn how to move without any prior knowledge:

YouTube Video

1. Value Iteration
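Update these two arrays at every step.
Q array:
$$Q(s, a) = R(s, a) + \gamma V(\delta(s, a))$$

Value array:
$$V(s) = \max_{a} Q(s, a)$$

Here $\gamma$ is the discount factor, $R(s, a)$ is the immediate reward, and $\delta(s, a)$ is the state reached by taking action $a$ in state $s$.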
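
As a minimal sketch of the two updates in this section, the code below runs value iteration on a hypothetical four-state corridor. The world itself (states 0 to 3, an absorbing goal at state 3, a reward of 1 for entering the goal), the discount factor of 0.9, and the stopping tolerance are all assumptions made for illustration and are not taken from the assignment.

```python
# Hypothetical 4-state corridor: states 0..3, state 3 is an absorbing goal.
STATES = range(4)
ACTIONS = ("left", "right")
GOAL = 3


def delta(s, a):
    """Deterministic transition function delta(s, a)."""
    if s == GOAL:
        return s  # the goal is absorbing
    return max(s - 1, 0) if a == "left" else s + 1


def reward(s, a):
    """R(s, a): 1 for the move that enters the goal, 0 otherwise."""
    return 1.0 if s != GOAL and delta(s, a) == GOAL else 0.0


def value_iteration(gamma=0.9, tol=1e-9):
    """Repeat the Q and V updates until V stops changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        # Q(s, a) = R(s, a) + gamma * V(delta(s, a))
        Q = {(s, a): reward(s, a) + gamma * V[delta(s, a)]
             for s in STATES for a in ACTIONS}
        # V(s) = max_a Q(s, a)
        new_V = {s: max(Q[s, a] for a in ACTIONS) for s in STATES}
        if max(abs(new_V[s] - V[s]) for s in STATES) < tol:
            return Q, new_V
        V = new_V


Q, V = value_iteration()
policy = {s: max(ACTIONS, key=lambda a: Q[s, a]) for s in STATES if s != GOAL}
print(V)
print(policy)
```

Note that the sketch calls delta and reward directly, so it only works when the agent already knows the model of the world.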

2. Q-learning
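$$Q_n(s, a) = (1 - \alpha_n) Q_{n-1}(s, a) + \alpha_n \left[ r + \gamma \max_{a'} Q_{n-1}(s', a') \right]$$

$\alpha_n$ is the learning rate at step $n$, $r$ is the reward observed after taking action $a$ in state $s$, and $s'$ is the observed next state.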
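
The Q-learning update can be sketched in a few lines of tabular code. Everything specific here is an assumption for illustration: the hypothetical corridor environment (states 0 to 3, goal at state 3, reward 1 for entering it), the purely random exploration, the number of episodes, and the constant learning rate of 0.5 used in place of a step-dependent α. The agent never reads the transition or reward function; it only sees the reward and next state returned by step.

```python
import random

ACTIONS = ("left", "right")
GOAL = 3  # hypothetical corridor with states 0..3, goal at state 3


def step(s, a):
    """Hypothetical environment: returns only (reward, next state)."""
    s_next = max(s - 1, 0) if a == "left" else s + 1
    r = 1.0 if s_next == GOAL else 0.0
    return r, s_next


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with uniformly random action selection."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(GOAL)  # start in a random non-goal state
        while s != GOAL:
            a = rng.choice(ACTIONS)  # explore at random
            r, s_next = step(s, a)
            best_next = max(Q[s_next, b] for b in ACTIONS)
            # Q(s, a) = (1 - alpha) Q(s, a) + alpha [r + gamma max_a' Q(s', a')]
            Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * best_next)
            s = s_next
    return Q


Q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[s, a]) for s in range(GOAL)}
print(policy)
```

A constant learning rate is enough in this deterministic toy world; in a noisy environment a decaying rate is the usual choice so that the estimates average over the randomness.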

Here are some sample Value Iteration results and Q-learning results:
Value Iteration:


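1. Value iteration is very powerful in a deterministic, fully observable world, but it needs the reward and transition functions up front. Q-learning is the better choice in a non-deterministic, unknown world because it learns from observed transitions alone.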
2. Inverse reinforcement learning is also cool!