Reinforcement learning is "how an agent ought to take actions in an environment so as to maximize some notion of cumulative reward." (Wikipedia)

Here is a very cool example from GRITSLAB: the robot is trying to learn how to move without any prior knowledge.

1. Value Iteration

Update these two arrays at every step:

Q array: $$Q(s, a) = R(s, a) + \gamma V(\delta(s, a))$$

Value array: $$V(s) = \max_{a} Q(s, a)$$

* $\gamma$ is the discount factor, and $\delta(s, a)$ is the state reached by taking action $a$ in state $s$.

2. Q-learning

$$Q_n(s, a) = (1 - \alpha_n) Q_{n-1}(s, a) + \alpha_n \left[ r + \gamma \max_{a'} Q_{n-1}(s', a') \right]$$

* $\alpha$ is the learning rate.

Value Iteration:
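The two value iteration updates can be sketched in a few lines of Python. This is only a minimal illustration, not the setup used in the GRITSLAB example: the 4-state corridor world, the reward of 1 for entering the goal, the discount factor of 0.9, the number of sweeps, and the helper names `delta`, `reward`, and `value_iteration` are all assumptions made for the sketch.

```python
GAMMA = 0.9  # discount factor (assumed value)

# Hypothetical 4-state corridor: states 0..3, state 3 is an absorbing goal.
N_STATES = 4
ACTIONS = ("left", "right")
GOAL = 3


def delta(s, a):
    """Deterministic transition function: the next state for (s, a)."""
    if s == GOAL:
        return s  # the goal is absorbing
    return max(s - 1, 0) if a == "left" else min(s + 1, N_STATES - 1)


def reward(s, a):
    """R(s, a): 1 for the move that enters the goal, 0 otherwise."""
    return 1.0 if s != GOAL and delta(s, a) == GOAL else 0.0


def value_iteration(sweeps=50):
    V = [0.0] * N_STATES
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        # Q(s, a) = R(s, a) + gamma * V(delta(s, a))
        for s in range(N_STATES):
            for a in ACTIONS:
                Q[(s, a)] = reward(s, a) + GAMMA * V[delta(s, a)]
        # V(s) = max over a of Q(s, a)
        V = [max(Q[(s, a)] for a in ACTIONS) for s in range(N_STATES)]
    return Q, V


Q, V = value_iteration()
# Greedy policy read off the converged Q array.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)]
```

Note that the sketch calls `delta` and `reward` directly, so it only works when the transition and reward functions are known ahead of time.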
Q-learning:
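The Q-learning update can be sketched the same way. Again this is a hedged illustration rather than the robot's actual code: the corridor world, the 20% chance that a move fails (`SLIP`), the purely random exploration, the decaying learning rate of 1 / (1 + visits), and the helper names `step`, `q_update`, and `train` are all assumptions chosen to keep the example small.

```python
import random

GAMMA = 0.9   # discount factor (assumed value)
SLIP = 0.2    # assumed chance that a move fails and the agent stays put
N_STATES = 4  # hypothetical corridor: states 0..3, state 3 is the goal
ACTIONS = ("left", "right")
GOAL = 3


def step(s, a, rng):
    """Sample (reward, next_state); the learner never reads these rules."""
    if rng.random() < SLIP:
        s_next = s
    elif a == "left":
        s_next = max(s - 1, 0)
    else:
        s_next = min(s + 1, N_STATES - 1)
    r = 1.0 if s_next == GOAL else 0.0
    return r, s_next


def q_update(Q, s, a, r, s_next, alpha):
    """Q_n = (1 - alpha) * Q_{n-1} + alpha * [r + gamma * max_a' Q_{n-1}(s', a')]."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target


def train(episodes=500, max_steps=100, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    visits = {key: 0 for key in Q}
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if s == GOAL:
                break  # episode ends at the goal
            a = rng.choice(ACTIONS)  # purely random exploration
            r, s_next = step(s, a, rng)
            visits[(s, a)] += 1
            alpha = 1.0 / (1 + visits[(s, a)])  # decaying learning rate
            q_update(Q, s, a, r, s_next, alpha)
            s = s_next
    return Q


Q = train()
```

Unlike the value iteration updates, `q_update` only uses the sampled reward and next state, so the learner never needs the transition or reward functions themselves.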
1. Value iteration is very powerful in a deterministic, fully observable world, where the reward and transition functions are known in advance, but Q-learning is better in a non-deterministic, unknown world because it learns from sampled transitions alone.
2. Inverse reinforcement learning is also cool!