Policy Iteration Planning
Here, I use policy iteration to find the optimal policy, and from it the optimal trajectory, for a robot in a grid world with obstacles and rewards. The robot can move in four directions (north, east, south, and west), and the outcome of each move is stochastic. The goal is to maximize the expected discounted reward over an infinite time horizon.
You can find the code here.
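For readers who want the idea without opening the linked code, below is a minimal, self-contained sketch of policy iteration on a small grid world. It is not the project's implementation. The grid size, obstacle cells, reward cells, discount factor, and the slip model (the intended move succeeds with probability 0.8, otherwise the robot slips to one of the two perpendicular directions) are all assumptions chosen for illustration, as are the names `build_model` and `policy_iteration`.

```python
import numpy as np

# Every value here is an illustrative assumption, not taken from the project.
ROWS, COLS = 4, 5
OBSTACLES = {(1, 1), (2, 3)}
REWARDS = {(0, 4): 1.0, (1, 4): -1.0}  # received whenever the next state is this cell
GAMMA = 0.9                            # discount factor
P_INTENDED = 0.8                       # rest is split between the two perpendicular moves
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # north, east, south, west
ARROWS = "^>v<"

STATES = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) not in OBSTACLES]
INDEX = {s: i for i, s in enumerate(STATES)}


def step(state, move):
    """Deterministic move; bumping into a wall or obstacle leaves the robot in place."""
    r, c = state[0] + move[0], state[1] + move[1]
    if 0 <= r < ROWS and 0 <= c < COLS and (r, c) not in OBSTACLES:
        return (r, c)
    return state


def build_model():
    """Return transition tensor P[a, s, s'] and expected reward R[a, s]."""
    n, m = len(STATES), len(ACTIONS)
    P = np.zeros((m, n, n))
    R = np.zeros((m, n))
    slip = (1.0 - P_INTENDED) / 2
    for a in range(m):
        outcomes = [(a, P_INTENDED), ((a - 1) % m, slip), ((a + 1) % m, slip)]
        for s in STATES:
            for actual, prob in outcomes:
                nxt = step(s, ACTIONS[actual])
                P[a, INDEX[s], INDEX[nxt]] += prob
                R[a, INDEX[s]] += prob * REWARDS.get(nxt, 0.0)
    return P, R


def policy_iteration(P, R, gamma=GAMMA):
    m, n, _ = P.shape
    rows = np.arange(n)
    policy = np.zeros(n, dtype=int)  # start with "north" everywhere
    while True:
        # Policy evaluation: solve the linear system (I - gamma * P_pi) V = R_pi.
        P_pi = P[policy, rows]       # shape (n, n)
        R_pi = R[policy, rows]       # shape (n,)
        V = np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)
        # Policy improvement: switch only where an action is strictly better,
        # which avoids cycling between equally good policies.
        Q = R + gamma * (P @ V)      # shape (m, n)
        best = Q.argmax(axis=0)
        improved = Q[best, rows] > Q[policy, rows] + 1e-10
        if not improved.any():
            return policy, V
        policy = np.where(improved, best, policy)


P, R = build_model()
policy, V = policy_iteration(P, R)

# Print the greedy action per cell; '#' marks an obstacle.
for r in range(ROWS):
    print(" ".join("#" if (r, c) in OBSTACLES else ARROWS[policy[INDEX[(r, c)]]]
                   for c in range(COLS)))
```

Following the printed arrows from any start cell gives the most likely trajectory under the computed policy. Evaluating each policy exactly with a linear solve is practical here because the state space is tiny; for larger grids, iterative evaluation would be the usual substitute.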