Anirudh Kailaje

Policy Iteration Planning

In this project, I use policy iteration to find the optimal trajectory for a robot in a grid world with obstacles and rewards. The robot can move in four directions (north, east, south, and west), and its motion is stochastic, so a commanded move does not always produce the intended result. The goal is to maximize the expected discounted reward over an infinite time horizon.
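To make the idea concrete, here is a minimal sketch of policy iteration on a small grid world. It is not the project's actual code. The grid size, obstacle and goal cells, reward values, discount factor, and the slip model (the robot veers to a perpendicular direction with some probability) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

# Every constant below is an illustrative assumption, not a value from the project.
ROWS, COLS = 4, 4
OBSTACLES = {(1, 1), (2, 2)}                   # hypothetical blocked cells
GOAL = (0, 3)                                  # hypothetical rewarding cell
GAMMA = 0.9                                    # discount factor
P_INTENDED = 0.8                               # probability the commanded move happens
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]   # north, east, south, west


def step(state, move):
    """Apply one move; bumping into a wall or obstacle leaves the robot in place."""
    r, c = state[0] + move[0], state[1] + move[1]
    if 0 <= r < ROWS and 0 <= c < COLS and (r, c) not in OBSTACLES:
        return (r, c)
    return state


def build_model():
    """Return transitions P[a, s, s'] and per-state rewards R[s]."""
    n = ROWS * COLS
    P = np.zeros((len(ACTIONS), n, n))
    R = np.zeros(n)
    R[GOAL[0] * COLS + GOAL[1]] = 1.0          # assumed reward for being at the goal
    slip = (1.0 - P_INTENDED) / 2.0
    for s in range(n):
        state = divmod(s, COLS)
        if state == GOAL or state in OBSTACLES:
            P[:, s, s] = 1.0                   # treated as absorbing in this sketch
            continue
        for a in range(len(ACTIONS)):
            # Assumed slip model: veer to either perpendicular direction.
            for a2, p in ((a, P_INTENDED), ((a + 1) % 4, slip), ((a - 1) % 4, slip)):
                r2, c2 = step(state, ACTIONS[a2])
                P[a, s, r2 * COLS + c2] += p
    return P, R


def policy_iteration(P, R, gamma):
    """Alternate exact policy evaluation and greedy improvement until stable."""
    n = R.size
    states = np.arange(n)
    policy = np.zeros(n, dtype=int)
    while True:
        # Evaluation: solve the linear system (I - gamma * P_pi) V = R.
        P_pi = P[policy, states]
        V = np.linalg.solve(np.eye(n) - gamma * P_pi, R)
        # Improvement: switch actions only where the gain is strictly positive.
        Q = R[None, :] + gamma * (P @ V)
        best = Q.argmax(axis=0)
        improved = Q[best, states] > Q[policy, states] + 1e-10
        if not improved.any():
            return policy, V
        policy = np.where(improved, best, policy)


P, R = build_model()
policy, V = policy_iteration(P, R, GAMMA)
arrows = np.array(list("NESW"))[policy].reshape(ROWS, COLS)
print(arrows)
```

Following the printed actions from any start cell gives a trajectory under the computed policy. The entries shown for the goal and obstacle cells carry no meaning, since those cells are absorbing in this sketch.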

You can find the code here.
