Dynamic Programming


Dynamic Programming
a method for solving complex problems by breaking them down into simpler subproblems. It is applicable to problems exhibiting the properties of overlapping subproblems which are only slightly smaller. such as
Dijkstra's algorithm for the shortest path problem
Fibonacci sequence
Tower of Hanoi puzzle
simplifying a decision by breaking it down into a sequence of decision steps over time. Vn(y) is the value obtained in state y at the last time n. The values Vi at earlier times i = n −1, n − 2, ..., 2, 1 can be found by working backwards



Example

Assume a consumer who lives over the periods t = 0, 1, 2, ..., T and must decide how much to consume and how much to save in each period.
  • Let ct be consumption in period t, and assume consumption yields utility u(ct)= ln(ct as long as the consumer lives.
  • Assume the consumer is impatient, so that he discounts (http://en.wikipedia.org/wiki/Discounting) future utility by a factor b each period, where 0 < b < 1.
  • Let kt be capital in period t (wealth, cash), Assume initial capital is a given amount k0 > 0
  • This period's capital and consumption determine next period's capital: kt+1 = A kat - ct , where A is a positive constant and 0 < a < 1 [which leads to the root of k (e.g. if a = 0.2 means the 5th root of k, √ )]. Assume capital cannot be negative.
The consumer's decision problem can be written as:

       (which is the discounted utility value as described above)


subject to
kt+1 = A kat - ct ≥ 0


Written this way, the problem looks complicated, because it involves solving for all the choice variables c0, c1, c2, ..., cT and k1, k2, k3, ..., kT+1 
simultaneously. (Note that k0 is not a choice variable—the consumer's initial capital is taken as given.)

Dynamic Programming

The dynamic programming approach to solving this problem involves breaking it apart into a sequence of smaller decisions.
To do so, we define a sequence of value functions Vt(k), for t = 0, 1, 2, ..., T, T+1 which represent the value of having any amount of capital k at each time t.
 Note that VT+1(k) = 0 , that is, there is (by assumption) no utility from having capital after death.
The value of any quantity of capital at any previous time can be calculated by backward induction using the Bellman equation (a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming).
In this case the Bellman equation is:
This problem is much simpler than the one we wrote down before, because it involves only two decision variables, ct, kt+1.
Find the optimal amount to consume at time can be dealt with as a back tracking problem, starting from back and knowing the relationships between current consumptions and the utilities gained and time.












Comments