This post assumes working familiarity with DP problems. It documents a way to look at DP from a slightly different perspective, focusing on DAGs and compiler techniques such as common subexpression elimination (CSE) and loop skewing.
Consider a naive recursive solution to a problem. The recursive function calls form a tree. Running Common Subexpression Elimination (CSE) on the recursive call tree merges identical subcalls and generates a Directed Acyclic Graph (DAG) that corresponds to an optimal solution. Solving the DAG of subcomputations by going from sink to source (reverse topological order) is the "memoization" or top-down approach, and solving from source to sink (topological order) is the "tabulation" or bottom-up approach. The memoization/top-down method has the advantage that you don't need to compute a topological sort and precompute the order, but the disadvantage is that you tend to proceed with a recursive function (which might be less efficient in some languages). The tabulation/bottom-up method requires you to precompute a good computation order (the topological sort), but then you can store results in a vector-like structure and potentially do the computation without recursion. If space is an issue, one could bring in more graph algorithms and, in the bottom-up approach, discard results that will not be needed by any future node computation.
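To make the tabulation view concrete, here is a minimal sketch of evaluating an explicit dependency DAG in topological order. The DAG and the rule "a node's value is 1 plus the sum of its dependencies" are made up for illustration; the point is only the evaluation order, which Python's standard `graphlib` can compute for us.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency DAG: each node maps to the subproblems it depends on.
deps = {
    "a": ["b", "c"],   # "a" needs the results of "b" and "c"
    "b": ["c"],
    "c": [],           # "c" is a source (base case)
}

# Tabulation: visit dependencies before dependents (topological order).
order = TopologicalSorter(deps).static_order()

# Made-up recurrence for illustration: value(n) = 1 + sum of dependency values.
table = {}
for node in order:
    table[node] = 1 + sum(table[d] for d in deps[node])

print(table)  # {'c': 1, 'b': 2, 'a': 4}
```

By the time the loop reaches a node, every value it reads from `table` is already present, which is exactly the guarantee the topological sort provides.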
Consider the naive Fibonacci recursion call tree, and then imagine CSE applied to its nodes.
For memoization/top-down, we start computing from fib(5); along the way, the fib(4) call computes fib(3), which the fib(5) call then reuses.
For tabulation/bottom-up, we start from fib(1) and fib(2) and proceed evaluating in topological order.
Top: naive Fibonacci call tree. Right: CSE applied to the call tree
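Both traversal orders of the Fibonacci DAG can be sketched in a few lines of Python. In the top-down version, `functools.lru_cache` performs the sharing that CSE makes explicit; the bottom-up version walks the same DAG in topological order, fib(1), fib(2), ..., fib(n).

```python
from functools import lru_cache

# Top-down / memoization: the cache turns the call tree into the shared DAG.
@lru_cache(maxsize=None)
def fib_memo(n):
    if n <= 2:
        return 1
    return fib_memo(n - 1) + fib_memo(n - 2)

# Bottom-up / tabulation: evaluate nodes in topological order.
def fib_tab(n):
    table = [0, 1, 1] + [0] * max(0, n - 2)  # table[k] will hold fib(k)
    for i in range(3, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

print(fib_memo(5), fib_tab(5))  # 5 5
```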
Consider the rod cutting problem (refer to Cormen et al., CLRS). The pseudocode and the recursive call tree are shown here.
Now consider the bottom-up DP-optimized solution: j varies from 1 to n, and i varies from 1 to j. The solution corresponds exactly to CSE applied to the tree shown.
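Those two loops can be written out directly; the following is a sketch of the bottom-up rod cutting procedure from CLRS, with the sample price table from the book (prices for lengths 1 through 10).

```python
def cut_rod(prices, n):
    # prices[i] = price of a rod of length i (prices[0] is unused).
    r = [0] * (n + 1)          # r[j] = best revenue for a rod of length j
    for j in range(1, n + 1):  # visit node j in topological order
        best = float("-inf")
        for i in range(1, j + 1):               # inner loop runs from 1 to j:
            best = max(best, prices[i] + r[j - i])  # one term per input edge of node j
        r[j] = best
    return r[n]

# Price table from the CLRS rod cutting example.
p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
print(cut_rod(p, 4))  # 10  (cut into two pieces of length 2)
```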
With increasing position in the topological sort of the DAG, the nodes get an increasing number of input edges. This reflects the fact that the inner loop on i goes from 1 to j, so as j increases, the number of input edges increases.
Notice from the graph structure that no results can be discarded (the result of node 0 is used even in node 4). Contrast this with the Fibonacci graph, where fib(2) can be thrown away after fib(4) has been computed.
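For Fibonacci, this discarding collapses the table to two variables: a sketch of the familiar constant-space version, phrased in DAG terms.

```python
def fib_stream(n):
    # Once fib(k) has fed both fib(k+1) and fib(k+2), its node has no
    # remaining outgoing edges in the DAG, so its result can be dropped.
    prev, cur = 1, 1  # fib(1), fib(2)
    for _ in range(n - 2):
        prev, cur = cur, prev + cur
    return cur if n >= 2 else prev

print(fib_stream(5))  # 5
```

The same "live range" reasoning generalizes: a result may be freed as soon as every node that reads it has been computed, which is exactly why the rod cutting DAG permits no such savings.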
Converting a naive recursive function call tree to a DAG is just the start of the optimization. Now that we have a computation DAG, we can throw all kinds of compiler techniques at it.
A lot of problems have a strict 1D or 2D tabular structure. For example, consider the "matrix-chain order" problem in Cormen (CLRS). The DAG in that case looks like the figure shown on the left below. Following a simple topological sort, we can construct the solution in a bottom-up fashion easily. But we can do better.
Note that we can apply loop skewing to allow some parallel processing. If we process one diagonal at a time, then all elements of each diagonal are independent and can be computed in parallel. From the graph, the idea is obvious: process all nodes at the same "depth" in parallel.
Left: the original DAG and its computation order
Right: two "wavefronts" of evaluation; the left one is serial, the rightmost one is parallelizable
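Here is a sketch of the skewed loop order for matrix-chain order, iterating diagonal by diagonal (i.e., by chain length L). Every cell on a diagonal reads only from earlier diagonals, so the inner loop over i could be farmed out to parallel workers; it is written serially here for clarity. The dimension list is the worked example from CLRS.

```python
def matrix_chain_order(p):
    # p holds matrix dimensions: matrix A_i is p[i-1] x p[i].
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]  # m[i][j] = min cost for A_i..A_j
    # Loop skewing: the outer loop walks diagonals (chain length L).
    # All (i, j) cells on one diagonal depend only on earlier diagonals,
    # so this inner loop over i is safe to run in parallel.
    for L in range(2, n + 1):
        for i in range(1, n - L + 2):
            j = i + L - 1
            m[i][j] = min(m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                          for k in range(i, j))
    return m[1][n]

# Dimensions from the CLRS matrix-chain example.
print(matrix_chain_order([30, 35, 15, 5, 10, 20, 25]))  # 15125
```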