## Algorithms:  Dynamic Programming (Text Ch 8)

Back to Greedy algorithms

### 8.2 All Pairs Shortest Path and Transitive Closure

This section can technically be thought of a dynamic programing, though it is not usually discussed that way.  I am doing it first to round out our discussion of graphs with these important algorithms.  Certainly be very capable with the short algorithms and the ideas of shortest paths and transitive closure.  You should be able to illustrate your understanding of the basic ideas by calculating shortest paths and transitive closures for small graphs, by whatever means is easiest for you.

#### All Pairs Shortest Path

We have found all the shortest paths from a single specified starting point (Dijkstra, Bellman-Ford).  If we want to find shortest paths for all pairs of vertices, we could use this algorithm over and over, but that is not the most efficient approach.  Look for others:
Might think of growing a graph one vertex at a time.  Add new point k, and shortest path from i to j may go through k, with p, r before and after k, so you can look for paths i to p to k to r to j.  Unfortunately looking at all possibilities is O(n5).   A better approach is Floyd-Warshall.  The trick is not to gradually increase the number of endpoints - allow all of them right from the start, but gradually increase the set of points used as intermediate vertices.
http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm

Transitive closure (Warshall's algorithm)
Suppose we do not care about distance, but only whether you can get there.  The transitive closure of a graph G is a graph with the same vertices as G and an edge vw if there is a path from v to w in G.  For a small graph you can work this out by eye easily (with or without a special algorithm).  The book's initial example is below.  The thinner edges are the ones added to make the transitive closure.  The totally connected component ABD ends up with edges between all pairs in both directions.  Since D connected to C, all other vertices in ABD also connect to C.  In general, if any edge goes from one totally connected component to another, then in the transitive closure, all edges in the first totally connected component are connected to every edge in the other totally connected component. When the graphs get big, you cannot just do them by eye.  An obvious approach for transitive closure is just to do a DFS from each vertex.  This is useful in some cases (see homework).

There is an easy adaptation of Floyd-Warshall works, using boolean values aij in place of distances, and boolean operations in place of + and min.  The initial values are just those of the adjacency matrix.  The innermost line in the algorithm becomes
aij = aij or (aik and akj )

Warshall's algorithm for transitive closure is short to write, but not the lowest order.  Interpret matrix multiplication of boolean matrices to substitute AND for multiplication and OR for addition with an adjacency matrix A.   The identity matrix I, gives all the vertices reachable in 0 steps (just the vertices themselves).  Let M = I + A.  Then M gives all vertices reachable in 0 or 1 steps.

Consider M2.  It shows all vertex pairs and if there is a connecting path between them of length at most 2.  Just consider the meaning of the sum for the ij element of M2 =  mi1*m1j + mi2*m2j +...+ min*mnj + mi1*m1j which says there is path from i to j through vertex 1, or through 2, or ... through vertex n.  All the possible paths of length at most 2 are considered.  Similarly, M3 = M2 *  M shows all vertex pairs with connecting path between them of length at most 3, putting all paths of length at most two with up to one more edge.  Similarly up through Mn-1, which shows all vertex pairs with and connecting path between them of length at most n-1.  The longest possible path between the n vertices is n-1, so Mn-1 gives all vertices reachable from another.  Put together two algorithms:  the logarithmic order algorithm for powers, calculating Mk in terms of Mk/2, and Strassen's algorithm for matrix multiplication.  Using the combination, Mn-1 can be calculated in O(nlg 7 log n) using O(log n) matrix multiplications with each matrix multiplication being O(nlg 7).  The order  of nlg 7 log n is less than the n3 of Warshall.  Given the overhead for Strassen's algorithm, this is only better than very simple Warshall when n is very large.  Without Strassen there is still an advantage of using matrix methods:  hardware can do 32 or more one bit boolean operations in parallel.  To be effective both the matrix and its transpose need to be updated, reducing the gain by half.

Floyd's algorithm for shortest distance is very similar to Warshall's,  but the trick with matrix multiplication works only for transitive closure, however, since multiplying distances does not help at all.

Dynamic programming, general introduction

Dynamic programming is a whole approach to problem solving.  Getting an intuitive and practical feel for the idea of dynamic programming is more important than the individual examples.  You can read 8.1.  For an introduction to the approaches, see fib.py.  This ridiculously simple example allows the techniques to be displayed with no complications obscuring them.

All these ideas apply in more complicated cases.  The central idea is that results in the recursion "tree" are reused a lot.  Essentially that means imagining a tree of cases is the wrong thing to do, since we are essentially saying the branches rejoin in a common subproblem.  More generally the dependent problems form a DAG.  To take advantage of the DAG structure there is a trade off with time and space usually.  Keeping extra space requirements to a minimum is an issue.

In most problems,the DAG is implicit, and connections are found by digging into the code.  To emphasize the centrality of DAG's, we can state the problem for an explicitly given DAG, where we explore the sequence of critical path solutions:  critPathRec.py, critPathDFS.py, critPathTopo.py.   Though we did a version of the last one first,  it  is the most sophisticated one, improving on the others.  The recursive version recalculates for each vertex every time it is reached, so it is inefficient if nodes are used multiple times.  The DFS version memoizes, recording data for each node and the idea if checking if a node is visited is checking if data still needs to be calculated, or has already been done.  The reverse topological sort avoids the memoizing.

### Dynamic Programming as a Strategy

• Tends to be used for recursive optimizations where subproblems are reused, and there are not too many subproblems.
• Recursion makes logical sense if the optimum/answer for subproblems carries over as a part of the optimum/answer for a larger problem containing the subproblem.
• Dynamic programming is distinguished from regular recursion by the reuse of subproblems.  It depends on the setting up the subproblems in a way that the space for the results of all subproblems needed at one time is small enough to fit in the memory allocation for the problem.
• In order to store problems there must be an easy way to specify the parameters that distinguish one subproblem from another.  These parameters are usually just the parameters that change in recursive calls in the recursive version of the solution.
In the past classes have had a lot of trouble with this approach.  There are two quite distinct parts:
1. Getting started by stating a recursive algorithm that gives the problem-specific relation of larger problems to smaller problems.  For this part you do not need to worry about the basic dynamic programming idea of conserving the time spent on reused problems (except that you do want a recursive structure that does not have too many subcases).  People seem to have trouble with recursion in general AND this is the creative part of the solution (always the main trouble).  We will spend a lot of time on this and you will need to get your recursion abilities honed!
2. Given a clearly stated relationship of larger to smaller problems, and not having too many smaller problems total, it is a pretty mechanical process to convert a recursive algorithm to dynamic programming, but students still have a problem with it, I'm not sure why.  Be sure to ask questions.  This is a basic ability that I will certainly expect of you!  With little thought you can memoize.  We will see pretty standard patterns for topological orders.  The most time consuming part is keeping a secondary table to remember the optimal choice of subproblems if you want to reproduce an optimal sequence.  This is still pretty mechanical but admittedly messy, initially setting up pointers to the previous step, and  then traversing the optimal linked list at the end.
Work in class.....  Do not look at the solutions to the next problems until we have worked out the basic ideas in class!

My code generally includes a pure recursive solution, and a dynamic programming one with a topological order, and sometimes the longer version allowing you to reproduce the optimal path.  Before looking at my dynamic programming solution, all the examples are good ones for you to practice starting with the recursive version and doing a conversion to dynamic programming, until you are sure you clearly have the mechanics, and you can also make sure you have the mechanics of keeping the secondary table to reproduce the optimal path.

Consider in class the contiguous subsequence problem, finding a maximum sum:  (DasGupta 6.1):
Given real numbers a1, a2, ... , an,
for each i, j, 1 <= i <= j <= n, the sequence for numbers ak, for i <= k <= j, is a contiguous subsequence.  Find the maximum sum obtained by adding a contiguous subsequence.   What is the time for brute force?   We will need to do better....

...
Hint:  Max subsequence must end somewhere.  Use D(i), as the maximal sum ending at index i, which may include no elements, and be 0, so its minimum value is 0.
...
D(i) is either 0 or based on a sequence ending with ai.  If ai is included, it never hurts to add the best sequence ending at i-1, D(i-1), since D(i-1) is at least 0.  Hence
D(i) = max(0, D(i-1) + ai).  See the largest contiguous sequence sum in Python.  I added a second version to keep track if the index range of the best sequence.

Since subproblems do not get used multiple times, this is not really a dynamic programming problem, but the recursion idea is still important,and illustrates a generally useful point:  The general problem is just find the largest sum, somewhere in the sequence, but we defined D(i) with an extra condition, that the sum sequence ends at i.

This is a common distinction to consider:  Possibilities:
• The subproblems can be expressed exactly like the overall problem ( just with a smaller data set)
• The subproblem is more specific.  This is often the case when then general problem allows some data elements to be included or not, or be the end of the part in the optimal choice or not....  The subproblem may specify that the last element in the partial-problem dataset IS used or IS the ending parameter in the optimal solution.  In these cases extra information needs to be maintained (commonly a running maximum or minimum) to produce the final answer to the more general question.
Datasets for common subproblems.  The following ways of looking at problems with one or two datasets can help you get started:
1. With a single sequence of input data like a1, a2, ... , an, there are two common situations for sub problems and a less common one.
• a1, a2, ... , ai, for some i = 1, 2, ...n
• ai, ai+1, ... , aj, for some i <= j
• all subsets of the data.   This is automatically exponential in the size of the data, so it is not really good, but sometimes the alternative is O(n!), which is much larger.
2. Another possible input situation is to have two data sequences, a1, a2, ... , an, and b1, b2, ... , bm.  Here the data for a subproblem is often  a1, a2, ... , ai, and b1, b2, ... , bj.
3. If the problem can be stated in terms of a rooted tree, subproblems often correspond to subtrees.
4. Sometimes there are multiple data elements, but they make sense to be a part of all problems, and there is a much simpler dependency:  If the problem depends on  one or more positive integer numbers parameters specific meanings (not part of a set), maybe the dependency is just on the same problem with smaller values for these parameters.
Of course the choice of how to set up the smaller problems depends on how you can relate the smaller problems to the larger problem!  That will always be problem specific.  You have to look your specific situation, and understand its ramifications.  Look at concrete examples and see what a solution to a smaller problem had to do with a larger one. You need to gain experience working on this!  We will see there may be multiple ways to set up relationships to smaller problems, some impractical, others practical.  For dynamic programming to be effective, there must be a limited number of these subproblems for two reasons:
1. The overall time is related to the total number of subproblems needing to be considered.
2. If you need a table of the result of all these problems, there is an issue of space.
Class problem solving: given a sequence of n letters or numbers, find the longest palindrome subsequence.  Unlike the last problem or a regular palindrome, there can be gaps between elements of a subsequence.

A palindrome subsequence of abracadabra is a b a c a b a. Another is a r a d a r a.

Develop table: sizing, initializing, filling, step by step in class!
Stub to build on basic recursive case, palindromeStub.py, using printtables.py for display:

Check later:  complete Palindrome subsequence in Python, using printtables.py for display, with extra versions reconstructing the palindrome, and showing off the data used for reconstruction.
There are several ways to look at this problem.  If you are looking to relate it to smaller problems, what might you consider?  What smaller data set and what problem with that dataset?  Hint:  Remember it may be useful to recursively solve a problem with extra conditions, to make the recursive reasoning easier.

Consider how to do a topological ordering in the table.  Assume i is the row and j is the column.  The base cases give the main diagonal as 1's.  The 0-length intervals correspond to i=j+1 (just below the main diagonal).  The entry for (i, j) depends on table locations (i+1, j-1), (i, j-1), (i+1, j).   For instance the position with the '?' depends on the 3 locations with D already being filled in.

j
-----------+
1          |

01         |
01        |
i   01   D?  |
01  DD  |
01     |
01    |
01   |
01  |
01 |
01|

The dependency on positions down and to the left, means you want to start as far as possible down to the left, and and work from there.  The rows are shown as an alphabetic sequence with the direction indicated:

j
-----------+
1
>>>>>>>>>z|
01         |
01        |
i   01  ...  |
01      |
01>>>>e|
01>>>d|
01>>c|
01>b|
01a|
01|

My old alternate notes on two of the book's examples follow, knapsack and  sequence of matrix multiplications:

### Knapsack problem

You have a knapsack which holds weight W, and you want to carry away as much of the treasure hoard that you found as possible.  There are n items with integer weights wi and values vi, i = 1, 2, ... n.   What is the maximum treasure value you can take?

Simple example:  Knapsack holds 5kg (W = 5) and there are 4 items to choose from:

 Index Weight Value 1 2 12 2 1 10 3 3 20 4 2 15

What is the brute force approach?  What is the order of the time?  What is the final solution for the data above?
What is a greedy algorithm?  What answer does it give for the data above?

It turns out that a very similar problem is much easier:  Suppose all the valuables are precious metal powders, so any fraction of an item can be considered.  Then the obvious greedy algorithm works to give the optimal solution.   Proving this is a reasonable exercise.

Back to the discrete all-or-nothing case.  Suppose W is way less than 2n.  Find the outline of a dynamic programming solution.

. . .

Clearly it is a simpler problem with fewer items.  We do not want to consider all 2n subsets.  What else might we do?  Add one item at a time,
so there are subproblems for the first k items, k <= n.  How can we relate the problem for n items and max weight W to smaller problems that do not involve the last item?  How does W come in?  How does subproblem optimality come in?

If adding wn to the knapsack provides the best answer for weight W, then removing the nth item must provide the best solution for all items 1 ... n-1, for weight W-wn, so of course the max weight is also a parameter for the subproblems.

Let V(i, j) be the maximum value selecting among item 1, 2, ... i, if the weight limit is j.

Then in the situation described above,
V(n,W) = vn + V(n-1, W-wn).  Complete this, and worry about special cases.

The above equation only makes sense if  (1) the nth item is helpful (2) W-wn is nonnegative.
If we were not to include wn,  V(n, W)  would  be the same as  V(n-1, W).  This discussion also applies for indices i in place of n, i = 1, 2, ...n:
. . .
Assume V(i, j) = 0 if i = 0 ( no items) or j = 0 (0 weight limit).  Then for i = 1, 2, 3, .... n, j = 1,2, ... W:
if wi > j
V(i, j)  = V(i-1, j)
else
V(i, j)  = max(V(i-1, j), vi + V(i-1, j - wi))

Clearly extra space is order of nW, since time for each entry (assuming dependencies calculated) is O(1), total time in the order of nW also.  Note this dynamic programming algorithm is only helpful for sure if the order of nW is smaller than 2n.  This is not true in general.  In general this problem is hard.

With the simple sample data given above, we start with the top row and left column of 0's, and work out one row at a time:

 Best Value Index Weight Value Maximum Weight 0 1 2 3 4 5 0 - - 0 0 0 0 0 0 1 2 12 0 2 1 10 0 3 3 20 0 4 2 15 0

generating:

 Best Value Index Weight Value Maximum Weight 0 1 2 3 4 5 0 - - 0 0 0 0 0 0 1 2 12 0 0 12 12 12 12 2 1 10 0 10 12 22 22 22 3 3 20 0 10 12 22 30 32 4 2 15 0 10 15 25 30 37

so the final  maximum choosing between all 4 items with a weight limit of 5 is 37.  (This was a reasonable scale to demonstrate the dynamic technique, though in this small case, there are more entries in the table above than were considered by brute force!).  See knapsack.py, using already referenced printtables.py for display.

Homework problem  R:   Add Python code or pseudocode to print pairs (value, weight) for each item in the optimal set, in increasing order of index.  This can be recovered from the basic table, and the lists of values and weights, without generating a separate table, though you can generate an extra table if you like.  Hint at end.

### Multiplying a sequence of varying sized matrices

Minimizing floating point multiplication of elements when multiplying n matrices of varying dimensions
A1
A2 A3 ... An
A1 A2 A3 A4  could be
((A1 A2)A3)A4  or
((A1 A2)(A3 A4))  or
A1((A2 A3)A4) or ...

To multiple k by m and m by n matrices need k*m*n multiplications by the definition (ignoring Strassen and other such fancy multiplication techniques).

Suppose the dimensions of Ak are dk-1 by dk.  The total can be dramatically different with different orders.
If the sequence of dimensions are 30 1 40 10  25

((A1A2)A3)A4 -> 20700
(A1A2)(A3A4)  -> 41200
A1((A2A3)A4)  -> 1400

One approach is to just follow the order of computation:  pick an adjacent pair to multiple, removing one dimension.  do this recursively, testing each position.  The work to multiply p by r and r by s matrices (by definition, not Strassen) is O(prs).  See multMat1.py

Time?
What if reuse results?

Another approach:  How are numerical calculations generally viewed in a compiler or expression evaluator:  As a tree:
*
/   \
1      *
/   \
*     4

/  \
2    3

The recursive approach to trees suggests looking at the root and subtrees, and all subtrees that could replace a given subtree.   How can we parametrize that different problems?  Each subtree will handle a contiguous range of the indices.  Let the subproblems be f(i, j), which gives the minimum number of multiplications for the matrices involving dimension indices i .. j.  The principal of optimality applies:  The optimal solution for a problem involves the optimal solution for subproblems.  Given the matrices for a range of dimension indices from i through j, the question remains, where will the root of the tree be?  In other words, what will be the dimension index k, so the left subtree comes from dimensions indices i..k, and the right subtree comes from k..j.  You need to test for each k, and take the optimal one.   Each involves a root multiplication with dk-1 * dk * dk+1 individual steps, and the ones for the smaller problems.  Find the max over k of

dk-1 * dk * dk+1  + f(i, k) + f(k, j)

See the beginning of multMat2.py  (the recursive version only).
How to make a topological order?

0 entries on diagonal are immediately known
To calculate entry at ?  you depend only on the entries labeled D
---------+
0        |

0       |
0DDDD? |
0   D |
0  D |
0 D |
0D |
0 |
0
think before looking

...

Order of calculation showing order of row order and direction in rows, similar to the palindrome example
07----->
06---->
05--->
04-->
03->
02>
01
0

MultMat2.py also uses this method, and ends with a version keeping track for the multiplication order.

Homework Problem S:  Write an algorithm to take the last array and produce the correct fully parenthesized multiplication expression for the matrix multiplication, in a form like
((1 2) (3 (4 5))) or a postfix (Reverse Polish) notation like
1 2 * 3 4 5 * * *

For brevity and printing ease, instead of using the full subscripted matrix names as in (A1 A2), you can just use the subscripts alone as in (1 2).  Hint at end.

The text covers the optimal binary search tree problem.  This is explicitly a tree problem, and immediately suggests the form of subproblem.  The algebra is more involved, until you see how it simplifies.  This section may be considered below as a problem.  It is not a really realistic solution.  With the static distribution, a hash table could be used easily.

Problems to possibly consider in class:
In all problems, besides pseudocode, analyze the order of the space and time requirements (here and in homework).

1.  Making change:  Given coin values 1 = c0 < c1 < c2 < ... < cn-1, find the minimum number of coins needed to give amount A in change, assuming you have a large supply of each value of coin.  Note you cannot assume US coin values, where the greedy method works, but it does not work in general.  Compare your answer to the solution dyn_coins.py.  An extra elaboration would be to list the coins.  (This is basically Levitin 3rd edition, section 9.1 Example 2.)

2.  Determining line splits in a paragraph with fixed width font, and penalty function for the amount the width of all the words differs from the column width W.  For instance words of length 3, 2, 4 fit on a line of width 13.  They take up space with the intervening blanks of 3+1 + 2+1 + 4 = 11, with 2 spaces left over to give a penalty.  The simple transformation is to include the space after each work in the length of the word, and assume the maximum width W is also one larger, so the corresponding situation would have words of length of  4, 3, and 5 on  a line of length 14, and the extra space is 14 - (4+3+5) = 2 again.  We assume this transformation.

So gives lengths w1, w2, ... , wn, and a maximum line length W, and a penalty function f(s) for the number of extra spaces on a line, organize a paragraph with minimum total penalty.  The assumption is that there is no penalty for a short final line of the paragraph.  An example penalty function would be f(s) = s3.   Think...

...

Brute force is ridiculous.  We consider an iterative approach building on shorter lists.  We could grow the list from either end, back or front, but it turns out that the special end penalty makes this not be symmetric, and it is slightly easier to imagine starting from the end and working backwards.   As long as all the words fit on one line, there is no penalty, but then....  Solution lineBreak.py

I repeat again, the idea of transforming a recursive solution to dynamic programming is very basic.  Be sure you can do that.  Besides the homework, you can start with the recursive example solutions I give and get the dynamic solution yourself.

- - - - -

Other possible examples:

edit distance (done in DasGupta)

fastest binary search (8.3)  Python recursive only: optSearch1.py, dynamic: optSearch2.py.  The latter needs printtables.py.

Longest increasing subsequence: (done in DasGupta)  Given numbers a1, a2, ... , an, find the longest strictly increasing subsequence.  For example, given 5, 2, 8, 6, 3, 6, 9, 7,  some increasing subsequences are 5, 6 ,7  and 2, 3, 6, 9, which is one of the the longest ones.

DasGupta 6.3, 6,6, 6.5, 6.12, 6.21

Hints:
R:  Since there are only a couple of possibilities for the reason for each new entry, you can just look at the table to see which place is feeding the current entry, and work backwards as usual, though the questions asks for the final answer to be stated forward through the items.
S: Given the last matrix, this is concisely stated recursively.  Another chance to work on your recursion!

Next  P and NP