Algorithms, Web page 8
Greedy Algorithms, Graphs and Others (Text Ch 9)


Overview

A greedy algorithm is one that makes a local decision about what is cheapest or best, without looking at whole solutions.  Generally this does not provide an optimal solution, though sometimes it does.  Guessing a greedy algorithm is often fairly easy; showing that it solves the overall problem is likely to be harder.

We first look at several graph problems where a greedy approach can be proved to give a globally best solution.  It turns out that several basic data structures support these graph algorithms: heaps, min-heaps with a decrease key operation, and union-find dynamic sets.  They are discussed in more detail, too.

Spanning Tree (Applies only to undirected graphs)

In a connected undirected graph G, a tree subgraph T that connects all vertices is a spanning tree.  In a disconnected graph, a spanning tree is a subgraph composed of a spanning tree in each component of G.

Minimum Spanning Tree (MST)

Given an undirected graph G(V, E) with weighted edges, a spanning tree of minimum total weight is a minimum spanning tree, MST.

We will look at two quite different algorithms with different advantages and drawbacks.  First, Prim's algorithm, which builds up a larger and larger connected tree:

Prim's MST algorithm for graph G

Let V be the vertices of G.
Start with an arbitrary vertex s in V.
Maintain three disjoint sets (or three classifications for vertices):
    T, the tree vertices chosen so far, initially empty
    F, the fringe vertices, each carrying the weight of the best edge from T found so far, initially just s with weight 0
    U, the unseen vertices, initially all of V except s
Also maintain a mapping, p, from each vertex to its parent in the tree, initially all null.
while F is not empty:
    select and remove a vertex v from F with minimum weight
    add v to T
    for each neighbor w of v:
        if w is in U:
             take w from U and add it to F with the weight of the edge vw
             p(w) = v
        else if w is in F:
             if the weight of the current edge from T to w is higher than the weight of vw:
                    update fringe vertex w's weight to the weight of edge vw (a decrease key)
                   p(w) = v

When the algorithm is over, all vertices with paths from s are in T, and p maps all these vertices other than s to their parent in the tree T.   Do an example.
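Here is a minimal runnable Python sketch of the pseudocode above.  It is only an illustration, not the code of prim_mst.py: the fringe is kept as a plain dict with the minimum found by linear search, and the dict-of-dicts graph format is an assumption for the example.

def prim_mst(graph, s):
    # graph: {v: {w: weight, ...}, ...} for an undirected graph.
    # Return the parent map p; the tree edges are (p[v], v) for v != s.
    fringe = {s: 0}    # F: fringe vertex -> weight of best edge from T so far
    p = {s: None}      # parent in the tree
    tree = set()       # T: vertices already in the tree
    while fringe:
        v = min(fringe, key=fringe.get)   # min-weight fringe vertex (linear search)
        del fringe[v]
        tree.add(v)
        for w, wt in graph[v].items():
            if w not in tree and (w not in fringe or wt < fringe[w]):
                fringe[w] = wt            # move w from U to F, or decrease its weight
                p[w] = v
    return p

g = {'a': {'b': 2, 'c': 3}, 'b': {'a': 2, 'c': 1, 'd': 4},
     'c': {'a': 3, 'b': 1, 'd': 5}, 'd': {'b': 4, 'c': 5}}
print(prim_mst(g, 'a'))   # {'a': None, 'b': 'a', 'c': 'b', 'd': 'b'}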

See prim_mst.py where you can execute the algorithm.

Before considering the implementation in detail, note that selecting the fringe element with minimal weight suggests storage in a priority queue.  One implementation of a priority queue is a heap; a basic heap implementation is in heap.py.  That, however, is not efficient in this situation: not only are elements added to and removed from the priority queue (the basic operations), there is also the need to decrease the weight of an element already in the priority queue when a better edge from the tree to a fringe element is found.  A short name for a "priority queue where you can decrease the value associated with a key" is a "priority queue with a decrease key operation".

Without specializing to a particular priority queue implementation yet, the time for a graph with n vertices and m edges, assuming m > n, is
O(n·insertItem + n·popItem + m·(decreaseKey + getWeight))
In a basic heap there is no way to find a specific current element, other than a linear-time search.  The number of decrease key operations can be Θ(|E|), which in a dense graph is Θ(|V|²), so using a linear search for each one could lead to a Θ(|V|³) algorithm overall.  An improvement over a basic heap in this situation is a heap that can do a decrease key operation in O(log |V|) time.  See heapdecreasekey.py.

The heap must keep track of both each vertex and the weight of the edge to it, while the normal heap would just allow the smallest-weight element to be accessed.  The extra structure comes from a mapping maintained between a vertex (the key) and its position in the heap.  So you must be able to find the entry for fringe vertex v, decrease its weight, and modify the heap with the decreased value (bubble up the element).

In heapdecreasekey.py my notation for the lookup of the weight of vertex v in priority queue pq is pq[v].  The same notation is used to insert a new element, pq[v] = wt, and for decrease key, pq[v] = lowerWeight.  When popping the minimum from the heap, you recover both the vertex and the weight:  (v, wt) = pq.popitem().  Now weight lookup is O(1), thanks to the explicit mapping, and decreaseKey causes an O(log n) bubble-up to the proper place in the heap.
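Here is a minimal sketch of a heap with a decrease key operation supporting the pq[v] notation just described.  It illustrates the idea only; it is not the contents of heapdecreasekey.py.  The extra ingredient is a dict mapping each key to its current index in the heap list, updated on every swap.

class MinHeapDK:
    def __init__(self):
        self._heap = []   # [weight, key] pairs, heap-ordered by weight
        self._pos = {}    # key -> index of its pair in self._heap

    def __len__(self):
        return len(self._heap)

    def __contains__(self, key):
        return key in self._pos

    def __getitem__(self, key):        # O(1) weight lookup via the position map
        return self._heap[self._pos[key]][0]

    def __setitem__(self, key, wt):    # insert a new key, or decrease its weight
        if key in self._pos:
            i = self._pos[key]
            assert wt <= self._heap[i][0], 'only decrease is supported'
            self._heap[i][0] = wt
            self._bubble_up(i)         # O(log n) bubble up after the decrease
        else:
            self._heap.append([wt, key])
            self._pos[key] = len(self._heap) - 1
            self._bubble_up(len(self._heap) - 1)

    def popitem(self):                 # remove and return (key, minimum weight)
        wt, key = self._heap[0]
        last = self._heap.pop()
        del self._pos[key]
        if self._heap:
            self._heap[0] = last
            self._pos[last[1]] = 0
            self._sift_down(0)
        return key, wt

    def _bubble_up(self, i):
        while i > 0 and self._heap[i][0] < self._heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self._heap)
        while True:
            small = i
            for c in (2*i + 1, 2*i + 2):
                if c < n and self._heap[c][0] < self._heap[small][0]:
                    small = c
            if small == i:
                return
            self._swap(i, small)
            i = small

    def _swap(self, i, j):             # swap pairs and keep the position map in sync
        self._heap[i], self._heap[j] = self._heap[j], self._heap[i]
        self._pos[self._heap[i][1]] = i
        self._pos[self._heap[j][1]] = j

pq = MinHeapDK()
pq['b'] = 5; pq['c'] = 7
pq['c'] = 2                  # decrease key
print(pq.popitem())          # ('c', 2)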

Time analysis:  In this case the general expression from above,
O(n·insertItem + n·popItem + m·(decreaseKey + getWeight)),
becomes, with the elaborated heap,
O(n·log n + n·log n + m·(log n + 1)) = O((n + m) log n)

There are other more complicated priority queue approaches where decrease key is O(1).  The text has references.  We will not pursue them in this course.

To prove the correctness of Prim's algorithm we need a theorem that we will also use again in the future (not in our book, but it is in DasGupta as the Cut Property!):

MST Cut Theorem

Suppose edges X are part of a minimum spanning tree of G = (V, E ).  Pick any subset of vertices S such that X does not cross between S and V − S, and let e be the lightest weight edge across this partition. Then X ∪ {e} is part of some MST.

Proof:  Suppose G, X, S are as above, T is an MST of G that contains X, and vw is a minimum weight edge between S and V−S.  If T contains vw, we are done, so suppose vw is not in T.  If we add vw to T, a cycle is created: the path P in the tree from v to w, plus the edge vw.  Removing any edge from this cycle recreates a spanning tree.  Choose the edge to remove as follows:  since the edge vw crosses between S and V−S, some other edge v'w' of the cycle must cross back.  Since X does not cross between S and V−S, v'w' is not in X.  Let T' = T + vw − v'w', another spanning tree, which contains X ∪ {vw}.  Since vw is a minimum weight edge between S and V−S, and v'w' is another edge across the partition, W(vw) ≤ W(v'w'), and hence W(T') ≤ W(T).  But T is an MST, so W(T') = W(T), and T' is also an MST.

Theorem:  Prim's algorithm produces an MST.

Proof
We show by induction on the number of steps that the tree built so far is part of some MST, and hence the final tree is an MST for G.
Basis:  Initially there are no edges, certainly a subset of any MST.
Induction step:  Let T be the tree constructed by Prim's algorithm at some step before the final one, with its edges part of some MST by the induction hypothesis, and let vw be the next edge added, with v in T and w not in T.  Use the Cut Theorem with X = the edges of T and S = the vertices of T, so v is in S and w is in V−S.  Since Prim's algorithm selects the fringe vertex of minimal weight, vw is a lightest edge across this partition, so T + vw must be part of some MST.

In proving things about MST's the construction in the MST Cut Theorem is standard:  adding an edge, causing a cycle, and removing an edge somewhere else in the cycle.  Be sure you follow the argument.

We will return to MST's and Kruskal's algorithm, but first an algorithm for a different problem, with a solution almost exactly like Prim's:

Dijkstra's shortest path algorithm

We are looking for the shortest path (thinking of weights as distances) from a specified vertex s to each other vertex v, dist(v).  Although the problem is quite different from the minimum spanning tree problem, the solution algorithm is almost the same as Prim's MST algorithm.  The only change is that instead of the priority of a candidate being given just by the weight of edge vw, as in Prim, it is dist(v) + the weight of edge vw for Dijkstra.  In dijkstra.py the few lines with changes are marked.
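A sketch of the change, applied to the naive Prim sketch above (again just an illustration, not dijkstra.py); the one changed line is marked:

def dijkstra(graph, s):
    # Return (dist, p): shortest distances from s and the parent map.
    fringe = {s: 0}    # tentative distances for fringe vertices
    dist = {}          # final distances for finished vertices
    p = {s: None}
    while fringe:
        v = min(fringe, key=fringe.get)
        dist[v] = fringe.pop(v)
        for w, wt in graph[v].items():
            cand = dist[v] + wt                # <-- the change from Prim
            if w not in dist and (w not in fringe or cand < fringe[w]):
                fringe[w] = cand
                p[w] = v
    return dist, p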

Besides examples in the text, there is an example in pictures in Section 4.4 of DasGupta, page 110, about the 5th page of the chapter.  See either of these for a proof that Dijkstra's algorithm works.  Be sure to note where the nonnegative edge weights come in.

The time analysis is essentially the same as for Prim:  O((n+m) log n).

Kruskal's algorithm:

Maintain a set of edges F (for forest) so far, initially empty,
and the set of edges R remaining, initially all of E.
while R is not empty:
    remove the lightest edge vw from R
    if vw does not make a cycle in F:
        add vw to F
return F

Proof that it works: easy from the MST Cut Theorem, taking S to be the vertices of the component of F containing v; pictured well in DasGupta.

Time:  getting the lightest edges takes time O(m lg m) in the worst case.  The checking for cycles across components is trickier.  We have already done component checking efficiently in the static case, but repeating that work after every added edge is not efficient in this dynamic case.  There is a data structure and union-find algorithm that is better.  See pictures in our text or Ch 5 of DasGupta.  See kruskal.py, which uses unionfind.py; a sketch of the idea follows.
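Here is a minimal sketch of the whole algorithm with union-find, using union by rank and path compression.  It only illustrates the idea; the actual interfaces of kruskal.py and unionfind.py may differ.  Edges are (weight, v, w) triples so that sorting orders them by weight.

def kruskal(vertices, edges):
    parent = {v: v for v in vertices}   # each vertex starts as its own in-tree root
    rank = {v: 0 for v in vertices}

    def find(v):                        # compression: after finding the root,
        root = v                        # repoint every node on the path at it
        while parent[root] != root:
            root = parent[root]
        while parent[v] != root:
            parent[v], v = root, parent[v]
        return root

    def union(r1, r2):                  # union by rank: shorter tree under taller
        if rank[r1] < rank[r2]:
            r1, r2 = r2, r1
        parent[r2] = r1
        if rank[r1] == rank[r2]:        # equal ranks: the result's rank grows by 1
            rank[r1] += 1

    forest = []
    for wt, v, w in sorted(edges):      # lightest edge first: O(m log m)
        rv, rw = find(v), find(w)
        if rv != rw:                    # different components, so no cycle
            union(rv, rw)
            forest.append((v, w, wt))
    return forest

edges = [(2, 'a', 'b'), (3, 'a', 'c'), (1, 'b', 'c'), (4, 'b', 'd'), (5, 'c', 'd')]
print(kruskal('abcd', edges))   # [('b', 'c', 1), ('a', 'b', 2), ('b', 'd', 4)]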

So back to the time:  with the efficient version of union-find there are a maximum of 2m finds and n−1 unions, making a union-find program of length 2m+n−1.  If we assume the usual case that m ≥ n (or we include some isolated vertices with no edges), this contributes O(m lg* m), so sorting the edges is the main contribution, O(m lg m).

Compare Prim:  with the algorithms indicated, both Prim and Kruskal have the same order.  Depending on the specific situation, either could be faster in practice.  With fancier priority queues Prim could be faster.  If for some reason you have the edge weights already sorted, Kruskal ends up faster.

It is useful to understand the situation when there are multiple MST's for one graph:  

MST Edge Swapping Theorem (not in book!)
Let G be a weighted graph.  If S and T are MST's for G, with k edges of S not in T, then there is a sequence of MST's T = T0, T1, T2, ..., Tk = S, where each transition from one MST to the next involves only the removal of one edge and the addition of another of the same weight.

Example progression:
[Figure: MST sequence]

Proof:  This follows immediately by induction from the lemma below.  Basically, if you have two MST's that are k edges apart, then you can find an MST that is one edge swap away from one of them and k−1 away from the other:

Lemma for the MST Edge Swapping Theorem:
If S and T and G are as in the MST Edge Swapping Theorem, and there are k edges in S that are not in T, with k > 0, then there is an MST  T' for G that is either formed from S by removing an edge of S that is not in T and adding an edge from T that is not in S, or is formed from T by removing an edge of T that is not in S and adding an edge from S that is not in T.

Proof of Lemma:  Homework exercise
  
If you want to go up one level of abstraction, you could restate the theorem in terms of a graph SG whose vertices are the MST's of G and whose edges connect MST's differing by only the addition and deletion of one edge of G.  The theorem then says that SG is connected, and the distance from any one MST to another is the number of edges in one that are not in the other.

Union Find for Kruskal, Step by Step

The word rank or weight, rather than depth, is used purposefully in the discussion of in-trees.  Remember that the efficient union-find gets part of its efficiency from a compression find traversing only upward from the specified starting node to the root, not looking through the whole in-tree somehow.  Though one path upward may compress, you cannot assume that the compression of one path reduces the depth, without a full check of the in-tree.  The maximum depth (looking at the whole in-tree) may decrease, but no computational time is given to determining that.  The rank is a simple-to-keep upper bound on the actual depth.  The only way rank changes is through the simple calculation when a union of two trees of the same rank is done, increasing the rank by 1.  Rank never decreases, so after some compression the rank may be larger than the depth.
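A tiny self-contained demonstration of that last point (a hypothetical setup, not the interface of unionfind.py): after a compressing find, the actual depth drops but the stored rank does not.

parent = {x: x for x in 'abcd'}
rank = {x: 0 for x in 'abcd'}

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])   # path compression
    return parent[x]

def union(x, y):
    rx, ry = find(x), find(y)
    if rank[rx] < rank[ry]:
        rx, ry = ry, rx
    parent[ry] = rx
    if rank[rx] == rank[ry]:
        rank[rx] += 1                 # the only way rank ever changes

union('a', 'b')    # b points at a; rank of a becomes 1
union('c', 'd')    # d points at c; rank of c becomes 1
union('a', 'c')    # c points at a; rank of a becomes 2; d is now at depth 2
find('d')          # compression: d now points directly at a (depth 1)
print(rank['a'])   # still 2: rank is an upper bound, not the exact depth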

Assume that when there is a union of two trees with the same rank, the first root alphabetically stays as root.  The Kruskal sequence with the graph is at the left; the union-find data structure with weighted union and compression find is on the right.  Read down the left column first.  There is one error:  the bottom of the left column's graph should not be the same as the top of the right column; edge CD should not yet be chosen (not thick) at the bottom of the left column.  Red is used to show a graph edge considered but not used (it makes a cycle).  Red in the in-trees indicates a pointer removed during compression; you should see a new link added to replace it.  Note where compression is done as a new edge is accepted, and where compression is done even when the proposed edge is not used, since the compression find happens before the final check for a cycle.

[Figure: union-find with Kruskal, step by step]


Huffman - follow our text or Ch 5 of DasGupta, pages 12-16 of the chapter.


An important characteristic of any Huffman encoding is that no bit string used to represent a character is the prefix of any other bit string used to represent a character.
                                               
Hypothetical Code:
     A:  01
     B:  011
     C:  110
     D:  10

How do you decode
  0   1   1   1   0  ?

As A (01) followed by  1 1 0  (C)?
vs
As B (011) followed by  1 0  (D)?
The code is ambiguous because 01 (A) is a prefix of 011 (B).

You avoid this problem by keeping data only at leaves.  Another clear optimization:  so you do not waste nodes, there is no need to go through a node with just one child.  This means the tree structure is a 2-tree, with every node having 0 or 2 children, never 1 (DasGupta calls this a full tree).  That is a start at efficient, unambiguous decoding.  There are still a lot of 2-trees to choose among.
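With data only at leaves, the code is prefix-free, so decoding is a single unambiguous left-to-right scan.  A minimal sketch, with a hypothetical prefix code (not yet an optimal Huffman code):

code = {'A': '00', 'B': '01', 'C': '10', 'D': '11'}
decode_map = {bits: ch for ch, bits in code.items()}

def decode(bits):
    out, cur = [], ''
    for b in bits:
        cur += b
        if cur in decode_map:   # reached a leaf: emit the character, restart
            out.append(decode_map[cur])
            cur = ''
    return ''.join(out)

print(decode('00011110'))   # ABDC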

Two ways of looking at counts are important in forming an optimal 2-tree: 

1.  The sum of all path lengths to leaves, weighted by frequencies.  This shows that the two characters with the smallest frequencies should go at the bottom of the tree.

2.  Also label each interior node with a frequency which is the sum of its children's.  For each node other than the root, its number is the number of paths that follow the leg from the node's parent to itself.  The sum of all path lengths, weighted by frequency, is then the sum of the frequencies of use of every edge, which is the sum of the values at every node other than the root.  This allows the easy reduction to a smaller problem: replace the branch containing the two smallest frequencies by just their parent node.


Consider “BAA BAA BLACK”
Character frequencies:

A 5
B 3
_  2 (use _ as a shorthand for blank)
C 1
L 1
K 1 

Build the tree, the code, and the encoding.  There are multiple choices when there is not a unique smallest pair.
[Figure: Huffman tree and codes]

Python code for this is in huff.py, which uses the previously discussed heap.py and a new module simpletable.py.
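For comparison, here is a minimal Huffman sketch using the standard library's heapq rather than heap.py; this shows the idea, not the contents of huff.py.  An insertion counter serves as a tiebreak so the heap never has to compare the tree structures themselves.

import heapq

def huffman_codes(freqs):
    # freqs: dict of character -> frequency.  Return dict of character -> bit string.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)          # the two smallest frequencies...
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))  # ...replaced by their parent
        count += 1
    codes = {}
    def walk(tree, prefix):                      # 0 for left branch, 1 for right
        if isinstance(tree, tuple):
            walk(tree[0], prefix + '0')
            walk(tree[1], prefix + '1')
        else:
            codes[tree] = prefix or '0'          # single-character edge case
    walk(heap[0][2], '')
    return codes

print(huffman_codes({'A': 5, 'B': 3, '_': 2, 'C': 1, 'L': 1, 'K': 1}))
# One optimal code (choices on ties can differ):
# {'C': '000', 'K': '001', 'B': '01', 'L': '100', '_': '101', 'A': '11'}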

Class practice:
See another greedy problem:  DasGupta 5.3, Horn formulas.  Important for the Prolog programming language implementation.  The claim is that there is an algorithm linear in the size of the input to solve Horn formulas.  In DasGupta's pseudocode, the while clause could be restated: "while there is an implication with true hypothesis and false conclusion, change the variable in the conclusion to true".   Class: work out such an O(n) algorithm.  This requires careful consideration of the data structures used.

More creative problems from earlier sections:

1.   The diameter of a graph is the maximum distance between any two vertices.  In particular it makes sense for a tree.  Find an O(n) algorithm for a tree (a general unrooted tree, not a binary tree).  You may assume all edge weights are 1 for simplicity (this does not change the algorithm idea).

2.  Given n professional wrestlers, with r rivalries specified between pairs, find an O(n+r) algorithm that either produces a partition into good guys and bad guys with every rivalry between a good guy and a bad guy, or acknowledges that no such partition is possible.

3.  Create an O(n) algorithm and explain why the algorithm has that order:

boolean sumMatch(int[] A, int n, int val)
// n > 0; A is an array with n elements, and A[0] < A[1] < A[2] < … < A[n-1].
// Return true if there are indices i and j such that A[i] + A[j] == val ; return false otherwise.
// Examples: If A contains -3, 1, 2, 5, 7, then sumMatch(A, 5, 6) is true (1+5 is 6), and
// sumMatch(A, 5, 12) is true but sumMatch(A, 5, 11) and sumMatch(A, 5, 0) are both false.

Shortest paths with negative weight edges

What if we allow negative edge weights?  See Section 4.6 of DasGupta, p. 128.  Note this part of the course is not in Levitin.

Class:  See the usefulness with DasGupta Ex 4.21, 4.22.

HW will include DasGupta 4.8, 4.9.


Next:  All pairs shortest paths and dynamic programming