Algorithms

Union-Find

In Kruskal's algorithm, as edges are added, compontents change, so the component structure is dynamic.  A larger topic is dynamic data structures, including B-trees, hash tables, lists....

See Minimum spanning trees for motivation for dynamically changing equivalence relations.
Illustrate with pictures on the board for motivation only for now.


Data structures:  An in-tree:  tree where children point to parent, the reverse of the usual way a tree is referenced.  There must be independent pointers to the nodes, since they are not findable from the root.   Assume nodes for vertices 0..n-1.  Have a parent array, where -1 means root.

Identify components by in-tree root, union combines two distict trees by pointing root of one to the root of the other.

Use "Link operation":  read or write to parent array as basis of counting.

A union-find program is a sequence of finding if two elements are in the same set and unions of distinct sets.

Union-find program on n elements creates n nodes (in O(n)) and  does some sequence of m union and find operations.
In simple version above worst case n node creation steps followed by m operations of O(n):  total O(n +mn) link operations.
(tricky with 2 variables - not both need to get large:  If m = 0, need tthe "n +".)

Elaborations  

wUnion: always append tree with smaller height to root of larger tree.  Illustrate.  How bad could the depth be?
Prove:  A non-empty intree with k nodes constructed using wUnion will have height h ≤  ⌊lg k⌋.  (assuming depth of a single node tree (root only) is 0....)  Actually prove by induction the equivalent 2h <= k. (Why?)

Proof:  By induction on h:
Basis:...
one node, height 0, 20 <= 1

induction step: 
Assume any tree of height h has at least 2h vertices.
Prove and tree of height h+1 and at least 2h+1 vertices.
A tree of height h+1 must have been formed from two trees with height h, and has the sum of the elements from each, which by hypothesis is at least 2h vertices each; 2*2h = 2h+1.

Theorem:  wUnion-find program uses O(n + m lg n) link operations

Note that an additional data structure is needed:  the height array at each root (values ignored for non-roots)

cFind
Compress the tree while finding: cFind(x)  makes x and all ancestors link directly to the root.  Draw in class.  See DasGupta.

Can add this to wUnion, but need to change names:  height replaced by 'rank', using the same algorithm, though actual height may be much less than the rank.

The lowest growing function you will ever see:  log * (# of times you must apply lg to get result <= 0):
n012 to 34 to1516 to 6553565535 to 265535 - 1
log* n012345

Theorem:  Using both cFind and wUnion in a union find program, the total number of link operations is O((n+m)lg* n).

Proofs use the log* function and amortization.  The notes refer to a reference for the proof, and DasGupta also has one (which I find rather incomplete), You are NOT responsible for these.

You are responsible for  simple find and union, cFind and wUnion, and the analysis when cFind is not used.  See unionfind.py.