Algorithms
Union-Find
In Kruskal's algorithm, as edges are added, compontents change, so the
component structure is dynamic. A larger topic is dynamic data
structures, including B-trees, hash tables, lists....
See Minimum spanning trees for motivation for dynamically
changing
equivalence relations.
Illustrate with pictures on the board for motivation only for now.
Data structures: An in-tree: tree where children point
to parent, the reverse of the usual way a tree is referenced. There must be independent
pointers to the nodes, since they are not findable from the
root. Assume nodes for vertices 0..n-1. Have a parent
array, where -1 means root.
Identify components by in-tree root, union combines two distict trees by pointing root of one to the root of the other.
Use "Link operation": read or write to parent array as basis of counting.
A union-find program is a sequence of finding if two elements are in the same set and unions of distinct sets.
Union-find program on n elements creates n nodes (in O(n)) and does some sequence of m union and find operations.
In simple version above worst case n node creation steps followed by m operations of O(n): total O(n +mn) link operations.
(tricky with 2 variables - not both need to get large: If m = 0, need tthe "n +".)
Elaborations
wUnion: always append tree with smaller height to root of larger tree. Illustrate. How bad could the depth be?
Prove: A non-empty intree with k nodes constructed using
wUnion will have height h ≤ ⌊lg k⌋. (assuming depth of a
single node tree (root only) is 0....) Actually prove by induction the equivalent 2h <= k. (Why?)
Proof: By induction on h:
Basis:...
one node, height 0, 20 <= 1
induction step:
Assume any tree of height h has at least 2h vertices.
Prove and tree of height h+1 and at least 2h+1 vertices.
A tree of height h+1 must have been formed from two trees with height
h, and has the sum of the elements from each, which by hypothesis is at
least 2h vertices each; 2*2h = 2h+1.
Theorem: wUnion-find program uses O(n + m lg n) link operations
Note that an additional data structure is needed: the height array at each root (values ignored for non-roots)
cFind
Compress
the tree while finding: cFind(x) makes x and all ancestors link
directly to the root. Draw in class. See DasGupta.
Can add this to wUnion, but need to change names: height
replaced by 'rank', using the same algorithm, though actual height may
be much less than the rank.
The lowest growing function you will ever see: log * (# of times you must apply lg to get result <= 0):
n | 0 | 1 | 2 to 3 | 4 to15 | 16 to 65535 | 65535 to 265535 - 1 |
log* n | 0 | 1 | 2 | 3 | 4 | 5 |
Theorem: Using both cFind and wUnion in a union find program, the total number of link operations is O((n+m)lg* n).
Proofs use the log* function and amortization. The
notes refer to a reference for the proof, and DasGupta also has one (which I find rather incomplete), You are NOT
responsible for these.
You are responsible for simple find and union,
cFind and wUnion,
and the analysis when cFind is not used. See unionfind.py.