See Minimum spanning trees for motivation for dynamically changing equivalence relations.

Illustrate with pictures on the board for motivation only for now.

Data structures: An in-tree: tree where children point to parent, the reverse of the usual way a tree is referenced. There must be independent pointers to the nodes, since they are not findable from the root. Assume nodes for vertices 0..n-1. Have a parent array, where -1 means root.

Identify components by in-tree root, union combines two distict trees by pointing root of one to the root of the other.

Use "Link operation": read or write to parent array as basis of counting.

A union-find program is a sequence of finding if two elements are in the same set and unions of distinct sets.

Union-find program on n elements creates n nodes (in O(n)) and does some sequence of m union and find operations.

In simple version above worst case n node creation steps followed by m operations of O(n): total O(n +mn) link operations.

(tricky with 2 variables - not both need to get large: If m = 0, need tthe "n +".)

Prove: A non-empty intree with k nodes constructed using wUnion will have height h ≤ ⌊lg k⌋. (assuming depth of a single node tree (root only) is 0....) Actually prove by induction the equivalent 2

Proof: By induction on h:

Basis:...

one node, height 0, 2

induction step:

Assume any tree of height h has at least 2

Prove and tree of height h+1 and at least 2

A tree of height h+1 must have been formed from two trees with height h, and has the sum of the elements from each, which by hypothesis is at least 2

Theorem: wUnion-find program uses O(n + m lg n) link operations

Note that an additional data structure is needed: the height array at each root (values ignored for non-roots)

cFind

Compress the tree while finding: cFind(x) makes x and all ancestors link directly to the root. Draw in class. See DasGupta.

Can add this to wUnion, but need to change names: height replaced by 'rank', using the same algorithm, though actual height may be much less than the rank.

The lowest growing function you will ever see: log * (# of times you must apply lg to get result <= 0):

n | 0 | 1 | 2 to 3 | 4 to15 | 16 to 65535 | 65535 to 2^{65535} - 1 |

log* n | 0 | 1 | 2 | 3 | 4 | 5 |

Theorem: Using both cFind and wUnion in a union find program, the total number of link operations is O((n+m)lg* n).

Proofs use the log* function and amortization. The notes refer to a reference for the proof, and DasGupta also has one (which I find rather incomplete), You are NOT responsible for these.

You are responsible for simple find and union, cFind and wUnion, and the analysis when cFind is not used. See unionfind.py.