Week 9: Graph Algorithms

Minimum Spanning Trees

Imagine you need to connect several cities with roads, and building each road has a different cost. You want to connect all cities using the minimum total road-building cost — this is the Minimum Spanning Tree (MST) problem. A spanning tree connects all vertices using exactly V-1 edges (no cycles), and the minimum spanning tree does so with the lowest total edge weight. This week covers two classic algorithms: Kruskal's algorithm (sort all edges by cost and greedily add the cheapest ones that do not form a cycle, using a clever Union-Find data structure) and Prim's algorithm (start from one vertex and keep growing the tree by adding the cheapest edge connecting the tree to a new vertex). You will also learn the 'cut property' — the mathematical principle that proves both algorithms are correct.

Learning Objectives

  • Implement Kruskal's algorithm using Union-Find (Disjoint Set) for efficient cycle detection: sort edges by weight and add them one by one if they connect two different components
  • Implement Prim's algorithm using a priority queue: very similar to Dijkstra's, but the key for each vertex is the cheapest edge connecting it to the growing tree, not the total path distance
  • Understand and apply the cut property: for any partition of the vertices into two groups, the lightest edge crossing between them must be in every MST
  • Know when an MST is unique: if all edge weights are distinct, the MST is guaranteed to be unique

Key Concepts

Union-Find (Disjoint Set) Data Structure

Before learning Kruskal's algorithm, we need a data structure that efficiently answers the question: 'Are these two elements in the same group?' The Union-Find (also called Disjoint Set Union or DSU) data structure solves this with two operations: Find(x) returns the 'representative' (root) of the group containing x, and Union(x, y) merges the groups containing x and y into one.

The naive approach uses a forest of trees where each node points to its parent. Find follows parent pointers to the root, and Union links one root to another. Without optimization, trees can become tall chains, making Find slow: O(n) in the worst case.

Two optimizations make it blazingly fast:

  1. Union by rank: Always attach the shorter tree under the root of the taller tree. This keeps trees shallow; the height is at most O(log n).

  2. Path compression: During Find, make every node on the path point directly to the root. This flattens the tree for future queries.

With both optimizations, each operation takes O(α(n)) amortized time, where α is the inverse Ackermann function, which grows so incredibly slowly that for any practical input size (even n = 10^80, roughly the number of atoms in the observable universe), α(n) ≤ 4. So in practice, each operation is effectively O(1).
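
To make the operations concrete, here is a minimal Python sketch of Union-Find with both optimizations (the class name DSU and the method names are illustrative choices, not required by the course):

```python
class DSU:
    """Disjoint Set Union with path compression and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # rank = upper bound on tree height

    def find(self, x):
        # Path compression: point every node on the path straight at the root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False              # already in the same set
        # Union by rank: attach the shorter tree under the taller tree's root.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
```

The worked example below corresponds directly to a sequence of union and find calls on this structure (with vertices A through E mapped to indices 0 through 4).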

Worked example — Union-Find on vertices {A, B, C, D, E}:

  • Initially, each vertex is its own set: {A}, {B}, {C}, {D}, {E}
  • Union(A, B): merge → {A, B}, {C}, {D}, {E}. Root = A (or B, depending on rank)
  • Union(C, D): merge → {A, B}, {C, D}, {E}
  • Find(B): returns A (B's root)
  • Union(A, C): merge {A, B} and {C, D} → {A, B, C, D}, {E}
  • Find(D): returns A (with path compression, D now points directly to A)

Key points:

  • Path compression in Find: make every node on the search path point directly to the root, dramatically speeding up future queries by flattening the tree structure
  • Union by rank: always attach the shorter tree under the taller tree's root, keeping the tree height at O(log n) even without path compression
  • With both optimizations combined, each operation is O(α(n)) amortized, where α(n) ≤ 4 for all practical input sizes, making operations effectively constant time
  • Used by Kruskal's algorithm for cycle detection: two vertices are in the same component (adding an edge would create a cycle) if and only if Find returns the same root for both

Kruskal's Algorithm

Kruskal's algorithm takes a straightforward approach: sort all edges in the graph from cheapest to most expensive, then go through them one by one. For each edge, ask: 'Would adding this edge create a cycle?' If no, add it to the MST. If yes, skip it. To efficiently check for cycles, Kruskal's uses the Union-Find structure introduced above, which tracks which vertices are already connected. Initially, each vertex is its own 'component.' When you add an edge, you merge the two components (union). Checking whether two vertices are in the same component (find) is nearly O(1) with path compression and union by rank. The total time is dominated by sorting the edges: O(E log E). A runnable sketch follows the list below.

  • Sorting edges takes O(E log E); this is the dominant cost, and everything else is nearly linear
  • Union-Find with path compression and union by rank makes each find/union operation nearly O(1): technically O(α(n)), where α is the inverse Ackermann function, which is effectively constant
  • The algorithm adds exactly V-1 edges to form the spanning tree: any fewer and the graph would be disconnected, any more and there would be a cycle
  • Particularly good for sparse graphs (few edges relative to vertices), because the O(E log E) sorting cost stays low when E is small
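
Here is a minimal Python sketch of Kruskal's algorithm, reusing the DSU class from the Union-Find sketch above; the (weight, u, v) edge-list format is an assumption for illustration:

```python
def kruskal(num_vertices, edges):
    """Return (total_weight, mst_edges) for a connected undirected graph.

    edges: list of (weight, u, v) tuples with 0-based vertex indices.
    """
    dsu = DSU(num_vertices)           # DSU sketch from the Union-Find section
    mst_edges = []
    total = 0
    # Consider edges from cheapest to most expensive: the O(E log E) sort.
    for weight, u, v in sorted(edges):
        # union returns False when u and v are already connected,
        # i.e. when adding this edge would close a cycle.
        if dsu.union(u, v):
            mst_edges.append((u, v, weight))
            total += weight
            if len(mst_edges) == num_vertices - 1:
                break                 # a spanning tree has exactly V-1 edges
    return total, mst_edges

# Example: 4 vertices; the MST picks the edges of weight 1, 2, and 3.
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 1, 3)]
print(kruskal(4, edges))  # (6, [(0, 1, 1), (1, 2, 2), (2, 3, 3)])
```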

Prim's Algorithm

Prim's algorithm takes a different approach from Kruskal's: instead of considering edges globally, it grows the MST one vertex at a time from a starting point. Begin with any vertex, then repeatedly find the cheapest edge that connects the current tree to a vertex not yet in the tree, and add that vertex. This is very similar to how Dijkstra's algorithm works; the difference is that Dijkstra tracks the total distance from the source, while Prim tracks just the weight of the single edge connecting each vertex to the tree. Using a binary min-heap as a priority queue, Prim's runs in O((V+E) log V). For dense graphs with an adjacency matrix, a simpler O(V²) implementation (without a heap) is often used. A runnable sketch follows the list below.

  • Grows a single tree from a starting vertex by always adding the cheapest edge connecting the tree to a new outside vertex
  • Very similar to Dijkstra's algorithm, but the 'key' of each vertex is the weight of the cheapest edge to the tree, not the total distance from the source
  • With a binary heap: O((V+E) log V), the same complexity as Dijkstra's, since the structure of the algorithm is nearly identical
  • Particularly good for dense graphs, where a simple O(V²) implementation (scanning all vertices to find the minimum key) avoids the overhead of a heap
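
A minimal Python sketch of Prim's algorithm, using heapq from the standard library as the priority queue; instead of a decrease-key operation it uses the common 'lazy deletion' trick of skipping stale heap entries, and the adjacency-list format is an assumption for illustration:

```python
import heapq

def prim(adj, start=0):
    """Return (total_weight, mst_edges) for a connected undirected graph.

    adj[u]: list of (weight, v) pairs, one per neighbor v of u.
    """
    n = len(adj)
    in_tree = [False] * n
    mst_edges = []
    total = 0
    # Heap entries: (weight of the edge into the tree, vertex, parent).
    heap = [(0, start, -1)]
    while heap:
        weight, u, parent = heapq.heappop(heap)
        if in_tree[u]:
            continue                  # stale entry: u joined the tree earlier
        in_tree[u] = True
        if parent != -1:
            mst_edges.append((parent, u, weight))
            total += weight
        # Offer every edge from u to a vertex still outside the tree.
        for w, v in adj[u]:
            if not in_tree[v]:
                heapq.heappush(heap, (w, v, u))
    return total, mst_edges

# Same 4-vertex graph as the Kruskal example, as an adjacency list.
adj = [[(1, 1), (4, 3)],             # vertex 0: to 1 (w=1), to 3 (w=4)
       [(1, 0), (2, 2), (5, 3)],     # vertex 1: to 0, 2, 3
       [(2, 1), (3, 3)],             # vertex 2: to 1, 3
       [(3, 2), (4, 0), (5, 1)]]     # vertex 3: to 2, 0, 1
print(prim(adj))  # (6, [(0, 1, 1), (1, 2, 2), (2, 3, 3)])
```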

Cut Property

The cut property is the mathematical foundation that proves both Kruskal's and Prim's algorithms are correct. A 'cut' is any way of dividing the graph's vertices into two non-empty groups (S and V-S). 'Crossing edges' are edges with one endpoint in each group. The cut property states: the lightest crossing edge for any cut must be part of every MST (assuming distinct edge weights; with ties, it belongs to at least one MST). Why? Suppose an MST did not include this lightest crossing edge. Adding it would create a cycle, and that cycle must cross the cut again through some heavier crossing edge; removing that heavier edge and keeping the lighter one yields a cheaper spanning tree, contradicting the assumption that the tree was minimum. Both Kruskal's and Prim's greedily pick edges that are the lightest crossing some cut, which is why they produce correct MSTs. A brute-force verification sketch follows the list below.

  • A cut divides the vertex set V into two non-empty groups S and V-S; any such partition counts as a cut
  • A crossing edge has one endpoint in S and the other in V-S; these are the edges that 'bridge' the two groups
  • The light edge is the minimum-weight crossing edge for a given cut; the cut property guarantees this edge belongs to every MST
  • Both Kruskal's and Prim's algorithms always add edges that are 'light' for some cut, which is why they correctly produce an MST
  • If all edge weights are distinct (no ties), the MST is unique: there is only one possible minimum spanning tree
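
To see the cut property in action, here is a small Python sketch that brute-forces every cut of a tiny graph with distinct weights and asserts that the lightest crossing edge always lands in the MST; it reuses the kruskal sketch above, and the function name is an illustrative choice:

```python
from itertools import combinations

def check_cut_property(num_vertices, edges):
    """For every cut (S, V-S) of a small connected graph with distinct
    weights, assert that the lightest crossing edge is in Kruskal's MST."""
    _, mst = kruskal(num_vertices, edges)   # sketch from the Kruskal section
    mst_set = {(min(u, v), max(u, v)) for u, v, _ in mst}
    for size in range(1, num_vertices):     # every non-empty proper subset S
        for group in combinations(range(num_vertices), size):
            S = set(group)
            crossing = [(w, u, v) for w, u, v in edges
                        if (u in S) != (v in S)]
            _, u, v = min(crossing)         # the light edge for this cut
            assert (min(u, v), max(u, v)) in mst_set
    return True

# The 4-vertex example again: every cut's lightest crossing edge is in the MST.
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 1, 3)]
print(check_cut_property(4, edges))  # True
```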