Week 7: Graph Algorithms

Graph Basics & Traversals

Graphs are one of the most versatile data structures in computer science. A graph is simply a collection of 'nodes' (also called vertices) connected by 'edges' — think of a social network (people are nodes, friendships are edges), a road map (cities are nodes, roads are edges), or the internet (websites are nodes, links are edges). This week introduces how to store graphs in a computer's memory and two fundamental ways to explore them: Breadth-First Search (BFS), which explores level by level like ripples spreading outward from a stone thrown in water, and Depth-First Search (DFS), which dives deep into one path before backtracking. BFS is great for finding shortest paths in unweighted graphs, while DFS is powerful for detecting cycles, classifying edges, and performing topological sorting (ordering tasks that have dependencies).

Learning Objectives

- Represent graphs in two ways: adjacency lists (a list of neighbors for each node — space-efficient for sparse graphs) and adjacency matrices (a V×V grid — fast for checking if an edge exists)
- Implement BFS (which uses a queue to explore nodes level by level) and DFS (which uses recursion or a stack to explore as deep as possible before backtracking)
- Classify edges during DFS into four types: tree edges (part of the DFS tree), back edges (point to an ancestor, indicating a cycle), forward edges, and cross edges
- Apply topological sort to Directed Acyclic Graphs (DAGs) — ordering nodes so that every directed edge goes from earlier to later in the order, useful for scheduling tasks with prerequisites

Key Concepts

Graph Representations

A graph G = (V, E) consists of vertices (V) and edges (E). The first question is: how do we store a graph in memory? There are two main approaches. An adjacency list stores, for each vertex, a list of its neighbors — imagine each person in a social network having a contact list. This uses O(V+E) space and is great when the graph is 'sparse' (has relatively few edges). An adjacency matrix is a V×V grid where entry [i][j] is 1 if there is an edge from vertex i to vertex j, and 0 otherwise. This uses O(V²) space regardless of the number of edges, but lets you check 'is there an edge between A and B?' in O(1) time. The choice of representation significantly affects the speed of graph algorithms.

- Adjacency list: each vertex has a list of its neighbors, using O(V+E) total space — memory-efficient for sparse graphs where most pairs of nodes are not connected
- Adjacency matrix: a V×V grid of 0s and 1s, using O(V²) space — allows O(1) edge lookups but wastes space when few edges exist
- Sparse graph (E is much less than V²): adjacency list is better because it only stores edges that actually exist
- Dense graph (E is close to V²): adjacency matrix may be preferable because the constant-time edge lookup outweighs the space cost
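
To make the trade-off concrete, here is a minimal Python sketch of both representations; the vertex count and edge list below are made up purely for illustration.

```python
# Minimal sketch of the two standard graph representations.
# The graph (4 vertices, 4 undirected edges) is an arbitrary example.

n = 4                                      # vertices 0..3
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]   # undirected edge list

# Adjacency list: one neighbor list per vertex, O(V + E) space.
adj_list = [[] for _ in range(n)]
for u, v in edges:
    adj_list[u].append(v)
    adj_list[v].append(u)                  # store both directions for an undirected graph

# Adjacency matrix: V x V grid of 0/1, O(V^2) space, O(1) edge lookup.
adj_matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    adj_matrix[u][v] = 1
    adj_matrix[v][u] = 1

print(adj_list)          # [[1, 2], [0, 2], [0, 1, 3], [2]]
print(adj_matrix[1][2])  # 1, meaning the edge (1, 2) exists
```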

Breadth-First Search (BFS)

BFS explores a graph layer by layer, like ripples spreading from a stone dropped in water. Starting from a source vertex, it first visits all immediate neighbors (distance 1), then all vertices that are 2 edges away, then 3 edges away, and so on. It uses a queue (first-in, first-out) to keep track of which vertices to visit next. This layer-by-layer exploration naturally finds the shortest path (in terms of number of edges) from the source to every reachable vertex. BFS runs in O(V+E) time because it visits each vertex once and examines each edge once. It is the foundation for many practical algorithms, including finding connected components and checking if a graph is bipartite.

- Uses a FIFO (first-in, first-out) queue — vertices discovered earlier are explored first, ensuring the layer-by-layer order
- Finds shortest paths (fewest edges) in unweighted graphs — the distance to each vertex is set when it is first discovered, and this is guaranteed to be the minimum
- Visits vertices in non-decreasing order of distance from the source — all distance-1 vertices before distance-2, and so on
- Runs in O(V+E) time using an adjacency list — each vertex enters the queue at most once, and each edge is examined at most once (or twice in undirected graphs)
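
A minimal BFS sketch in Python, assuming the graph is given as a dictionary-based adjacency list; the example graph is invented for demonstration.

```python
from collections import deque

def bfs(adj, source):
    """Breadth-first search from `source`; returns the edge-count distance
    to every reachable vertex (unreachable vertices are simply absent)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()          # FIFO order gives the layer-by-layer exploration
        for v in adj[u]:
            if v not in dist:        # first discovery = shortest distance in an unweighted graph
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Example (arbitrary): an unweighted, undirected graph as an adjacency list.
adj = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C', 'E'],
    'E': ['D'],
}
print(bfs(adj, 'A'))   # {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 3}
```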

Depth-First Search (DFS)

DFS takes the opposite approach from BFS: instead of exploring layer by layer, it dives as deep as possible along one path before backtracking. Think of exploring a maze by always turning left until you hit a dead end, then backing up and trying the next option. DFS uses recursion (or an explicit stack) and assigns each vertex a 'discovery time' (when it was first found) and a 'finish time' (when all its descendants have been fully explored). As DFS runs, it classifies every edge into one of four types: tree edges (the edges that form the DFS tree), back edges (edges pointing back to an ancestor — these indicate cycles), forward edges (edges to a descendant already discovered), and cross edges (edges to a vertex in a different branch). DFS is the foundation for topological sorting and finding strongly connected components.

- Uses recursion (which implicitly creates a stack of function calls) — each recursive call explores one branch deeper before returning
- Edge classification depends on the color of the destination: WHITE → tree edge (new vertex), GRAY → back edge (ancestor, means a cycle exists!), BLACK → forward or cross edge
- The presence of a back edge means the graph contains a cycle — this is how we detect cycles using DFS
- Topological sort: output vertices in reverse order of their finish times — this guarantees that if there is an edge from A to B, then A appears before B in the ordering
- Each vertex gets a discovery time d[u] (when DFS first reaches it) and finish time f[u] (when DFS has fully explored all its descendants) — these timestamps are useful for many graph analyses
- ⚠ Common mistake: In undirected graphs, DFS only produces tree edges and back edges — there are no forward or cross edges. Forward and cross edges only appear in DFS of directed graphs.
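
The points above can be combined into one sketch: recursive DFS with WHITE/GRAY/BLACK colors, discovery/finish timestamps, color-based edge classification, and a topological order read off in reverse finish order. This is a minimal Python illustration, not the only way to code it; the small "prerequisites" graph is invented for demonstration.

```python
def dfs(adj):
    """DFS over a directed graph given as {vertex: [neighbors]}.
    Returns discovery/finish times, edge classifications, and the vertices
    in reverse finish order (a valid topological order if the graph is a DAG)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in adj}
    d, f = {}, {}                      # discovery and finish times
    edge_type = {}                     # (u, v) -> 'tree' | 'back' | 'forward' | 'cross'
    finish_order = []
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time                    # discovery time
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == WHITE:
                edge_type[(u, v)] = 'tree'
                visit(v)
            elif color[v] == GRAY:
                edge_type[(u, v)] = 'back'   # back edge: the graph has a cycle
            else:                            # BLACK destination
                edge_type[(u, v)] = 'forward' if d[u] < d[v] else 'cross'
        color[u] = BLACK
        time += 1
        f[u] = time                    # finish time
        finish_order.append(u)

    for u in adj:
        if color[u] == WHITE:
            visit(u)
    topo = list(reversed(finish_order))      # topological order iff no back edges were found
    return d, f, edge_type, topo

# Example DAG (arbitrary): course prerequisites.
adj = {'math': ['algorithms'], 'programming': ['algorithms'],
       'algorithms': ['graphs'], 'graphs': []}
d, f, edge_type, topo = dfs(adj)
print(topo)   # ['programming', 'math', 'algorithms', 'graphs']
```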

Strongly Connected Components (Kosaraju's Algorithm)

In a directed graph, a strongly connected component (SCC) is a maximal set of vertices where every vertex can reach every other vertex via directed paths. For example, in a web graph, an SCC represents a group of pages where you can navigate from any page to any other within the group. Finding SCCs reveals the high-level structure of a directed graph.

Kosaraju's algorithm finds all SCCs in two DFS passes:

1. First pass: Run DFS on the original graph G, recording the finish times of all vertices.
2. Transpose: Create G^T by reversing all edge directions.
3. Second pass: Run DFS on G^T, but process vertices in decreasing order of finish time from step 1. Each DFS tree in this pass is one SCC.

Why it works: The first DFS orders vertices so that 'source' SCCs (ones with edges going out to other SCCs) finish last. When we reverse the graph and process in reverse finish order, we start from these source SCCs, but in the reversed graph their outgoing edges are now incoming — so the DFS cannot 'escape' into other SCCs. Each DFS tree is therefore confined to exactly one SCC.

Worked example: Consider vertices {A, B, C, D} with edges A→B, B→C, C→A, B→D.

- SCCs: {A, B, C} (they form a cycle) and {D} (alone)
- First DFS (start at A): discovers A→B→C→(back to A)→D. Finish order: C, D, B, A
- Transpose: reverse all edges. Now C→B, A→C, B→A, D→B
- Second DFS in reverse finish order (A first): A→C→B, that's SCC {A,B,C}. Then D alone, that's SCC {D}.

Total time: O(V+E) — just two DFS passes and one graph transposition.
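
A compact Python sketch of Kosaraju's two-pass algorithm, run on the worked example above; the function and variable names are my own choices for illustration.

```python
from collections import defaultdict

def kosaraju(vertices, edges):
    """Return the strongly connected components of a directed graph
    as a list of vertex sets, using Kosaraju's two-pass DFS."""
    adj = defaultdict(list)
    adj_t = defaultdict(list)                # transposed graph (all edges reversed)
    for u, v in edges:
        adj[u].append(v)
        adj_t[v].append(u)

    # Pass 1: DFS on the original graph, recording the finish order.
    visited, finish_order = set(), []
    def dfs1(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                dfs1(v)
        finish_order.append(u)
    for u in vertices:
        if u not in visited:
            dfs1(u)

    # Pass 2: DFS on the transpose in decreasing finish order; each tree is one SCC.
    visited.clear()
    sccs = []
    def dfs2(u, component):
        visited.add(u)
        component.add(u)
        for v in adj_t[u]:
            if v not in visited:
                dfs2(v, component)
    for u in reversed(finish_order):
        if u not in visited:
            component = set()
            dfs2(u, component)
            sccs.append(component)
    return sccs

# The worked example from above: A->B, B->C, C->A, B->D.
print(kosaraju(['A', 'B', 'C', 'D'],
               [('A', 'B'), ('B', 'C'), ('C', 'A'), ('B', 'D')]))
# [{'A', 'B', 'C'}, {'D'}]   (ordering inside each set may vary)
```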

- Two DFS passes: the first records finish times on the original graph, the second discovers SCCs on the transposed graph in reverse finish order
- The transposed graph G^T has the same SCCs as G — reversing edges does not change which vertices can reach each other within a component
- The SCC decomposition forms a DAG (the 'condensation graph') — this is useful for many problems that reduce to DAG algorithms once SCCs are contracted
- Alternative: Tarjan's algorithm finds SCCs in a single DFS pass using a stack and 'lowlink' values, but Kosaraju's is often easier to understand and implement