Week 10: Advanced Topics

Dynamic Programming

Dynamic Programming (DP) is one of the most powerful problem-solving techniques in computer science, and also one of the trickiest to learn. The core idea is surprisingly simple: if a big problem can be broken into smaller subproblems, and those smaller subproblems overlap (meaning the same subproblem appears over and over), then instead of solving each one from scratch every time, solve it once and save the result for later. This avoids a massive amount of redundant work. For example, computing the 50th Fibonacci number naively requires billions of recursive calls, but with DP it takes just 50 steps. This week covers the two approaches to DP — memoization (top-down: start with the big problem, recurse down, and cache results) and tabulation (bottom-up: fill in a table starting from the smallest subproblems). You will apply DP to three classic problems: the 0/1 Knapsack, the Longest Common Subsequence, and Matrix Chain Multiplication.
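
For instance, here is a minimal top-down sketch of the Fibonacci example in Python (the function name and the use of functools.lru_cache are illustrative choices, not prescribed by the course):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Memoized Fibonacci: each fib(k) is computed once, then cached."""
    if n < 2:          # base cases: fib(0) = 0, fib(1) = 1
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025; only 51 cached subproblems instead of billions of calls
```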

Learning Objectives

Identify when a problem is suitable for DP by checking for two properties: optimal substructure (the best solution is built from best solutions to smaller pieces) and overlapping subproblems (the same smaller pieces appear many times)
Implement both DP approaches: memoization (recursion with a cache — easy to write but may use more stack space) and tabulation (iteratively filling a table — usually more efficient)
Solve the 0/1 Knapsack problem (maximize value of items in a bag with limited weight), the Longest Common Subsequence problem (find the longest shared subsequence of two strings), and Matrix Chain Multiplication
Analyze time and space complexity of DP solutions and understand why Knapsack's O(n·W) is 'pseudo-polynomial' — it depends on the value of W, not just the number of items

Key Concepts

DP Principles

Dynamic programming works when two conditions are met: (1) Optimal substructure — the optimal solution to the big problem contains optimal solutions to its subproblems (for example, the shortest path from A to C through B must include the shortest path from A to B). (2) Overlapping subproblems — when you break the problem down recursively, the same subproblems appear over and over (unlike divide-and-conquer, where subproblems are independent). When both conditions hold, DP avoids redundant work by storing subproblem results. There are two ways to implement DP: memoization (top-down) starts with the full problem, recursively breaks it down, and caches results so each subproblem is solved only once. Tabulation (bottom-up) fills in a table starting from the smallest subproblems, building up to the answer. Both achieve the same result but have different practical trade-offs.

- Optimal substructure: the best overall solution is built from the best solutions to its subproblems — if this is not true, DP will not give the correct answer
- Overlapping subproblems: the same smaller problems appear many times during the recursive breakdown — this is what makes caching (memoization) or table-filling (tabulation) so effective
- Memoization (top-down): write a normal recursive solution, add a cache (dictionary/array), and before solving any subproblem, check if you have already solved it — only computes subproblems that are actually needed
- Tabulation (bottom-up): fill in a table iteratively starting from the smallest base cases — usually more space-efficient and avoids recursion overhead, but computes all subproblems even if some are not needed (see the sketch after this list)
- ⚠ Common mistake: Forgetting base cases! Every DP recurrence needs base cases (e.g., dp[0][w] = 0 for Knapsack, dp[0][j] = 0 for LCS). Without them, the recurrence has no ground to stand on.
- ⚠ Common mistake: Defining the wrong subproblem. If your DP doesn't capture enough state, you'll get wrong answers. Ask: 'What information do I need to make the optimal decision at this step?' — that tells you what your DP dimensions should be.
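
As a concrete contrast to the memoized version shown in the introduction, here is a minimal bottom-up Fibonacci sketch in Python (the function name is illustrative):

```python
def fib_tab(n: int) -> int:
    """Bottom-up Fibonacci: build values from the smallest subproblems upward."""
    if n < 2:
        return n
    prev, curr = 0, 1            # fib(0), fib(1)
    for _ in range(2, n + 1):    # compute fib(2) .. fib(n) iteratively
        prev, curr = curr, prev + curr
    return curr

print(fib_tab(50))  # 12586269025, with no recursion and O(1) extra space
```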

0/1 Knapsack

Imagine you are a thief with a backpack (knapsack) that can hold at most W kilograms. There are n items in a room, each with a specific weight and value. You want to maximize the total value of items you take without exceeding the weight limit. The catch: for each item, you either take it entirely or leave it — no cutting items in half (that is why it is called '0/1'). The naive approach would try all 2^n possible combinations, which is exponentially slow. DP solves this efficiently by building a table: dp[i][w] represents the maximum value achievable using the first i items with a weight limit of w. For each item, you have two choices: skip it (dp[i][w] = dp[i-1][w]) or take it (dp[i][w] = value[i] + dp[i-1][w - weight[i]]). You pick whichever gives more value. This fills an (n+1) × (W+1) table in O(n·W) time.
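
A minimal Python sketch of this recurrence (0-indexed lists, so item i of the recurrence is values[i-1]; the example items are made up for illustration):

```python
def knapsack(values, weights, W):
    """dp[i][w] = max value using the first i items with capacity w."""
    n = len(values)
    dp = [[0] * (W + 1) for _ in range(n + 1)]  # row 0 and column 0 are the base cases
    for i in range(1, n + 1):
        for w in range(W + 1):
            dp[i][w] = dp[i - 1][w]                      # option 1: skip item i
            if weights[i - 1] <= w:                      # option 2: take item i, if it fits
                dp[i][w] = max(dp[i][w],
                               values[i - 1] + dp[i - 1][w - weights[i - 1]])
    return dp[n][W]

# Example: items (value, weight) = (60,1), (100,2), (120,3), capacity 5 -> 220
print(knapsack([60, 100, 120], [1, 2, 3], 5))
```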

- The O(n·W) running time is called 'pseudo-polynomial' — it depends on the numeric value of W, not just the size of the input. If W is huge (like 10 billion), the algorithm becomes impractical even for few items
- Space optimization: instead of keeping the full 2D table, you only need the previous row to compute the current row, reducing space from O(n·W) to O(W) (see the sketch after this list)
- For each item, the decision is binary: include it (add its value but reduce remaining capacity) or skip it (keep the previous best) — you take whichever option gives more value
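
A sketch of the O(W)-space variant mentioned above; iterating capacities from high to low ensures dp[w - wt] still holds the previous item's row when it is read:

```python
def knapsack_1d(values, weights, W):
    """Same recurrence as the 2D version, keeping only one row of the table."""
    dp = [0] * (W + 1)
    for v, wt in zip(values, weights):
        for w in range(W, wt - 1, -1):   # descending, so each item is used at most once
            dp[w] = max(dp[w], v + dp[w - wt])
    return dp[W]

print(knapsack_1d([60, 100, 120], [1, 2, 3], 5))  # 220, matching the 2D version
```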

Longest Common Subsequence (LCS)

Given two sequences (like two strings or two DNA strands), the LCS problem asks: what is the longest sequence of characters that appears in both, in the same order, but not necessarily consecutively? For example, the LCS of 'ABCBDAB' and 'BDCAB' is 'BCAB' (length 4). Note that a subsequence is different from a substring — characters do not need to be adjacent, they just need to maintain their relative order. The DP approach builds a 2D table where dp[i][j] represents the LCS length of the first i characters of string X and the first j characters of string Y. If X[i] matches Y[j], we extend the previous LCS by 1 (dp[i][j] = dp[i-1][j-1] + 1). If they do not match, we take the better of skipping either character (dp[i][j] = max(dp[i-1][j], dp[i][j-1])). This runs in O(m·n) time where m and n are the string lengths.
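
A minimal Python sketch of this table-filling (indices shifted by one because Python strings are 0-indexed):

```python
def lcs_length(X: str, Y: str) -> int:
    """dp[i][j] = LCS length of the first i chars of X and first j chars of Y."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]   # dp[0][*] = dp[*][0] = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:             # match: extend the diagonal LCS
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:                                # mismatch: skip a char from X or Y
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(lcs_length("ABCBDAB", "BDCAB"))  # 4 ("BCAB")
```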

- A subsequence is NOT the same as a substring — characters do not need to be adjacent, they just need to appear in the same relative order (e.g., 'ACE' is a subsequence of 'ABCDE')
- When characters match (X[i] == Y[j]): dp[i][j] = dp[i-1][j-1] + 1 — we found one more common character, so extend the LCS by 1
- When characters differ (X[i] != Y[j]): dp[i][j] = max(dp[i-1][j], dp[i][j-1]) — take the better result from either skipping a character in X or skipping one in Y
- To reconstruct the actual LCS (not just its length), backtrack through the table: at each cell, check whether the characters matched (go diagonal) or which direction gave the max (go up or left) — see the sketch after this list
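
A sketch of that backtracking step, built on the same table as above (illustrative; when several LCSs exist, the tie-breaking here returns one of them):

```python
def lcs_string(X: str, Y: str) -> str:
    """Recover one LCS by backtracking through the filled table."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if X[i - 1] == Y[j - 1]
                        else max(dp[i - 1][j], dp[i][j - 1]))
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:            # characters matched here: go diagonal
            out.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:  # the cell above gave the max: go up
            i -= 1
        else:                               # the cell to the left gave the max: go left
            j -= 1
    return "".join(reversed(out))

print(lcs_string("ABCBDAB", "BDCAB"))  # "BCAB" (one LCS of length 4)
```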

Matrix Chain Multiplication

When multiplying a chain of matrices $A_1 \times A_2 \times \cdots \times A_n$, the order of multiplications (parenthesization) does not change the result but dramatically affects the number of scalar multiplications needed. For example, multiplying three matrices $A_1 (10 \times 30)$, $A_2 (30 \times 5)$, $A_3 (5 \times 60)$ can be done as:

- $((A_1 A_2) A_3)$: $10 \times 30 \times 5 + 10 \times 5 \times 60 = 1500 + 3000 = 4500$ multiplications
- $(A_1 (A_2 A_3))$: $30 \times 5 \times 60 + 10 \times 30 \times 60 = 9000 + 18000 = 27000$ multiplications

That is a 6x difference! With more matrices, the gap grows even larger. The DP approach finds the optimal parenthesization.

Let $dp[i][j]$ = minimum cost to multiply matrices $A_i$ through $A_j$. We try every possible split point $k$ between $i$ and $j$: multiply $A_i \ldots A_k$ and $A_{k+1} \ldots A_j$ separately, then multiply the two results together. The cost of the final multiplication is $p_{i-1} \times p_k \times p_j$ (where the dimensions are stored in array $p$).

Worked example — dimensions $p = [10, 30, 5, 60]$:

|     | j=1 | j=2  | j=3  |
|-----|-----|------|------|
| i=1 | 0   | 1500 | 4500 |
| i=2 |     | 0    | 9000 |
| i=3 |     |      | 0    |
- $dp[1][2] = p_0 \cdot p_1 \cdot p_2 = 10 \cdot 30 \cdot 5 = 1500$
- $dp[2][3] = p_1 \cdot p_2 \cdot p_3 = 30 \cdot 5 \cdot 60 = 9000$
- $dp[1][3] = \min(dp[1][1] + dp[2][3] + 10 \cdot 30 \cdot 60,\; dp[1][2] + dp[3][3] + 10 \cdot 5 \cdot 60) = \min(0 + 9000 + 18000,\; 1500 + 0 + 3000) = \min(27000, 4500) = 4500$
- The number of possible parenthesizations grows exponentially (Catalan numbers), so brute force is impractical — DP reduces this to $O(n^3)$
- Fill the table by chain length: first all chains of length 2, then length 3, etc. — this ensures all subproblems are solved before they are needed
- Multiplying a $p \times q$ matrix by a $q \times r$ matrix requires $p \cdot q \cdot r$ scalar multiplications — this is the cost of combining two sub-chains
- To recover the actual optimal parenthesization, store the split point $k$ that gave the minimum at each cell, then recursively reconstruct — see the sketch below
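
A minimal Python sketch tying the pieces together: the $O(n^3)$ cost table, the stored split points, and the reconstruction, checked against the worked example above (function names are illustrative):

```python
def matrix_chain(p):
    """p[i-1] x p[i] is the dimension of matrix A_i; returns (cost table, split table)."""
    n = len(p) - 1                                   # number of matrices in the chain
    dp = [[0] * (n + 1) for _ in range(n + 1)]       # dp[i][j]: min cost for A_i..A_j
    split = [[0] * (n + 1) for _ in range(n + 1)]    # best split point k for each (i, j)
    for length in range(2, n + 1):                   # fill by chain length: 2, 3, ..., n
        for i in range(1, n - length + 2):
            j = i + length - 1
            dp[i][j] = float("inf")
            for k in range(i, j):                    # try every split A_i..A_k | A_{k+1}..A_j
                cost = dp[i][k] + dp[k + 1][j] + p[i - 1] * p[k] * p[j]
                if cost < dp[i][j]:
                    dp[i][j], split[i][j] = cost, k
    return dp, split

def parens(split, i, j):
    """Rebuild the optimal parenthesization from the stored split points."""
    if i == j:
        return f"A{i}"
    k = split[i][j]
    return f"({parens(split, i, k)} {parens(split, k + 1, j)})"

dp, split = matrix_chain([10, 30, 5, 60])
print(dp[1][3], parens(split, 1, 3))  # 4500 ((A1 A2) A3), matching the worked example
```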