Complexity Theory & NP-Completeness

This final week explores one of the deepest questions in all of computer science: are there problems that are easy to check but fundamentally hard to solve? We know many problems (like sorting or shortest paths) can be solved quickly — these belong to class P (solvable in polynomial time). But there is a huge collection of problems (like the Traveling Salesman Problem, scheduling, or Sudoku) where we can easily verify a given solution, but nobody has found a fast algorithm to find one. These belong to class NP (verifiable in polynomial time). The million-dollar question — literally, it carries a $1,000,000 Clay Millennium Prize — is whether P equals NP. This week you will learn about NP-completeness: a special set of problems that are the 'hardest' problems in NP, in the sense that solving any one of them quickly would solve ALL NP problems quickly. Understanding this theory is essential for recognizing when a problem is likely too hard for any efficient algorithm.

Learning Objectives

Define the complexity classes P (problems solvable in polynomial time), NP (problems whose solutions can be verified in polynomial time), and NP-complete (the hardest problems in NP)

Understand polynomial-time reductions: a formal way of showing that one problem is at least as hard as another by transforming instances of one into instances of the other

Know the key NP-complete problems: SAT (can a Boolean formula be made true?), 3-SAT, Clique, Vertex Cover, Hamiltonian Cycle, and the Traveling Salesman Problem

Understand the Cook-Levin theorem (SAT was the first problem proven NP-complete) and its significance: it created the foundation for proving thousands of other problems NP-complete through reductions

Key Concepts

P vs NP

P is the class of problems that can be solved quickly (in polynomial time, like O(n²) or O(n³)). Examples include sorting, shortest paths, and matrix multiplication. NP is the class of problems where, if someone hands you a proposed solution (called a 'certificate'), you can verify whether it is correct in polynomial time. For example, if someone claims 'here is a route that visits all cities for under $1000,' you can easily check it by adding up the distances. But finding such a route from scratch might take exponential time. The big question is: does being easy to verify (NP) automatically mean being easy to solve (P)? If P = NP, then every problem you can quickly check could also be quickly solved — this would revolutionize cryptography, optimization, AI, and many other fields. Most experts believe P ≠ NP (that solving is genuinely harder than verifying), but nobody has been able to prove it.

-
P ⊆ NP is a proven fact: any problem you can solve quickly, you can obviously verify quickly too (just solve it and check the answer matches)
-
NP does NOT mean 'not polynomial' — it stands for 'nondeterministic polynomial time.' A problem is in NP if, given a proposed solution (certificate), you can verify it in polynomial time
-
If P = NP, it would mean efficient algorithms exist for all NP problems — this would break most modern cryptography (which relies on certain problems being hard) and solve many optimization problems
-
Most computer scientists believe P ≠ NP, but no proof exists after 50+ years of attempts — proving it either way carries a $1,000,000 Clay Millennium Prize
-
Even if P ≠ NP, NP problems can still be solved — just not efficiently. Exponential algorithms, approximation algorithms, and heuristics are used in practice

NP-Completeness & Reductions

NP-complete problems are the 'hardest' problems within NP — they are the ones that every other NP problem can be transformed into. If you could solve any single NP-complete problem in polynomial time, you could solve ALL of NP in polynomial time (proving P = NP). The key tool is 'polynomial-time reduction': transforming one problem into another in polynomial time. If you can transform problem A into problem B quickly, and you can solve B quickly, then you can solve A quickly too (by first transforming, then solving). Conversely, if A is known to be hard and you can reduce A to B, then B must be at least as hard as A. To prove a new problem X is NP-complete, you need two steps: (1) show X is in NP (solutions can be verified quickly), and (2) take a known NP-complete problem and reduce it to X (showing X is at least as hard as the hardest NP problems).

-
Polynomial-time reduction (A ≤p B): transform any instance of problem A into an equivalent instance of problem B in polynomial time — if B is solvable quickly, then A is too
-
Reductions show relative difficulty: if A (known hard) reduces to B, then B must be at least as hard as A — you cannot solve B without effectively solving A
-
Conversely: if B is easy and A reduces to B, then A must also be easy — reductions transfer both hardness and easiness
-
NP-hard means 'at least as hard as NP-complete' — but an NP-hard problem might not even be in NP (its solutions might not be verifiable in polynomial time)
-
To prove problem X is NP-complete: Step 1 — show X is in NP (describe a polynomial-time verifier). Step 2 — take a known NP-complete problem and reduce it TO X (show a polynomial-time transformation)
-
⚠ Common mistake: Getting the reduction direction WRONG. To prove X is NP-complete, you must reduce a known NP-complete problem TO X (i.e., KNOWN ≤p X). Reducing X to a known problem (X ≤p KNOWN) shows X is in NP, NOT that it is NP-complete! The direction matters enormously.
-
⚠ Common mistake: Confusing NP with 'not polynomial' or 'exponential'. NP means 'Nondeterministic Polynomial' — the class of problems whose solutions can be verified in polynomial time. Every problem in P is also in NP (P ⊆ NP).

Key NP-Complete Problems

The Cook-Levin theorem (1971) proved that SAT (Boolean satisfiability) is NP-complete — this was the first problem ever shown to be NP-complete and opened the floodgates for proving thousands of other problems NP-complete through reductions. SAT asks: given a Boolean formula with variables that can be true or false and operators AND, OR, NOT, is there an assignment of values that makes the whole formula true? From SAT, researchers proved 3-SAT is NP-complete (SAT where each clause has exactly 3 variables), and from there proved Clique, Vertex Cover, Independent Set, Hamiltonian Cycle, Traveling Salesman, and many more. These problems form a web of reductions — each one can be transformed into others in polynomial time. Recognizing that a new problem is NP-complete tells you not to waste time looking for an efficient exact algorithm, and to instead consider approximation algorithms or heuristics.

-
SAT (Boolean Satisfiability): given a formula like (x₁ OR NOT x₂) AND (x₂ OR x₃), is there an assignment of true/false values that makes it true? This was the first NP-complete problem (Cook-Levin theorem, 1971)
-
3-SAT: a restricted version of SAT where every clause has exactly 3 literals — still NP-complete, and often used as the starting point for reductions to other problems
-
Clique: given a graph G, does it contain a complete subgraph (every pair connected) of size k? Think of it as finding a group of k people who all know each other in a social network
-
Vertex Cover: can you select k vertices such that every edge in the graph has at least one endpoint in your selection? This is the complement of Independent Set
-
Hamiltonian Cycle: does the graph have a cycle that visits every vertex exactly once? Similar to the Traveling Salesman Problem but without edge weights
-
TSP (Traveling Salesman, decision version): is there a tour visiting all cities with total travel cost at most k? The optimization version (find the cheapest tour) is NP-hard
-
All these problems are connected through a chain of polynomial-time reductions: SAT → 3-SAT → Clique → Vertex Cover → Hamiltonian Cycle → TSP, showing they are all equivalently hard

W11: Greedy & Amortized Analysis