Searching & Selection
How do you find a specific item in a large collection of data? If the data is sorted, you can use Binary Search — a technique that eliminates half of the remaining possibilities with each step, finding any item in a list of a million elements with just about 20 comparisons. But what if you do not need to find a specific value, but rather need the 'k-th smallest' — for example, the median (middle value) of a dataset? This is called the selection problem. This week covers Binary Search in depth, then introduces QuickSelect (which cleverly reuses the partitioning trick from QuickSort to find the k-th element in expected O(n) time) and the Median of Medians algorithm (a more complex method that guarantees O(n) time even in the worst case). These techniques are foundational tools used throughout computer science and data analysis.
Key Concepts
Binary Search
Binary Search is one of the most elegant and efficient algorithms in all of computer science. It works on a sorted list by looking at the middle element first. If the middle element is your target — great, you are done. If your target is smaller, you know it must be in the left half, so you throw away the right half entirely. If it is larger, throw away the left half. Then repeat on the remaining half. Each step cuts the search space in half, which means even for a list of one billion items, Binary Search needs at most about 30 comparisons. This is O(log n) time — dramatically faster than scanning through every element one by one (which would be O(n)).
- The array must be sorted first — Binary Search does not work on unsorted data because it relies on the ordering to decide which half to discard
- Each step halves the search space, so the total number of steps is log₂(n) — for 1,000,000 items, that is only about 20 comparisons
- The iterative version (using a while loop) uses O(1) extra space; the recursive version uses O(log n) space on the call stack because each recursive call adds a frame
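The halving loop described above can be sketched as follows — a minimal iterative version in Python (the function name is illustrative):

```python
def binary_search(arr, target):
    """Return the index of target in sorted list arr, or -1 if absent."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2        # middle of the current search space
        if arr[mid] == target:
            return mid              # found it
        elif arr[mid] < target:
            lo = mid + 1            # target is larger: discard the left half
        else:
            hi = mid - 1            # target is smaller: discard the right half
    return -1                       # search space is empty: not present
```

Because it uses a while loop rather than recursion, this version needs only O(1) extra space, matching the point above.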
QuickSelect
What if you need to find the 5th smallest number in a big unsorted list? You could sort the list first and then pick position 5, but sorting takes O(n log n). QuickSelect is faster: it borrows the 'partition' trick from QuickSort. You pick a random element (called the pivot), rearrange the array so everything smaller than the pivot is on its left and everything larger is on its right. Now the pivot is in its final sorted position. If that position happens to be the k-th spot, you are done. If k is to the left, repeat only on the left portion. If k is to the right, repeat only on the right. Because you only recurse into ONE half (not both, like QuickSort does), the expected running time is O(n) — linear! The catch is that with an unlucky pivot choice, it can degrade to O(n²), but this is extremely rare with random pivot selection.
- Only recurses into one half of the array (unlike QuickSort which recurses into both halves) — this is what makes it faster than sorting
- Expected running time is O(n) with a randomly chosen pivot — on average, each recursion reduces the problem to about half its size
- Worst case is O(n²) if the pivot is consistently the smallest or largest element (extremely rare with random selection)
- Space complexity is O(log n) on average due to the recursion stack (one recursive call per level, O(log n) levels on average). Worst case uses O(n) stack space. An iterative (tail-recursive) version can achieve O(1) extra space.
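The pick-partition-recurse idea can be sketched like this — an iterative QuickSelect with a random pivot and Lomuto partitioning (the function name and 1-indexed `k` convention are illustrative choices):

```python
import random

def quickselect(arr, k):
    """Return the k-th smallest element (1-indexed) of arr, expected O(n) time."""
    arr = list(arr)                 # work on a copy; selection reorders elements
    lo, hi = 0, len(arr) - 1
    while True:
        if lo == hi:
            return arr[lo]
        # Pick a random pivot and move it to the end (Lomuto partition scheme)
        p = random.randint(lo, hi)
        arr[p], arr[hi] = arr[hi], arr[p]
        pivot = arr[hi]
        store = lo
        for i in range(lo, hi):
            if arr[i] < pivot:      # smaller elements go to the left side
                arr[i], arr[store] = arr[store], arr[i]
                store += 1
        arr[store], arr[hi] = arr[hi], arr[store]  # pivot lands in final position
        if store == k - 1:
            return arr[store]       # pivot sits exactly at the k-th spot
        elif store > k - 1:
            hi = store - 1          # k-th smallest lies left of the pivot
        else:
            lo = store + 1          # k-th smallest lies right of the pivot
```

Because the loop narrows `lo`/`hi` instead of recursing, this version uses O(1) extra space beyond the copy, as noted in the last point above.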
Median of Medians
QuickSelect is fast on average, but what if you absolutely need a guarantee that it will be fast every time, even in the worst case? The Median of Medians algorithm provides exactly that. The idea is to be very careful about choosing the pivot so that it is always 'reasonably centered.' Here is how it works: (1) divide all elements into small groups of 5, (2) find the median (middle value) of each group (easy since each group has only 5 elements), (3) collect all those medians and recursively find the median of THOSE medians — that becomes your pivot. This carefully chosen pivot guarantees that at least 30% of elements are on each side, ensuring the search space shrinks by a significant fraction every time. The resulting recurrence T(n) = T(n/5) + T(7n/10) + O(n) solves to O(n). In practice, the constant factors make this slower than random QuickSelect, so it is mainly important for theoretical purposes — proving that linear-time selection is possible.
- Divide all elements into small groups of 5, then find the median of each group (sorting 5 elements is trivial)
- Recursively find the median of all those group medians — this becomes the pivot for partitioning
- This pivot guarantees at least a 70-30 split at worst, meaning at least 30% of elements are eliminated each step
- Recurrence: T(n) = T(n/5) + T(7n/10) + O(n) = O(n) — both recursive calls shrink the problem because n/5 + 7n/10 = 9n/10 < n
- Mainly of theoretical importance — it proves O(n) worst-case selection is possible, but in practice QuickSelect with random pivot is faster due to smaller constant factors
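The three steps above translate fairly directly into code. Here is a readable (not constant-factor-optimized) sketch that partitions into three lists rather than in place; the function name and 1-indexed `k` convention are illustrative:

```python
def median_of_medians(arr, k):
    """Return the k-th smallest element (1-indexed) of arr, worst-case O(n)."""
    arr = list(arr)
    if len(arr) <= 5:
        return sorted(arr)[k - 1]   # base case: tiny input, just sort it
    # Steps 1-2: split into groups of 5 and take each group's median
    medians = [sorted(arr[i:i + 5])[len(arr[i:i + 5]) // 2]
               for i in range(0, len(arr), 5)]
    # Step 3: recursively find the median of those medians — the pivot
    pivot = median_of_medians(medians, (len(medians) + 1) // 2)
    # Partition around the guaranteed-well-centered pivot
    less = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]
    if k <= len(less):
        return median_of_medians(less, k)
    elif k <= len(less) + len(equal):
        return pivot                # pivot itself is the k-th smallest
    else:
        return median_of_medians(greater, k - len(less) - len(equal))
```

The pivot guarantee means `less` and `greater` each hold at most about 70% of the elements, which is exactly what makes the T(n/5) + T(7n/10) + O(n) recurrence apply.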