Cache Design
The cache is the most critical level of the memory hierarchy. We study cache organization (direct-mapped, set-associative, fully-associative), address decomposition into tag/index/offset, write policies, replacement policies, and the AMAT formula for performance analysis.
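The AMAT formula mentioned above is AMAT = hit time + miss rate × miss penalty. A minimal sketch in Python; the numeric values below are hypothetical, chosen only to illustrate the arithmetic:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = hit time + miss rate * miss penalty.
    Times are in cycles; miss_rate is a fraction in [0, 1]."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical cache: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
print(amat(1, 0.05, 100))  # → 6.0
```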
Key Concepts
Cache Organization
Caches are organized by their mapping from memory addresses to cache locations:
- Direct-mapped: each address maps to exactly one cache line (index = address mod #sets)
- N-way set-associative: each address maps to a set of N lines (the block may go in any of the N ways within that set)
- Fully-associative: any address can go in any line (most flexible, most expensive)
- Direct-mapped: simple, fast, but suffers from conflict misses
- Set-associative: reduces conflict misses at the cost of comparison hardware
- Fully-associative: no conflict misses but requires comparing all tags
Number of sets = Cache size / (Block size × Associativity)
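The sizing formula above can be checked directly. A small sketch; the 32 KiB / 64 B / 4-way configuration is a hypothetical example, not one from the text:

```python
def num_sets(cache_size, block_size, associativity):
    """Number of sets = cache size / (block size * associativity)."""
    return cache_size // (block_size * associativity)

# Hypothetical config: 32 KiB cache, 64 B blocks, 4-way set-associative.
print(num_sets(32 * 1024, 64, 4))  # → 128
```

Note the two limiting cases: associativity = 1 gives a direct-mapped cache (sets = number of lines), and associativity = number of lines gives a fully-associative cache (1 set).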
Address Decomposition
A memory address is split into three fields for cache access:
- Offset (lowest bits): selects the byte within a cache block. Bits = log₂(block size in bytes)
- Index (middle bits): selects the cache set. Bits = log₂(number of sets)
- Tag (highest bits): compared against the stored tag to determine hit/miss. Bits = total address bits − index bits − offset bits

On access: use the index to select the set, then compare the tag against all tags in that set → hit or miss
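The field extraction above can be sketched with bit masks. The block size and set count below (64 B blocks, 128 sets, so 6 offset bits and 7 index bits) are hypothetical:

```python
def split_address(addr, block_size, n_sets):
    """Split an address into (tag, index, offset).
    block_size and n_sets must be powers of two."""
    offset_bits = block_size.bit_length() - 1   # log2(block size)
    index_bits = n_sets.bit_length() - 1        # log2(number of sets)
    offset = addr & (block_size - 1)            # lowest bits
    index = (addr >> offset_bits) & (n_sets - 1)  # middle bits
    tag = addr >> (offset_bits + index_bits)    # highest bits
    return tag, index, offset

# Hypothetical: 64 B blocks, 128 sets → 6 offset bits, 7 index bits.
print(split_address(0x1234ABCD, 64, 128))  # → (37285, 47, 13)
```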
Write Policies & Replacement
Write policies determine how writes to cache are handled:
- Write-through: every write goes to both cache and memory. Simple but high memory traffic.
- Write-back: writes only update the cache. Modified (dirty) blocks are written to memory only when evicted.

Replacement policies for set-associative caches: LRU (evict least recently used), FIFO, Random.
- Write-through + write buffer: the buffer absorbs writes so the CPU doesn't wait
- Write-back reduces memory traffic but requires a dirty bit per block
- Write-allocate: on a write miss, fetch the block into the cache, then write (common with write-back)
- No-write-allocate: on a write miss, write directly to memory, bypassing the cache (common with write-through)
- LRU has the best hit rate but is expensive to track at high associativity
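The interplay of LRU replacement, write-allocate, and the dirty bit can be sketched for a single set. A simplified model, assuming a write-back, write-allocate cache; the tag values in the usage line are arbitrary:

```python
from collections import OrderedDict

class LRUSet:
    """One set of a write-back, write-allocate cache with LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> dirty bit, least recent first

    def access(self, tag, is_write=False):
        """Return True on hit, False on miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark most recently used
            if is_write:
                self.lines[tag] = True       # set dirty bit instead of writing memory
            return True
        if len(self.lines) >= self.ways:     # set full: evict the LRU line
            victim, dirty = self.lines.popitem(last=False)
            # if dirty, a real cache would write the block back to memory here
        self.lines[tag] = is_write           # write-allocate: fetch block, then write
        return False

s = LRUSet(ways=2)
print([s.access(t) for t in (0xA, 0xB, 0xA, 0xC, 0xB)])
# → [False, False, True, False, False]  (0xC evicts 0xB, the LRU line)
```

Note the last access: 0xB misses even though it was loaded earlier, because touching 0xA made 0xB least recently used when 0xC arrived — exactly the conflict behavior LRU trades hardware cost to minimize.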