Cache Design
The cache is the most critical level of the memory hierarchy. We study cache organization (direct-mapped, set-associative, fully-associative), address decomposition into tag/index/offset, write policies, replacement policies, and the AMAT formula for performance analysis.
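The AMAT formula mentioned above is AMAT = hit time + miss rate × miss penalty. A minimal sketch in Python; the numeric values below are hypothetical, chosen only to illustrate the arithmetic:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = hit time + miss rate * miss penalty.
    Times are in cycles; miss_rate is a fraction in [0, 1]."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical cache: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
print(amat(1, 0.05, 100))  # → 6.0
```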
Key Concepts
Cache Organization
Caches are organized by their mapping from memory addresses to cache locations:
- Direct-mapped: each address maps to exactly one cache line (index = address mod #sets)
- N-way set-associative: each address maps to a set of N lines (the block may go in any of the N ways within that set)
- Fully-associative: any address can go in any line (most flexible, most expensive)
- Direct-mapped: simple, fast, but suffers from conflict misses
- Set-associative: reduces conflict misses at the cost of comparison hardware
- Fully-associative: no conflict misses but requires comparing all tags
Number of sets = Cache size / (Block size × Associativity)
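The sizing formula above can be checked directly. A small sketch; the 32 KiB / 64 B / 4-way configuration is a hypothetical example, not one from the text:

```python
def num_sets(cache_size, block_size, associativity):
    """Number of sets = cache size / (block size * associativity)."""
    return cache_size // (block_size * associativity)

# Hypothetical config: 32 KiB cache, 64 B blocks, 4-way set-associative.
print(num_sets(32 * 1024, 64, 4))  # → 128
```

Note the two limiting cases: associativity = 1 gives a direct-mapped cache (sets = number of lines), and associativity = number of lines gives a fully-associative cache (1 set).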
Address Decomposition
A memory address is split into three fields for cache access:
- Offset (lowest bits): selects the byte within a cache block. Bits = log₂(block size in bytes)
- Index (middle bits): selects the cache set. Bits = log₂(number of sets)
- Tag (highest bits): compared against the stored tag to determine hit/miss. Bits = total address bits − index bits − offset bits

On access: use the index to select the set, then compare the tag against all tags in that set → hit or miss
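The field extraction above can be sketched with bit masks. The block size and set count below (64 B blocks, 128 sets, so 6 offset bits and 7 index bits) are hypothetical:

```python
def split_address(addr, block_size, n_sets):
    """Split an address into (tag, index, offset).
    block_size and n_sets must be powers of two."""
    offset_bits = block_size.bit_length() - 1   # log2(block size)
    index_bits = n_sets.bit_length() - 1        # log2(number of sets)
    offset = addr & (block_size - 1)            # lowest bits
    index = (addr >> offset_bits) & (n_sets - 1)  # middle bits
    tag = addr >> (offset_bits + index_bits)    # highest bits
    return tag, index, offset

# Hypothetical: 64 B blocks, 128 sets → 6 offset bits, 7 index bits.
print(split_address(0x1234ABCD, 64, 128))  # → (37285, 47, 13)
```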
Write Policies & Replacement
Write policies determine how writes to cache are handled:
- Write-through: every write goes to both cache and memory. Simple but high memory traffic.
- Write-back: writes only update the cache. Modified (dirty) blocks are written to memory only when evicted.

Replacement policies for set-associative caches: LRU (evict least recently used), FIFO, Random.
- Write-through + write buffer: the buffer absorbs writes so the CPU doesn't wait
- Write-back reduces memory traffic but requires a dirty bit per block
- Write-allocate: on a write miss, fetch the block into the cache, then write (common with write-back)
- No-write-allocate: on a write miss, write directly to memory, bypassing the cache (common with write-through)
- LRU has the best hit rate but is expensive to track at high associativity
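The interplay of LRU replacement, write-allocate, and the dirty bit can be sketched for a single set. A simplified model, assuming a write-back, write-allocate cache; the tag values in the usage line are arbitrary:

```python
from collections import OrderedDict

class LRUSet:
    """One set of a write-back, write-allocate cache with LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> dirty bit, least recent first

    def access(self, tag, is_write=False):
        """Return True on hit, False on miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark most recently used
            if is_write:
                self.lines[tag] = True       # set dirty bit instead of writing memory
            return True
        if len(self.lines) >= self.ways:     # set full: evict the LRU line
            victim, dirty = self.lines.popitem(last=False)
            # if dirty, a real cache would write the block back to memory here
        self.lines[tag] = is_write           # write-allocate: fetch block, then write
        return False

s = LRUSet(ways=2)
print([s.access(t) for t in (0xA, 0xB, 0xA, 0xC, 0xB)])
# → [False, False, True, False, False]  (0xC evicts 0xB, the LRU line)
```

Note the last access: 0xB misses even though it was loaded earlier, because touching 0xA made 0xB least recently used when 0xC arrived — exactly the conflict behavior LRU trades hardware cost to minimize.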