Advanced Pipeline & Hazard Resolution

This week we learn techniques to resolve pipeline hazards without stalling: data forwarding (bypassing) and instruction scheduling. We also study the unavoidable load-use hazard that still requires one stall cycle even with full forwarding.

Learning Objectives

Explain how data forwarding eliminates most data hazard stalls

Identify forwarding paths: EX/MEM→EX, MEM/WB→EX

Recognize load-use hazards that require a mandatory stall

Reorder instructions to avoid stalls (pipeline scheduling)

Key Concepts

Data Forwarding (Bypassing)

Forwarding passes a result directly from where it's produced to where it's needed, bypassing the register file. There are two main forwarding paths:

- EX/MEM → EX: the ALU result of the previous instruction is forwarded to an ALU input of the next instruction
- MEM/WB → EX: a loaded value or ALU result held in MEM/WB is forwarded to an ALU input of the instruction two slots later

In the classic five-stage pipeline, forwarding eliminates the stalls for ALU-to-ALU RAW hazards; results produced by loads are the exception.

- Forwarding from the EX/MEM pipeline register: the value is available at the end of the EX stage
- Forwarding from the MEM/WB pipeline register: the value is available at the end of the MEM stage
- Forwarding hardware adds multiplexers in front of the ALU inputs
- Without forwarding, every RAW hazard between adjacent instructions causes 2 stall cycles (assuming the register file is written in the first half of a clock cycle and read in the second half)
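
The multiplexer selection described above can be sketched in a few lines of Python. This is only an illustrative model, not a hardware description: the names `PipeReg` and `forward_select` are made up for this example, registers are plain integers with 0 standing for `$zero`, and load-use stalls are assumed to be handled by a separate hazard detection unit.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Hypothetical view of the fields a forwarding unit reads from a pipeline register."""
    reg_write: bool  # True if the instruction held here writes a register
    rd: int          # destination register number (0 is $zero, never forwarded)


def forward_select(rs: int, ex_mem: PipeReg, mem_wb: PipeReg) -> str:
    """Choose the source for one ALU operand that reads register rs."""
    # The EX/MEM register holds the most recent result, so it is checked first.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == rs:
        return "EX/MEM"
    # Otherwise fall back to the older result waiting in MEM/WB.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == rs:
        return "MEM/WB"
    # No in-flight producer: the value read from the register file is current.
    return "REGFILE"


# Example: both in-flight instructions write register 8; the newer one wins.
choice = forward_select(8, PipeReg(True, 8), PipeReg(True, 8))
```

Checking EX/MEM before MEM/WB matters when two in-flight instructions write the same register: the operand must come from the younger producer, which is the one sitting in EX/MEM.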

Load-Use Hazard

A load-use hazard occurs when a lw instruction is immediately followed by an instruction that uses the loaded value. Even with full forwarding, the loaded data is not available until the end of the lw's MEM stage, but the next instruction needs it at the start of its EX stage, which falls in that same clock cycle. This requires exactly one stall cycle (bubble).

- lw produces its result at the end of MEM, but the next instruction needs it at the start of EX
- One mandatory stall cycle even with full forwarding
- If the using instruction is 2 or more slots away, no stall is needed
- Pipeline scheduling can insert independent instructions into the bubble slot
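
The rule above can be expressed as a small check over adjacent instruction pairs. The sketch below is an illustration under simplifying assumptions: `Instr` and `load_use_stalls` are hypothetical names, registers are integers with 0 standing for `$zero`, full forwarding is assumed, and every source register is treated as needed in EX.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Instr(NamedTuple):
    """Hypothetical minimal instruction record used only for this example."""
    op: str                  # mnemonic, e.g. "lw", "add", "sub"
    dest: Optional[int]      # register written, or None
    srcs: Tuple[int, ...]    # registers read


def load_use_stalls(program: Sequence[Instr]) -> int:
    """Count the bubbles caused by load-use hazards, assuming full forwarding."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        # Only a lw directly followed by a reader of its destination stalls.
        if prev.op == "lw" and prev.dest not in (None, 0) and prev.dest in cur.srcs:
            stalls += 1
    return stalls


# lw writes register 8 and the very next instruction reads it: one bubble.
back_to_back = [Instr("lw", 8, (9,)), Instr("add", 10, (8, 11))]

# One independent instruction in between: MEM/WB -> EX forwarding is enough.
separated = [Instr("lw", 8, (9,)), Instr("sub", 12, (13, 14)), Instr("add", 10, (8, 11))]

stalls_adjacent = load_use_stalls(back_to_back)
stalls_separated = load_use_stalls(separated)
```

Here `stalls_adjacent` is 1 and `stalls_separated` is 0, matching the rule that a consumer two or more slots after the lw needs no stall.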

Pipeline Scheduling

Pipeline scheduling (or instruction scheduling) reorders independent instructions to fill stall slots. The compiler or the hardware rearranges instructions to avoid hazard stalls without changing program semantics.

- Move independent instructions into load-use bubble slots
- Must preserve data dependencies (only reorder independent instructions)
- Compiler scheduling is done at compile time, which keeps the hardware simpler
- Hardware (dynamic) scheduling is more flexible but more complex
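
As a worked illustration, the sketch below counts cycles for a short MIPS-style sequence before and after moving one load earlier. The instruction sequence, the tuple encoding, and the helper names are assumptions made for this example; it models a five-stage pipeline with full forwarding, and it assumes the two stores and the moved load touch different addresses, so the reordering preserves program semantics.

```python
def load_use_stalls(program):
    """Count bubbles from a lw directly followed by a reader of its result."""
    stalls = 0
    for (op, dest, _), (_, _, srcs) in zip(program, program[1:]):
        if op == "lw" and dest in srcs:
            stalls += 1
    return stalls


def total_cycles(program, stages=5):
    """Cycles to drain the pipeline: fill time + one per instruction + bubbles."""
    return (stages - 1) + len(program) + load_use_stalls(program)


# Each entry is (op, destination register, source registers).
original = [
    ("lw",  "t1", ("t0",)),        # lw  $t1, 0($t0)
    ("lw",  "t2", ("t0",)),        # lw  $t2, 4($t0)
    ("add", "t3", ("t1", "t2")),   # uses $t2 right after its load: stall
    ("sw",  None, ("t3", "t0")),   # sw  $t3, 12($t0)
    ("lw",  "t4", ("t0",)),        # lw  $t4, 8($t0)
    ("add", "t5", ("t1", "t4")),   # uses $t4 right after its load: stall
    ("sw",  None, ("t5", "t0")),   # sw  $t5, 16($t0)
]

# Hoist the third load so that neither add directly follows the load it needs.
scheduled = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]

cycles_before = total_cycles(original)
cycles_after = total_cycles(scheduled)
```

Under these assumptions the original order takes 13 cycles (two bubbles) and the scheduled order takes 11 (none). The third load can move up because it depends only on `$t0`, which no instruction in the sequence modifies.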
