Advanced Pipelining
Out of Order Processors
COMP25212

From Monday… Out-of-Order Execution with Scoreboard
• Centralized data structure which tracks the status of registers, FUs and instructions and creates, dynamically in hardware, the dependency graph
  – The centralized nature limits scalability:
  – Small number of FUs and small window of instructions
• Dependencies
  – RAW – stall conflicted instruction
  – WAW – stall the pipeline
  – WAR – stall WB

Out of Order Execution with Tomasulo

Tomasulo’s Algorithm
• Control logic for out-of-order execution is decentralized
  – Reservation Stations (RS) in the functional units keep instruction information
  – In addition, RS seamlessly rename registers
• A Common Data Bus (CDB) broadcasts data and results to the different devices
  – Only a single instruction can finish each cycle
• Distributed control allows for a larger window of instructions
  – Dynamic scheduling

Tomasulo’s Algorithm
• Structural hazards stall the pipeline
• RS tracks when operands are available and buffers them as soon as they are
  – No need to access the register bank (store values or sources)
• Impact of RAW dependencies is limited
  – Execute an instruction when its operands are available
• WAW and WAR dependencies are avoided
  – Register renaming

Register Renaming (Example)
• Eliminates WAR and WAW hazards by renaming all destination registers.
• Can be done by compiler

  Original code            After renaming
  DIV.D  F0, F2, F4        DIV.D  F0, F2, F4
  ADD.D  F6, F0, F8        ADD.D  S, F0, F8
  ST.D   F6, 0(R1)         ST.D   S, 0(R1)
  SUB.D  F8, F10, F14      SUB.D  T, F10, F14
  MUL.D  F6, F10, F8       MUL.D  F6, F10, T

  [Figure labels: True dependences, Output dependence, Antidependence]

Tomasulo Organization
[Figure: From Mem, FP Op Queue, FP Registers, Load Buffers (Load1 to Load6), Store Buffers, Reservation Stations (Add1, Add2, Add3, Mult1, Mult2), FP adders, FP multipliers, To Mem, Common Data Bus (CDB)]
• Normal data bus: data + destination
• Common data bus: data + source

Stages of a Tomasulo Pipeline
[Figure: Issue feeds parallel Execute units (Integer, FP Multiplication, FP Multiplication, FP Add, FP Division), each followed by Write Back]

Three Stages of Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
   If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).
2. Execute: operate on operands (EX)
   When both source operands are ready then execute; if not ready, watch Common Data Bus for result
3. Write result: finish execution (WB)
   Write on Common Data Bus to all awaiting units; mark reservation station available
• Normal data bus: data + destination (“go to” bus)
• Common data bus: data + source (“come from” bus)
  – 64 bits of data + 4 bits of Functional Unit source address
  – Write if matches expected Functional Unit (produces result)
  – Does the broadcast

Reservation Station Components
Tomasulo Example
• Instruction stream: the six instructions listed under Instruction status
• No information about instructions needed: Tomasulo does not need this info. We will show the times for each stage, for convenience

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2
  LD    F2     45+  R3
  MULTD F0     F2   F4
  SUBD  F8     F6   F2
  DIVD  F10    F0   F6
  ADDD  F6     F8   F2
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op  S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  -     Add2   No
  -     Add3   No
  -     Mult1  No
  -     Mult2  No
Register result status (Clock 0), FU: F0 -, F2 -, F4 -, F6 -, F8 -, F10 -, F12 -, ...
F30 -

Reservation Station Components
No information about instructions needed
Op: Operation to perform in the unit (e.g., + or –)
Vj, Vk: Value of source operands
  – Store buffers have a V field, the result to be stored
Qj, Qk: Reservation stations producing source registers (value to be written)
  – Note: Qj,Qk=0 => ready
  – Store buffers only have Qi for RS producing result
Busy: Indicates reservation station or FU is busy
Register result status: Indicates which functional unit will write each register, if one exists. Blank (shown as - below) when no pending instructions will write that register.

Tomasulo Example
• Reservation Stations: 3 Load Buffers, 3 Adder, 2 Multiplication
• Time: FU count down
• Vj, Vk: source registers
• Qj, Qk: which FU will produce operands
• Clock: clock cycle counter
• FU: which RS will write in each register?

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2
  LD    F2     45+  R3
  MULTD F0     F2   F4
  SUBD  F8     F6   F2
  DIVD  F10    F0   F6
  ADDD  F6     F8   F2
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op  S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  -     Add2   No
  -     Add3   No
  -     Mult1  No
  -     Mult2  No
Register result status (Clock 0), FU: F0 -, F2 -, F4 -, F6 -, F8 -, F10 -, F12 -, ...
F30 -

A Tomasulo Example
The following code is run on a Tomasulo pipeline with:

  Functional Unit (FU)        # of FUs   EX cycles
  FP Multiply/Division        2          10/40
  FP Addition/Subtraction     3          2
  Mem Load                    3          2

Functional units not pipelined

Example Code
  1  L.D    F6, 34(R2)
  2  L.D    F2, 45(R3)
  3  MUL.D  F0, F2, F4
  4  SUB.D  F8, F6, F2
  5  DIV.D  F10, F0, F6
  6  ADD.D  F6, F8, F2

Dependency Graph For Example Code
• Real Data Dependence (RAW): (1, 4) (1, 5) (2, 3) (2, 4) (2, 6) (3, 5) (4, 6)
• Output Dependence (WAW): (1, 6)
• Anti-dependence (WAR): (5, 6)

Tomasulo Example Cycle 1
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1
  LD    F2     45+  R3
  MULTD F0     F2   F4
  SUBD  F8     F6   F2
  DIVD  F10    F0   F6
  ADDD  F6     F8   F2
Load buffers (Busy, Address): Load1 Yes 34+R2; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op  S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  -     Add2   No
  -     Add3   No
  -     Mult1  No
  -     Mult2  No
• LD#1 issued
Register result status (Clock 1), FU: F0 -, F2 -, F4 -, F6 Load1, F8 -, F10 -, F12 -, ... F30 -

Tomasulo Example Cycle 2
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1
  LD    F2     45+  R3   2
  MULTD F0     F2   F4
  SUBD  F8     F6   F2
  DIVD  F10    F0   F6
  ADDD  F6     F8   F2
Load buffers (Busy, Address): Load1 Yes 34+R2; Load2 Yes 45+R3; Load3 No
Reservation Stations:
  Time  Name   Busy  Op  S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  -     Add2   No
  -     Add3   No
  -     Mult1  No
  -     Mult2  No
• LD#2 issued
Register result status (Clock 2), FU: F0 -, F2 Load2, F4 -, F6 Load1, F8 -, F10 -, F12 -, ...
F30 -

Tomasulo Example Cycle 3
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3
  LD    F2     45+  R3   2
  MULTD F0     F2   F4   3
  SUBD  F8     F6   F2
  DIVD  F10    F0   F6
  ADDD  F6     F8   F2
Load buffers (Busy, Address): Load1 Yes 34+R2; Load2 Yes 45+R3; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  -     Add2   No
  -     Add3   No
  -     Mult1  Yes   MULTD  -      R(F4)  Load2  -
  -     Mult2  No
• MULTD is issued
• LD#1 completes and broadcasts its result
Register result status (Clock 3), FU: F0 Mult1, F2 Load2, F4 -, F6 Load1, F8 -, F10 -, F12 -, ... F30 -

Tomasulo Example Cycle 4
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4
  MULTD F0     F2   F4   3
  SUBD  F8     F6   F2   4
  DIVD  F10    F0   F6
  ADDD  F6     F8   F2
Load buffers (Busy, Address): Load1 No; Load2 Yes 45+R3; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   Yes   SUBD   M(A1)  -      -      Load2
  -     Add2   No
  -     Add3   No
  -     Mult1  Yes   MULTD  -      R(F4)  Load2  -
  -     Mult2  No
• SUBD is issued
• LD#1 result updates the register bank
• LD#2 completes, broadcasting its result
Register result status (Clock 4), FU: F0 Mult1, F2 Load2, F4 -, F6 M(A1), F8 Add1, F10 -, F12 -, ... F30 -

Tomasulo Example Cycle 5
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3
  SUBD  F8     F6   F2   4
  DIVD  F10    F0   F6   5
  ADDD  F6     F8   F2
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  2     Add1   Yes   SUBD   M(A1)  M(A2)  -      -
  -     Add2   No
  -     Add3   No
  10    Mult1  Yes   MULTD  M(A2)  R(F4)  -      -
  -     Mult2  Yes   DIVD   -      M(A1)  Mult1  -
• DIVD is issued
• LD#2 result updates the register bank
• Add1, Mult1 start execution
Register result status (Clock 5), FU: F0 Mult1, F2 M(A2), F4 -, F6 M(A1), F8 Add1, F10 Mult2, F12 -, ...
F30 -

Tomasulo Example Cycle 6
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3
  SUBD  F8     F6   F2   4
  DIVD  F10    F0   F6   5
  ADDD  F6     F8   F2   6
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  1     Add1   Yes   SUBD   M(A1)  M(A2)  -      -
  -     Add2   Yes   ADDD   -      M(A2)  Add1   -
  -     Add3   No
  9     Mult1  Yes   MULTD  M(A2)  R(F4)  -      -
  -     Mult2  Yes   DIVD   -      M(A1)  Mult1  -
• ADDD issued
Register result status (Clock 6), FU: F0 Mult1, F2 M(A2), F4 -, F6 Add2, F8 Add1, F10 Mult2, F12 -, ... F30 -

Tomasulo Example Cycle 7
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3
  SUBD  F8     F6   F2   4      7
  DIVD  F10    F0   F6   5
  ADDD  F6     F8   F2   6
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  0     Add1   Yes   SUBD   M(A1)  M(A2)  -      -
  -     Add2   Yes   ADDD   -      M(A2)  Add1   -
  -     Add3   No
  8     Mult1  Yes   MULTD  M(A2)  R(F4)  -      -
  -     Mult2  Yes   DIVD   -      M(A1)  Mult1  -
• Add1 (SUBD) completes and broadcasts result
Register result status (Clock 7), FU: F0 Mult1, F2 M(A2), F4 -, F6 Add2, F8 Add1, F10 Mult2, F12 -, ... F30 -

Tomasulo Example Cycle 8
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3
  SUBD  F8     F6   F2   4      7          8
  DIVD  F10    F0   F6   5
  ADDD  F6     F8   F2   6
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  2     Add2   Yes   ADDD   (M-M)  M(A2)  -      -
  -     Add3   No
  7     Mult1  Yes   MULTD  M(A2)  R(F4)  -      -
  -     Mult2  Yes   DIVD   -      M(A1)  Mult1  -
• Add1 (SUBD) result updates the register bank
• Add2 (ADDD) starts execution
Register result status (Clock 8), FU: F0 Mult1, F2 M(A2), F4 -, F6 Add2, F8 (M-M), F10 Mult2, F12 -, ...
F30 -

Tomasulo Example Cycle 9
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3
  SUBD  F8     F6   F2   4      7          8
  DIVD  F10    F0   F6   5
  ADDD  F6     F8   F2   6
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  1     Add2   Yes   ADDD   (M-M)  M(A2)  -      -
  -     Add3   No
  6     Mult1  Yes   MULTD  M(A2)  R(F4)  -      -
  -     Mult2  Yes   DIVD   -      M(A1)  Mult1  -
• ADDD and MULTD continue execution
Register result status (Clock 9), FU: F0 Mult1, F2 M(A2), F4 -, F6 Add2, F8 (M-M), F10 Mult2, F12 -, ... F30 -

Tomasulo Example Cycle 10
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3
  SUBD  F8     F6   F2   4      7          8
  DIVD  F10    F0   F6   5
  ADDD  F6     F8   F2   6      10
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  0     Add2   Yes   ADDD   (M-M)  M(A2)  -      -
  -     Add3   No
  5     Mult1  Yes   MULTD  M(A2)  R(F4)  -      -
  -     Mult2  Yes   DIVD   -      M(A1)  Mult1  -
• Add2 (ADDD) completes
Register result status (Clock 10), FU: F0 Mult1, F2 M(A2), F4 -, F6 Add2, F8 (M-M), F10 Mult2, F12 -, ... F30 -

Tomasulo Example Cycle 11
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3
  SUBD  F8     F6   F2   4      7          8
  DIVD  F10    F0   F6   5
  ADDD  F6     F8   F2   6      10         11
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  -     Add2   No
  -     Add3   No
  4     Mult1  Yes   MULTD  M(A2)  R(F4)  -      -
  -     Mult2  Yes   DIVD   -      M(A1)  Mult1  -
• ADDD result updates the register bank
Register result status (Clock 11), FU: F0 Mult1, F2 M(A2), F4 -, F6 (M-M+M), F8 (M-M), F10 Mult2, F12 -, ...
F30 -

Tomasulo Example Cycle 12
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3
  SUBD  F8     F6   F2   4      7          8
  DIVD  F10    F0   F6   5
  ADDD  F6     F8   F2   6      10         11
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  -     Add2   No
  -     Add3   No
  3     Mult1  Yes   MULTD  M(A2)  R(F4)  -      -
  -     Mult2  Yes   DIVD   -      M(A1)  Mult1  -
• MULTD continues execution
Register result status (Clock 12), FU: F0 Mult1, F2 M(A2), F4 -, F6 (M-M+M), F8 (M-M), F10 Mult2, F12 -, ... F30 -

Tomasulo Example Cycle 13
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3
  SUBD  F8     F6   F2   4      7          8
  DIVD  F10    F0   F6   5
  ADDD  F6     F8   F2   6      10         11
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  -     Add2   No
  -     Add3   No
  2     Mult1  Yes   MULTD  M(A2)  R(F4)  -      -
  -     Mult2  Yes   DIVD   -      M(A1)  Mult1  -
• MULTD continues execution
Register result status (Clock 13), FU: F0 Mult1, F2 M(A2), F4 -, F6 (M-M+M), F8 (M-M), F10 Mult2, F12 -, ... F30 -

Tomasulo Example Cycle 14
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3
  SUBD  F8     F6   F2   4      7          8
  DIVD  F10    F0   F6   5
  ADDD  F6     F8   F2   6      10         11
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  -     Add2   No
  -     Add3   No
  1     Mult1  Yes   MULTD  M(A2)  R(F4)  -      -
  -     Mult2  Yes   DIVD   -      M(A1)  Mult1  -
• MULTD continues execution
Register result status (Clock 14), FU: F0 Mult1, F2 M(A2), F4 -, F6 (M-M+M), F8 (M-M), F10 Mult2, F12 -, ...
F30 -

Tomasulo Example Cycle 15
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3      15
  SUBD  F8     F6   F2   4      7          8
  DIVD  F10    F0   F6   5
  ADDD  F6     F8   F2   6      10         11
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  -     Add2   No
  -     Add3   No
  0     Mult1  Yes   MULTD  M(A2)  R(F4)  -      -
  -     Mult2  Yes   DIVD   -      M(A1)  Mult1  -
• MULTD completes and broadcasts result
Register result status (Clock 15), FU: F0 Mult1, F2 M(A2), F4 -, F6 (M-M+M), F8 (M-M), F10 Mult2, F12 -, ... F30 -

Tomasulo Example Cycle 16
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3      15         16
  SUBD  F8     F6   F2   4      7          8
  DIVD  F10    F0   F6   5
  ADDD  F6     F8   F2   6      10         11
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  -     Add2   No
  -     Add3   No
  -     Mult1  No
  40    Mult2  Yes   DIVD   M*F4   M(A1)  -      -
• MULTD result updates the register bank
• DIVD starts execution
Register result status (Clock 16), FU: F0 M*F4, F2 M(A2), F4 -, F6 (M-M+M), F8 (M-M), F10 Mult2, F12 -, ... F30 -

39 cycles later…

Tomasulo Example Cycle 55
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3      15         16
  SUBD  F8     F6   F2   4      7          8
  DIVD  F10    F0   F6   5
  ADDD  F6     F8   F2   6      10         11
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  -     Add2   No
  -     Add3   No
  -     Mult1  No
  1     Mult2  Yes   DIVD   M*F4   M(A1)  -      -
• DIVD is about to complete
Register result status (Clock 55), FU: F0 M*F4, F2 M(A2), F4 -, F6 (M-M+M), F8 (M-M), F10 Mult2, F12 -, ...
F30 -

Tomasulo Example Cycle 56
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3      15         16
  SUBD  F8     F6   F2   4      7          8
  DIVD  F10    F0   F6   5      56
  ADDD  F6     F8   F2   6      10         11
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op     S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  -     Add2   No
  -     Add3   No
  -     Mult1  No
  0     Mult2  Yes   DIVD   M*F4   M(A1)  -      -
• DIVD completes
Register result status (Clock 56), FU: F0 M*F4, F2 M(A2), F4 -, F6 (M-M+M), F8 (M-M), F10 Mult2, F12 -, ... F30 -

Tomasulo Example Cycle 57
Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD    F6     34+  R2   1      3          4
  LD    F2     45+  R3   2      4          5
  MULTD F0     F2   F4   3      15         16
  SUBD  F8     F6   F2   4      7          8
  DIVD  F10    F0   F6   5      56         57
  ADDD  F6     F8   F2   6      10         11
Load buffers (Busy, Address): Load1 No; Load2 No; Load3 No
Reservation Stations:
  Time  Name   Busy  Op  S1 Vj  S2 Vk  RS Qj  RS Qk
  -     Add1   No
  -     Add2   No
  -     Add3   No
  -     Mult1  No
  -     Mult2  No
Register result status (Clock 57), FU: F0 M*F4, F2 M(A2), F4 -, F6 (M-M+M), F8 (M-M), F10 Result, F12 -, ...
• DIVD result updates the register bank
• In-order issue
• Out-of-order execution
• Out-of-order completion

Tomasulo’s advantages
(1) Distributed hazard detection logic
  – Distributed reservation stations and the CDB
  – If multiple instructions are waiting on a single result, and each instruction already has its other operand, then the instructions can be dispatched simultaneously by broadcasting on the CDB
  – If a centralized register file were used, the units would have to read their results from the registers when register buses are available.
(2) Avoids stalling due to WAW or WAR hazards

Tomasulo Drawbacks
• Complexity of hardware
• Performance limited by Common Data Bus
  – Each CDB must go to all functional units => high capacitance, high wiring density
  – Number of functional units that can complete per cycle limited to one!
    » Multiple CDBs => more FU logic for parallel stores

Summary
• Reservation stations: implicit register renaming to larger set of registers + buffering source operands
  – Prevents registers from being bottleneck
  – Avoids the WAR and WAW hazards of Scoreboard
• Lasting Contributions
  – Dynamic scheduling
  – Register renaming
  – Load/store disambiguation

Summary of Out-of-Order Processors

Out of Order Processors
BENEFITS:
• Accelerates the execution of programs
• More efficient design
  – Increases the utilisation of processor resources
LIMITATIONS:
• More complex design
• Very expensive in terms of area and power
• Non-precise interrupts
  – Interrupting exactly after an instruction might not be possible

Scoreboard vs Tomasulo

                        Scoreboard              Tomasulo
  Window size:          ≤ 5 instructions        ≤ 14 instructions
  Structural hazard:    No issue                No issue
  WAR dependency:       stall completion        renaming avoids
  WAW dependency:       stall issue             renaming avoids
  Results forwarding:   Write/read registers    Broadcast from FU
  Control structure:    central scoreboard      distributed reservation stations

Example
Instructions: LD R1 X; LD R2 Y; ADD R3 R1 R2; SUB R3 R5 R6; MUL R4 R1 R1; DIV R7 R5 R6
LD – 4 cycles, Add/Sub – 2 cycles, Mul/Div – 4 cycles. Assuming no structural hazards.

In-order (cycles 1 to 16)
  LD  R1 X       IF 1, ID 2, LD1 to LD4 3-6, WB 7
  LD  R2 Y       IF 2, ID 3, LD1 to LD4 4-7, WB 8
  ADD R3 R1 R2   IF 3, ID 4, Stall 5-8, Add1 9, Add2 10, WB 11
  SUB R3 R5 R6   IF 4, Stall 5-8, ID 9, Sub1 10, Sub2 11, WB 12
  MUL R4 R1 R1   IF 9, ID 10, Mul1 to Mul4 11-14, WB 15
  DIV R7 R5 R6   IF 10, ID 11, Div1 to Div4 12-15, WB 16

Out-of-order with Scoreboard (cycles 1 to 15)
  LD  R1 X       I 1, RO 2, LD1 to LD4 3-6, WB 7
  LD  R2 Y       I 2, RO 3, LD1 to LD4 4-7, WB 8
  ADD R3 R1 R2   I 3, RO 4-8, Add1 9, Add2 10, WB 11
  SUB R3 R5 R6   I 4-7, RO 8, Sub1 9, Sub2 10, WB 11
  MUL R4 R1 R1   I 8, RO 9, Mul1 to Mul4 10-13, WB 14
  DIV R7 R5 R6   I 9, RO 10, Div1 to Div4 11-14, WB 15

Out-of-order with Tomasulo (cycles 1 to 12)
  LD  R1 X       I 1, LD1 to LD4 2-5, CDB 6
  LD  R2 Y       I 2, LD1 to LD4 3-6, CDB 7
  ADD R3 R1 R2   I 3, RS 4-7, Add1 8, Add2 9, CDB 10
  SUB R3 R5 R6   I 4, Sub1 5, Sub2 6, CDB 7-8
  MUL R4 R1 R1   I 5, RS 6, Mul1 to Mul4 7-10, CDB 11
  DIV R7 R5 R6   I 6, Div1 to Div4 7-10, CDB 11-12

RAW (ADD needs R1 and R2 from the loads)
• In-order: stall the pipeline
• Scoreboard: ADD stalled, SUB could be issued
• Tomasulo: ADD stalled, SUB can be issued

WAW (ADD and SUB both write R3)
• Scoreboard: SUB cannot be issued, stall the pipeline
• Tomasulo: allowed by register renaming in RS

Completion
• Scoreboard: 2 instrs. can finish at the same time
• Tomasulo: CDB limits finishing instrs. to one/cycle
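The three stages (Issue, Execute, Write result) can be sketched as a small cycle-driven model and checked against the worked Tomasulo example (the L.D / MUL.D / SUB.D / DIV.D / ADD.D sequence with latencies 2, 10, 2, 40, 2). This is only an illustrative sketch, not the lecture's reference implementation. The names `Instr`, `simulate`, `reg_status` and `busy` are made up for this example. It assumes one CDB, one issue per cycle, that execution starts the cycle after the last operand arrives, and that a station freed by a write can be reused in the same cycle. It tracks timing and tags only, not data values.

```python
from dataclasses import dataclass, field

# Latencies and station counts taken from the worked example above.
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}
KIND = {"LD": "Load", "ADDD": "Add", "SUBD": "Add", "MULTD": "Mult", "DIVD": "Mult"}
STATIONS = {"Load": 3, "Add": 3, "Mult": 2}


@dataclass
class Instr:
    op: str
    dest: str
    srcs: tuple
    tag: str = ""                                  # reservation station holding it
    waiting_on: set = field(default_factory=set)   # producer tags (Qj, Qk)
    issue: int = 0
    exec_comp: int = 0
    write: int = 0


def simulate(program):
    reg_status = {}   # register -> tag of the station that will write it
    busy = {}         # tag -> instruction occupying that station
    pc, cycle = 0, 0
    while pc < len(program) or busy:
        cycle += 1
        # Write result: the oldest finished instruction gets the single CDB.
        done = [i for i in busy.values() if i.exec_comp and i.exec_comp < cycle]
        if done:
            winner = min(done, key=lambda i: i.issue)
            winner.write = cycle
            del busy[winner.tag]                       # station becomes available
            for other in busy.values():
                other.waiting_on.discard(winner.tag)   # "come from" broadcast
            if reg_status.get(winner.dest) == winner.tag:
                del reg_status[winner.dest]            # only the latest writer updates
        # Issue: in order, one per cycle, only if a matching station is free.
        if pc < len(program):
            ins = program[pc]
            kind = KIND[ins.op]
            free = [f"{kind}{n}" for n in range(1, STATIONS[kind] + 1)
                    if f"{kind}{n}" not in busy]
            if free:
                ins.tag, ins.issue = free[0], cycle
                ins.waiting_on = {reg_status[s] for s in ins.srcs if s in reg_status}
                reg_status[ins.dest] = ins.tag   # renaming: later readers wait on the tag
                busy[ins.tag] = ins
                pc += 1
        # Execute: start once every operand has arrived; units are not pipelined.
        for ins in busy.values():
            if not ins.waiting_on and not ins.exec_comp:
                ins.exec_comp = cycle + LATENCY[ins.op]
    return program


program = [
    Instr("LD", "F6", ("R2",)),
    Instr("LD", "F2", ("R3",)),
    Instr("MULTD", "F0", ("F2", "F4")),
    Instr("SUBD", "F8", ("F6", "F2")),
    Instr("DIVD", "F10", ("F0", "F6")),
    Instr("ADDD", "F6", ("F8", "F2")),
]
result = simulate(program)
for ins in result:
    print(f"{ins.op:6}{ins.dest:4}{ins.tag:6} issue={ins.issue:<3}"
          f"exec_comp={ins.exec_comp:<3}write={ins.write}")
```

Under these assumptions the Issue / Exec Comp / Write Result columns printed by the sketch should match the final instruction status table of the worked example (for instance DIVD: 5, 56, 57). Writing `reg_status[ins.dest] = ins.tag` at issue is the renaming step: ADDD can take over F6 while DIVD has already captured the old F6 value, which is why the WAR and WAW pairs in the example cause no stall.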