Lecture 12: Pipelined Processor
• Last time
  – Simple processor organization
  – Logic & control
  – Motivation & idea behind pipelining
• Today
  – Take QUIZ 8 over P&H 4.7-10, before 11:59pm today
  – Homework 4 due Thursday March 4, 2010
  – Pipelining in the real world
UTCS 352, Lecture 12

Pipelining Lessons
[Figure: tasks A, B, C, D in task order against a time axis from 6 PM to 9, divided into 30-minute steps]
• Pipelining doesn’t help latency of a single task; it helps throughput of the entire workload
• Multiple tasks operating simultaneously using different resources
• Potential speedup = number of pipe stages
• Pipeline rate limited by slowest pipeline stage
• Unbalanced lengths of pipe stages reduce speedup
• Time to “fill” pipeline and time to “drain” it reduce speedup
• Stall for dependences

Graphically Representing MIPS Pipeline
[Figure: one instruction drawn as the stage sequence IM, Reg, ALU, DM, Reg]
• Can help with answering questions like:
  – How many cycles does it take to execute this code?
  – What is the ALU doing during cycle 4?
  – Is there a hazard, why does it occur, and how can it be fixed?

Why Pipeline? For Performance!
[Figure: Inst 0 through Inst 4 in instruction order against time in clock cycles, each passing through IM, Reg, ALU, DM, Reg; label: “Time to fill the pipeline”]
Once the pipeline is full, one instruction is completed every cycle, so CPI = 1

Can Pipelining Get Us Into Trouble?
• Yes: Pipeline Hazards
  – structural hazards: attempt to use the same resource by two different instructions at the same time
  – data hazards: attempt to use data before it is ready
    • An instruction’s source operand(s) are produced by a prior instruction still in the pipeline
  – control hazards: attempt to make a decision about program control flow before the condition has been evaluated and the new PC target address calculated
    • branch instructions
• Can always resolve hazards by waiting
  – pipeline control must detect the hazard
  – and take action to resolve hazards

A Single Memory Would Be a Structural Hazard
[Figure: pipeline diagram for lw / Inst 1 / Inst 2 / Inst 3 / Inst 4 against time in clock cycles, all using one Mem; labels: “Reading data from memory”, “Reading instruction from memory”]
• Fix with separate instr and data memories (I$ and D$)

How About Register File Access?
[Figure: pipeline diagram for add $1, / Inst 1 / Inst 2 / add $2,$1, against time in clock cycles; labels: “clock edge that controls register writing”, “clock edge that controls loading of pipeline state registers”]
Fix register file access hazard by doing reads in the second half of the cycle and writes in the first half

Register Usage Can Cause Data Hazards
Dependencies backward in time cause hazards
[Figure: pipeline diagram for add $1, / sub $4,$1,$5 / and $6,$1,$7 / or $8,$1,$9 / xor $4,$1,$5]
Read before write is ready: data hazard
Write-read data hazard

Loads Can Cause Data Hazards
Dependencies backward in time cause hazards
[Figure: pipeline diagram for lw $1,4($2) / sub $4,$1,$5 / and $6,$1,$7 / or $8,$1,$9 / xor $4,$1,$5]
• Load-use data hazard

Resolving Hazards: Pipeline Stalls
• Can resolve any type of hazard
  – data, control, or structural
• Detect the hazard
• Freeze the pipeline up to the dependent stage until the hazard is resolved

One Way to “Fix” a Data Hazard
[Figure: pipeline diagram for add $1, / sub $4,$1,$5 / and $6,$1,$7 with stall cycles]
Can fix data hazard by waiting (stall), but impacts CPI

Another Way to “Fix” a Data Hazard
[Figure: pipeline diagram for add $1, / sub $4,$1,$5 / and $6,$1,$7 / or $8,$1,$9 / xor $4,$1,$5]
Fix data hazards by forwarding results as soon as they are available to where they are needed

Data Forwarding (aka Bypassing)
• Take the result from the earliest point that it exists in any of the pipeline state registers and forward it to the functional units (e.g., the ALU) that need it that cycle
• For ALU functional unit: the inputs can come from any pipeline register
  – adding multiplexors to the inputs of the ALU
  – connecting the Rd write data in EX/MEM or MEM/WB to either (or both) of the EX stage’s Rs and Rt ALU mux inputs
  – adding the proper control hardware to control the new muxes
• Other functional units may need forwarding logic (e.g., the DM)
• Forwarding can achieve a CPI of 1 even in the presence of data dependencies

Datapath with Forwarding Hardware
[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers and a Forward Unit; Forward Unit inputs: ID/EX.RegisterRs, ID/EX.RegisterRt, EX/MEM.RegisterRd, MEM/WB.RegisterRd]

Forwarding Illustration
[Figure: pipeline diagram for add $1, / sub $4,$1,$5 / and $6,$7,$1; labels: “EX/MEM hazard forwarding”, “MEM/WB hazard forwarding”]

Data Forwarding Control Conditions
EX/MEM hazard:
  if (EX/MEM.RegWrite
      and (EX/MEM.RegisterRd != 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
          ForwardA = 10
  if (EX/MEM.RegWrite
      and (EX/MEM.RegisterRd != 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
          ForwardB = 10
  Forwards the result from the previous instr. to either input of the ALU
MEM/WB hazard:
  if (MEM/WB.RegWrite
      and (MEM/WB.RegisterRd != 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
          ForwardA = 01
  if (MEM/WB.RegWrite
      and (MEM/WB.RegisterRd != 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
          ForwardB = 01
  Forwards the result from the second previous instr. to either input of the ALU

Yet Another Complication!
• Another potential data hazard can occur when there is a conflict between the result of the WB stage instruction and the MEM stage instruction – which should be forwarded?
[Figure: pipeline diagram for add $1,$1,$2 / add $1,$1,$3 / add $1,$1,$4]

Corrected Data Forwarding Control Conditions
MEM/WB hazard:
  if (MEM/WB.RegWrite
      and (MEM/WB.RegisterRd != 0)
      and (EX/MEM.RegisterRd != ID/EX.RegisterRs)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
          ForwardA = 01
  if (MEM/WB.RegWrite
      and (MEM/WB.RegisterRd != 0)
      and (EX/MEM.RegisterRd != ID/EX.RegisterRt)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
          ForwardB = 01

Memory Data Hazards
[Figure: pipeline diagram for lw $1,4($2) / sub $4,$1,$5 / and $6,$1,$7 / or $8,$1,$9 / xor $2,$7,$9]
• Does forwarding solve all our problems?

Forwarding with Load-use Data Hazards
[Figure: pipeline diagram for lw $1,4($2) / sub $4,$1,$5 / and $6,$1,$7 / or $8,$1,$9 / xor $2,$7,$9]
• still must stall one cycle even with forwarding!
[Figure: the same diagram with a stall after lw $1,4($2); sub, and, or, and xor each move one cycle later]
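The forwarding control conditions above, including the corrected MEM/WB case, can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name `forward_controls` and its keyword parameters are hypothetical, and the `00` select value (meaning no forwarding, so the ALU input comes from the register file) is assumed, since the slides list only `10` and `01`.

```python
def forward_controls(*, ex_mem_regwrite, ex_mem_rd,
                     mem_wb_regwrite, mem_wb_rd,
                     id_ex_rs, id_ex_rt):
    """Return (ForwardA, ForwardB) as 2-bit mux select values."""

    def select(src):
        # EX/MEM hazard: the previous instruction writes the register
        # that the instruction now in EX wants to read.
        if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
            return 0b10
        # MEM/WB hazard, corrected form from the slides: forward from the
        # second previous instruction only if EX/MEM.RegisterRd differs.
        if (mem_wb_regwrite and mem_wb_rd != 0
                and ex_mem_rd != src and mem_wb_rd == src):
            return 0b01
        # Assumption: 00 selects the value read from the register file.
        return 0b00

    return select(id_ex_rs), select(id_ex_rt)


# add $1,$1,$2 / add $1,$1,$3 / add $1,$1,$4 with the third add in EX:
# both earlier adds write $1, so the newer EX/MEM result must win.
triple_add = forward_controls(
    ex_mem_regwrite=True, ex_mem_rd=1,   # second add
    mem_wb_regwrite=True, mem_wb_rd=1,   # first add
    id_ex_rs=1, id_ex_rt=4)

# "and $6,$7,$1" in EX, with sub $4,$1,$5 in MEM and the add that
# writes $1 in WB: only the Rt input needs the MEM/WB value.
and_case = forward_controls(
    ex_mem_regwrite=True, ex_mem_rd=4,
    mem_wb_regwrite=True, mem_wb_rd=1,
    id_ex_rs=7, id_ex_rt=1)

print(triple_add, and_case)
```

Checking EX/MEM before MEM/WB is what resolves the triple-add conflict: the most recent result is the one forwarded.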
Load-use Hazard Detection Unit
• Need a Hazard Detection Unit in the ID stage that inserts a stall between the load and its use
ID Hazard Detection:
  if (ID/EX.MemRead
      and ((ID/EX.RegisterRt = IF/ID.RegisterRs)
      or (ID/EX.RegisterRt = IF/ID.RegisterRt)))
          stall the pipeline
• The first line tests to see if the instruction now in the EX stage is a lw; the next two lines check to see if the destination register of the lw matches either source register of the instruction in the ID stage (the load-use instruction)
• After this one cycle stall, the forwarding logic can handle the remaining data hazards

Stall Hardware
• We have to detect this case & implement the stall
• Prevent the instructions in the IF and ID stages from progressing down the pipeline
  – done by preventing the PC register and the IF/ID pipeline register from changing
  – Hazard Detection Unit controls the writing of the PC (PC.write) and IF/ID (IF/ID.write) registers
• Insert a “bubble” between the lw instruction (in the EX stage) and the load-use instruction (in the ID stage) (i.e., insert a noop in the execution stream)
  – Set the control bits in the EX, MEM, and WB control fields of the ID/EX pipeline register to 0 (noop). The Hazard Unit controls the mux that chooses between the real control values and the 0’s.
• Let the lw instruction and the instructions after it in the pipeline (before it in the code) proceed normally down the pipeline

Adding the Hazard Hardware
[Figure: the forwarding datapath with a Hazard Unit added; Hazard Unit inputs: ID/EX.MemRead, ID/EX.RegisterRt; a mux selects between the Control outputs and 0]

Compiler Instruction Scheduling
• The compiler can rearrange instructions, eliminating the load-use hazard!
• Proebsting & Fischer (1991) show how to optimally schedule a straight-line sequence of instructions, given sufficient registers and a delay of one pipeline stage.
• Approach
  • Build a dependence graph that describes the partial order of instruction definitions and uses
  • Schedule R independent loads (load; load; load; ..)
    • Each load requires a register,
    • thus R is the minimum number of live registers
  • Schedule an operation independent of the previous load and another load in a pair (operation; load)

Compiler Scheduling to Avoid Load-use Data Hazards
[Figure: original order: lw $1,4($2) / sub $4,$1,$5 / and $6,$1,$7 / or $8,$1,$9 / xor $2,$7,$9]
[Figure: scheduled order: lw $1,4($2) / xor $2,$7,$9 / sub $4,$1,$5 / and $6,$1,$7 / or $8,$1,$9]

Types of Data Hazards
• RAW (read after write)
  – only hazard for ‘fixed’ pipelines
  – later instruction must read after earlier write
• WAW (write after write)
  – variable-length pipeline
  – later instruction must write after earlier write
• WAR (write after read)
  – pipelines with late read
  – later instruction must write after earlier read

Memory-to-Memory Copies
• For loads immediately followed by stores (memory-to-memory copies), a stall can be avoided by adding forwarding hardware from the MEM/WB register to the data memory input.
  – Would need to add a Forward Unit and a mux to the memory access stage
[Figure: pipeline diagram for lw $1,4($2) / sw $1,4($3)]
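As a rough check on the load-use discussion above, the following Python sketch counts load-use stalls in a straight-line sequence and compares the original and compiler-scheduled orders from the slides. It is a simplified model, not the hardware: it assumes a 5-stage pipeline with full ALU forwarding, so the only stalls are one-cycle load-use stalls between adjacent instructions. The names `Instr`, `load_use_stalls`, `total_cycles`, and the `mem_forwarding` flag are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Instr(NamedTuple):
    op: str                            # mnemonic, e.g. "lw" or "sub"
    dest: Optional[int]                # register written (None for sw)
    srcs: Tuple[int, ...]              # registers needed by the ALU in EX
    store_data: Optional[int] = None   # sw only: register whose value is stored


def load_use_stalls(code: Sequence[Instr], mem_forwarding: bool = False) -> int:
    """Count one-cycle stalls between a lw and the instruction right after it."""
    stalls = 0
    for prev, cur in zip(code, code[1:]):
        if prev.op != "lw":
            continue  # corresponds to ID/EX.MemRead being 0
        if prev.dest in cur.srcs:
            stalls += 1  # loaded value is needed in EX: must stall
        elif cur.store_data == prev.dest and not mem_forwarding:
            stalls += 1  # sw data: stall unless MEM/WB -> DM forwarding exists
    return stalls


def total_cycles(code: Sequence[Instr], stages: int = 5,
                 mem_forwarding: bool = False) -> int:
    """Cycles to fill the pipeline plus one per instruction, plus stalls."""
    return len(code) + stages - 1 + load_use_stalls(code, mem_forwarding)


lw = Instr("lw", 1, (2,))             # lw  $1,4($2)
sub = Instr("sub", 4, (1, 5))         # sub $4,$1,$5
and_ = Instr("and", 6, (1, 7))        # and $6,$1,$7
or_ = Instr("or", 8, (1, 9))          # or  $8,$1,$9
xor = Instr("xor", 2, (7, 9))         # xor $2,$7,$9

original = [lw, sub, and_, or_, xor]
scheduled = [lw, xor, sub, and_, or_]

# lw $1,4($2) followed by sw $1,4($3): a memory-to-memory copy
mem_copy = [lw, Instr("sw", None, (3,), store_data=1)]

print(total_cycles(original), total_cycles(scheduled))  # prints "10 9"
```

In this model the scheduled order saves the single stall cycle, and the memory-to-memory copy stalls only when the MEM/WB to data memory forwarding path is absent.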
Summary
• The real world of pipelining
  – Just stall
  – Forwarding for register and memory hazards
• Next Time
  – Prediction for control hazards
  – Multiple issue and out-of-order processors
  – Homework 4 due Thursday March 4, 2010
• Reading: P&H 4.11-15