inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #19: Pipelining II 2005-07-21 Andy Carle CS 61C L19 Pipelining II (1) A Carle, Summer 2005 © UCB +4 1. Instruction Fetch ALU Data memory rd rs rt registers PC instruction memory Review: Datapath for MIPS imm 5. Write 2. Decode/ 3. Execute 4. Memory Back Register Read • Use datapath figure to represent pipeline IFtch Dcd Exec Mem WB CS 61C L19 Pipelining II (2) Reg ALU I$ D$ Reg A Carle, Summer 2005 © UCB Review: Problems for Computers • Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle • Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away) • Control hazards: Pipelining of branches & other instructions stall the pipeline until the hazard; “bubbles” in the pipeline • Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock) CS 61C L19 Pipelining II (3) A Carle, Summer 2005 © UCB Review: C.f. Branch Delay vs. Load Delay • Load Delay occurs only if necessary (dependent instructions). • Branch Delay always happens (part of the ISA). • Why not have Branch Delay interlocked? • Answer: Interlocks only work if you can detect hazard ahead of time. By the time we detect a branch, we already need its value … hence no interlock is possible! CS 61C L19 Pipelining II (4) A Carle, Summer 2005 © UCB FYI: Historical Trivia • First MIPS design did not interlock and stall on load-use data hazard • Real reason for name behind MIPS: Microprocessor without Interlocked Pipeline Stages • Word Play on acronym for Millions of Instructions Per Second, also called MIPS • Load/Use Wrong Answer! CS 61C L19 Pipelining II (5) A Carle, Summer 2005 © UCB Outline • Pipeline Control • Forwarding Control • Hazard Control CS 61C L19 Pipelining II (6) A Carle, Summer 2005 © UCB Piped Proc So Far … CS 61C L19 Pipelining II (7) A Carle, Summer 2005 © UCB New Representation: Regs more explicit S S B D Reg. File ME/WB Data Mem A EX/ME Exec DE/EX Reg File IR Inst. Mem PC Next PC IF/DE M IF/DE.Ir = Instruction DE/EX.A = BusA out of Reg EX/ME.S = AluOut EX/ME.D = Bus B pass-through for sw ME/WB.S = ALuOut pass-through ME/WB.M = Mem Result from lw CS 61C L19 Pipelining II (8) A Carle, Summer 2005 © UCB New Representation: Regs more explicit S S B D Reg. File ME/WB Data Mem A EX/ME Exec DE/EX Reg File IR Inst. Mem PC Next PC IF/DE M What’s Missing??? CS 61C L19 Pipelining II (9) A Carle, Summer 2005 © UCB Pipelined Processor (almost) for slides Idea: Parallel Piped Control … M Data Mem D Mem Access S B Equal WB Ctrl Reg. File IRwb Mem Ctrl IRmem Ex Ctrl IRex A Exec Dcd Ctrl IR Reg File PC Next PC Inst. Mem Valid CS 61C L19 Pipelining II (10) A Carle, Summer 2005 © UCB Pipelined Control IR <- Mem[PC]; PC <– PC+4; A <- R[rs]; B<– R[rt] S <– A + SX; M <– Mem[S] Mem[S] <- B If Cond PC < PC+SX; S M D CS 61C L19 Pipelining II (11) Mem Access B Data Mem A Reg. File Equal R[rd] <– M; IR Inst. Mem PC R[rt] <– S; Next PC R[rd] <– S; S <– A + SX; Exec S <– A or ZX; Reg File S <– A + B; A Carle, Summer 2005 © UCB Data Stationary Control • The Main Control generates the control signals during Reg/Dec • Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later • Control signals for Mem (MemWr Branch) are used 2 cycles later • Control signals for Wr (MemtoReg MemWr) are used 3 cycles later Reg/Dec ALUSrc ALUSrc ALUOp ALUOp RegDst MemWr Branch MemtoReg RegWr CS 61C L19 Pipelining II (12) RegDst MemWr Branch MemtoReg RegWr MemWr Branch MemtoReg RegWr Wr Mem/Wr Register ExtOp Mem Ex/Mem Register ExtOp ID/Ex Register IF/ID Register Main Control Exec MemtoReg RegWr A Carle, Summer 2005 © UCB Let’s Try it Out 10 lw 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 28 ori r8, r9, 17 32 add r10, r11, r12 100 and r13, r14, 15 CS 61C L19 Pipelining II (13) r1, 36(r2) A Carle, Summer 2005 © UCB Start: Fetch 10 n WB Ctrl rs rt im A Exec Mem Ctrl Reg File IR n S M Reg. File n Decode Inst. Mem n = 10 PC Next PC D Mem Acces s Data Mem B IF 10 lw r1, 36(r2) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and CS 61C L19 Pipelining II (14) r13, r14, 15 A Carle, Summer 2005 © UCB n WB Ctrl im A S M Reg. File rt Exec 2 n Mem Ctrl Reg File IR n Decode lw r1, 36(r2) Inst. Mem Fetch 14, Decode 10 = 14 PC Next PC D Mem Acces s Data Mem B ID 10 lw r1, 36(r2) IF 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and CS 61C L19 Pipelining II (15) r13, r14, 15 A Carle, Summer 2005 © UCB n WB Ctrl S M Reg. File r2 2 n Mem Ctrl Exec 36 lw r1 rt Reg File IR Decode addI r2, r2, 3 Inst. Mem Fetch 20, Decode 14, Exec 10 = 20 PC Next PC D Mem Acces s Data Mem B EX 10 lw r1, 36(r2) ID 14 addI r2, r2, 3 IF 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and CS 61C L19 Pipelining II (16) r13, r14, 15 A Carle, Summer 2005 © UCB D Reg. File M Mem Acces s Data Mem lw r1 WB Ctrl Mem Ctrl r2+36 Exec 3 r2 4 n addI r2, r2, 3 5 Reg File IR Decode sub r3, r4, r5 Inst. Mem Fetch 24, Decode 20, Exec 14, Mem 10 B 24 PC Next PC = M 10 lw EX 14 addI r2, r2, 3 ID 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 IF r1, 36(r2) 100 and CS 61C L19 Pipelining II (17) r13, r14, 15 A Carle, Summer 2005 © UCB Reg. File M[r2+36] D Mem Acces s Data Mem WB Ctrl r2+3 r4 lw r1 addI r2 sub r3 Mem Ctrl 7 Exec 6 Reg File IR Decode beq r6, r7 100 Inst. Mem Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10 r5 30 PC Next PC = Note Delayed Branch: always execute ori after beq CS 61C L19 Pipelining II (18) WB 10 M 14 lw r1, 36(r2) addI r2, r2, 3 EX 20 ID 24 sub r3, r4, r5 beq r6, r7, 100 IF 30 ori r8, r9, 17 add r10, r11, r12 34 100 and r13, r14, 15 A Carle, Summer 2005 © UCB r1=M[r2+35] WB Ctrl Reg. File addI r2 Mem Ctrl r2+3 sub r3 r4-r5 r6 9 Exec 100 beq xx Reg File IR Decode ori r8, r9 17 Inst. Mem Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14 = 100 PC Next PC D Mem Acces s Data Mem r7 10 WB 14 M 20 lw r1, 36(r2) addI r2, r2, 3 sub r3, r4, r5 EX 24 ID 30 beq r6, r7, 100 ori r8, r9, 17 34 add r10, r11, r12 IF 100 and r13, r14, 15 CS 61C L19 Pipelining II (19) A Carle, Summer 2005 © UCB ???? • Remember: D B Reg. File M Mem Acces s Data Mem PC Next PC Equal IRmem WB Ctrl Ex Ctrl Exec S IRwb IRex A Mem Ctrl Dcd Ctrl Reg File IR Inst. Mem Valid means triggered on edge. • What is wrong here? CS 61C L19 Pipelining II (20) A Carle, Summer 2005 © UCB Double-Clocked Signals D B Reg. File M Mem Acces s Data Mem PC Next PC Equal IRmem WB Ctrl Ex Ctrl Exec S IRwb IRex A Mem Ctrl Dcd Ctrl Reg File IR Inst. Mem Valid • Some signals are double clocked! • In general: Inputs to edge components are their own pipeline regs • Watch out for stalls and such! CS 61C L19 Pipelining II (21) A Carle, Summer 2005 © UCB Administrivia • Proj 2 – Due Sunday • HW6 – Due Tuesday • Midterm 2: • Friday, July 29: 11:00 – 2:00 • Location TBD • If you are really so concerned about the drop deadline that this is a problem for you, talk to me about the possibility of taking the exam on Thursday CS 61C L19 Pipelining II (22) A Carle, Summer 2005 © UCB Megahertz Myth? CS 61C L19 Pipelining II (23) A Carle, Summer 2005 © UCB Outline • Pipeline Control • Forwarding Control • Hazard Control CS 61C L19 Pipelining II (24) A Carle, Summer 2005 © UCB Review: Forwarding Fix by Forwarding result as soon as we have it to where we need it: Reg Reg D$ Reg I$ Reg D$ Reg I$ Reg D$ Reg I$ Reg ALU xor $t9,$t0,$t10 I$ D$ ALU or $t7,$t0,$t8 * Reg WB ALU and $t5,$t0,$t6 I$ EX MEM ALU sub $t4,$t0,$t3 ID/RF ALU add $t0,$t1,$t2 IF D$ Reg * “or” hazard solved by register hardware CS 61C L19 Pipelining II (25) A Carle, Summer 2005 © UCB Forwarding In general: • For each stage i that has reg inputs • For each stage j after I that has reg output - If i.reg == j.reg forward j value back to i. - Some exceptions ($0, invalid) In particular: • ALUinput (ALUResult, MemResult) • MemInput (MemResult) CS 61C L19 Pipelining II (26) A Carle, Summer 2005 © UCB Pending Writes In Pipeline Registers IAU npc I mem Regs B op rw rs rt A im PC n alu S n D mem m n Regs CS 61C L19 Pipelining II (27) A Carle, Summer 2005 © UCB Pending Writes In Pipeline Registers IAU npc • Current operand registers I mem • Pending writes Regs B op rw rs rt A im n op rw alu S n op rw • hazard <= ((rs == rwex) & regWex) OR ((rs == rwmem) & regWme) OR ((rs == rwwb) & regWwb) OR ((rt == rwex) & regWex) OR ((rt == rwmem) & regWme) OR ((rt == rwwb) & regWwb) D mem m PC n op rw Regs CS 61C L19 Pipelining II (28) A Carle, Summer 2005 © UCB Forwarding Muxes IAU npc I mem Regs op rw rs rt Forward mux B A im n op rw alu S n op rw D mem m n op rw PC • Detect nearest valid write op operand register and forward into op latches, bypassing remainder of the pipe • Increase muxes to add paths from pipeline registers • Data Forwarding = Data Bypassing Regs CS 61C L19 Pipelining II (29) A Carle, Summer 2005 © UCB What about memory operations? Tricky situation: MIPS: lw 0($t0) sw 0($t1) RTL: R1 <- Mem[ R2 + I ]; Mem[R3+34] <- R1 op Rd Ra Rb op Rd Ra Rb Rd A D B R Mem Rd T to reg file CS 61C L19 Pipelining II (30) A Carle, Summer 2005 © UCB What about memory operations? Tricky situation: MIPS: lw 0($t0) sw 0($t1) RTL: R1 <- Mem[ R2 + I ]; Mem[R3+34] <- R1 op Rd Ra Rb op Rd Ra Rb Rd A D B R Mem Solution: Handle with bypass in memory stage! CS 61C L19 Pipelining II (31) Rd T to reg file A Carle, Summer 2005 © UCB Outline • Pipeline Control • Forwarding Control • Hazard Control CS 61C L19 Pipelining II (32) A Carle, Summer 2005 © UCB Data Hazard: Loads (1/4) • Forwarding works if value is available (but not written back) before it is needed. But consider … I$ Reg I$ sub $t3,$t0,$t2 EX MEM WB D$ Reg Reg ALU ID/RF ALU lw $t0,0($t1) IF D$ Reg •Need result before it is calculated! •Must stall use (sub) 1 cycle and then forward. … CS 61C L19 Pipelining II (33) A Carle, Summer 2005 © UCB Data Hazard: Loads (2/4) • Hardware must stall pipeline • Called “interlock” CS 61C L19 Pipelining II (34) I$ WB D$ Reg Reg bub ble I$ D$ bub ble Reg D$ bub ble I$ Reg ALU or $t7,$t0,$t6 Reg MEM ALU and $t5,$t0,$t4 I$ EX ALU sub $t3,$t0,$t2 ID/RF ALU lw $t0, 0($t1) IF Reg Reg D$ A Carle, Summer 2005 © UCB Data Hazard: Loads (3/4) • Instruction slot after a load is called “load delay slot” • If that instruction uses the result of the load, then the hardware interlock will stall it for one cycle. • If the compiler puts an unrelated instruction in that slot, then no stall • Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except the latter uses more code space) CS 61C L19 Pipelining II (35) A Carle, Summer 2005 © UCB Data Hazard: Loads (4/4) • Stall is equivalent to nop or $t7,$t0,$t6 CS 61C L19 Pipelining II (36) bub bub ble ble bub ble bub ble bub ble I$ Reg D$ Reg I$ Reg D$ Reg I$ Reg ALU and $t5,$t0,$t4 Reg ALU sub $t3,$t0,$t2 D$ ALU nop I$ ALU lw $t0, 0($t1) D$ Reg A Carle, Summer 2005 © UCB Hazards / Stalling In general: • For each stage i that has reg inputs • If I’s reg is being written later on in the pipe but is not ready yet - Stages 0 to i: Stall (Turn CEs off so no change) - Stage i+1: Make a bubble (do nothing) - Stages i+2 onward: As usual In particular: • ALUinput (MemResult) CS 61C L19 Pipelining II (37) A Carle, Summer 2005 © UCB Hazards / Stalling Alternative Approach: • Detect non-forwarding hazards in decode • Possible since our hazards are formal. - Not always the case. • Stalling then becomes: - Issue nop to EX stage - Turn off nextPC update (refetch same inst) - Turn off InstReg update (re-decode same inst) CS 61C L19 Pipelining II (38) A Carle, Summer 2005 © UCB Stall Logic • 1. Detect nonresolving hazards. IAU npc I mem Regs op rw rs rt Forward mux B A im n op rw PC • 2a. Insert Bubble • 2b. Stall nextPC, IF/DE alu S n op rw D mem m n op rw Regs CS 61C L19 Pipelining II (39) A Carle, Summer 2005 © UCB Stall Logic • Stall-on-issue is used quite a bit • More complex processors: many cases that stall on issue. • More complex processors: cases that can’t be detected at decode - E.g. value needed from mem is not in cache – proc must stall multiple cycles CS 61C L19 Pipelining II (40) A Carle, Summer 2005 © UCB By the way … • Notice that our forwarding and stall logic is stateless! • Big Idea: Keep it simple! • Option 1: Store old fetched inst in reg (“stall_temp”), keep state reg that says whether to use stall_temp or value coming off inst mem. • Option 2: Re-fetch old value by turning off PC update. CS 61C L19 Pipelining II (41) A Carle, Summer 2005 © UCB