Computer Architecture: A Constructive Approach Bypassing Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 19, 2012 http://csg.csail.mit.edu/6.S078 L12-1 Six Stage Pipeline Fetch Decode Reg Read Execute Memory Writeback p c F f r D d r R r r X x r M m r W Writeback feeds back directly to ReadReg rather than through FIFO March 19, 2012 http://csg.csail.mit.edu/6.S078 L12-2 Register Read Stage Reg Read Decode D Decode Scoreboard update signaled immediately March 19, 2012 d r R r r W Read regs update scoreboard RF writeback register http://csg.csail.mit.edu/6.S078 L12-3 Per stage architectural state typedef struct { MemResp instResp; } FBundle; typedef struct { Rindx r_dest; Rindx op1; Rindx op2; ... } DecBundle; typedef struct { Data src1; Data src2; } RegBundle; March 19, 2012 typedef struct { Bool cond; Addr addr; Data data; } Ebundle; No MemBundle – data added to Ebundle No WbBundle – nothing added in WB http://csg.csail.mit.edu/6.S078 L12-4 Per stage micro-architectural state typedef struct { FBundle fInst; DecBundle decInst; RegBundle regInst; Inum inum; Epoch epoch; } RegData deriving(Bits,Eq); Architectural state from prior stages New architectural state for this stage Passthrough and (optional) new micro-architectural state Utility function RegData newRegData(DecData decData); function RegData regData = ?; regData.fInst = decData.fInst; regData.decInst = decData.decInst; Copy architectural state from prior stages regData.inum = decData.inum; regData.epoch = decData.epoch; return regData; Copy uarch state endfunction from prior stages In “uarch_types” March 19, 2012 http://csg.csail.mit.edu/6.S078 L12-5 Example of stage state use rule doRegRead; let regData = newRegData(dr.first()); let decInst = regData.decInst; Initialize stage state Set utility variables* case (reg_read.readRegs(decInst)) matches tagged Valid .rfdata: Set begin architectural state if( writes_register(decInst)) reg_read.markUnavail(decInst.r_dest); regData.regInst.src1 = rfdata.src1; regData.regInst.src2 = rfdata.src2; Set uarch rr.enq(regData); state if any dr.deq(); end Enqueue Dequeue outgoing endcase incoming *Don’t try to state state endrule update!!! March 19, 2012 http://csg.csail.mit.edu/6.S078 L12-6 Resource-based diagram key 0 1 Fetch (Fuschia) ζ η Decode (Denim) ε ζ Reg Read (Red) δ Execute (Emerald) γ δ Memory (Mustard) β γ Writeback (Wheat) α α α = stalled March 19, 2012 epoch change (color change) pc redirect δ = killed ε R1 busy Set R1 not busy writeback feedback http://csg.csail.mit.edu/6.S078 L12-7 Scoreboard clear bug! 9 η7waits8for write of R1 by ζ 0 1 2 3 4 5 6 α β γ δ ε ζ η α β γ δ ε ζ η α β γ δ ε ζ α = 00: add r1,r0,r0 β = 04: j 40 γ = 08: add ... δ = 12: add ... ε = 16: add ... ζ = 40: add r1,r0, r0 η = 44: add r2, r1,r0 α β γ α β α F D η η R ζ X M ζ W β Looks okay, but what if α is delayed? March 19, 2012 http://csg.csail.mit.edu/6.S078 L12-8 Scoreboard clear bug! 8 9by η7is released earlier write of R1 0 1 2 3 4 5 6 α β γ δ ε ζ η α β γ δ ε ζ η α β γ δ ε ζ α = 00: ld r1,100(r0) β = 04: j 40 γ = 08: add ... δ = 12: add ... ε = 16: add ... ζ = 40: add r1,r0, r0 η = 44: add r2, r1,r0 α β γ α α α F α Mistreatment of what did of data dependence? March 19, 2012 http://csg.csail.mit.edu/6.S078 D η R ζ η X β ζ M α β W WAW L12-9 So don’t clear scoreboard! 8 9 is ζ7sees register free by feedback, then sets it busy F 0 1 2 3 4 5 6 α β γ δ ε ζ η α β γ δ ε ζ η α β γ δ ε ζ α = 00: ld r1,100(r0) β = 04: j 40 γ = 08: add ... δ = 12: add ... ε = 16: add ... ζ = 40: add r1,r0, r0 η = 44: add r2, r1,r0 α β γ δ ε α α α α D ζ η R ζ X M β α W β Looks okay, but what if R1 written in branch shadow? March 19, 2012 http://csg.csail.mit.edu/6.S078 L12-10 Dead instruction deadlock! ζ7never8sees register R1 0 1 2 3 4 5 6 α β γ δ ε ζ η α β γ δ ε ζ η α β γ δ ε ζ α = 00: add r3,r0,r0 β = 04: j 40 γ = 08: add ... δ = 12: add r1,... ε = 16: add,... ζ = 40: add r1,r0, r0 η = 44: add r2, r1,r0 α β γ δ ε α β α 9 F D ζ ζ R X M β W So, what can we do? March 19, 2012 http://csg.csail.mit.edu/6.S078 L12-11 Poison bit uarch state In exec: instead of dropping killed instructions mark them ‘poisoned’ In subsequent stages: March 19, 2012 DO NOT make architectural changes DO necessary bookkeeping http://csg.csail.mit.edu/6.S078 L12-12 Poison-bit solution Poisoned δ frees register, but7 value8 same, 9 ζ marks it busy. 0 1 2 3 4 5 6 α β γ δ ε ζ η α β γ δ ε ζ η η α β γ δ ε ζ ζ α = 00: add r3,r0,r0 β = 04: j 40 γ = 08: add ... δ = 12: add r1,... ε = 16: add,... ζ = 40: add r1,r0, r0 η = 44: add r2, r1,r0 α β γ δ ε α β γ δ ε α β γ δ F D η R ζ X M ε W What stage generates poison bit? March 19, 2012 http://csg.csail.mit.edu/6.S078 L12-13 Poison-bit generation Initialize stage state rule doExecute; let execData = newExecData(rr.first); let ...; Set utility if (! epochChange) variables begin execData.execInst = exec.exec(decInst,src1,src2); if (execInst.cond) ... end Set architectural else state begin execData.poisoned = True; Set uarch end state Enqueue xr.enq(execData); outgoing rr.deq(); state Dequeue What stages endrule incoming need to handle state March 19, 2012 http://csg.csail.mit.edu/6.S078 poison bit? L12-14 Poison-bit usage Architectural activity if not poisoned rule doMemory; let memData = newMemData(xr.first); let ...; if(! poisoned && memop(decInst)) memData.execInst.data <- mem.port(...); mr.enq(memData); xr.deq(); endrule Unconditionally do rule doWriteBack; bookkeeping let wbData = newWBData(mr.first); let ...; reg_read.markAvailable(rdst); if (!poisoned && regwrite(decInst) Architectural rf.wr(rdst, data); activity if not mr.deq; poisoned endrule March 19, 2012 http://csg.csail.mit.edu/6.S078 L12-15 Data dependence stalls 0 1 ζ η ζ 2 3 4 5 6 8 9 F η ζ D η η η ζ R η ζ ζ = 40: add r1,r0, r0 η = 44: add r2, r1,r0 X M η ζ This is a data dependence. What kind? March 19, 2012 7 http://csg.csail.mit.edu/6.S078 η W RAW L12-16 Bypass RegRead rr Execute What does RegRead need to do? If scoreboard says register is not available then RegRead needs to stall, unless it sees what it needs on the bypass line…. March 19, 2012 http://csg.csail.mit.edu/6.S078 L12-17 Register Read Stage D Decode d r R r r X W Generate bypass Read regs Update scoreboard Bypass Net RF March 19, 2012 Writeback register http://csg.csail.mit.edu/6.S078 L12-18 Bypass Network typedef struct { Rindx regnum; Data value;} BypassValue; module mkBypass(BypassNetwork) Rwire#(BypassValue) bypass; method produceBypass(Rindx regnum, Data value); bypass.wset(BypassValue{regname: regnum, value:value}); endmethod method Maybe#(Data) consumeBypass1(Rindx regnum); if (bypass matches tagged Valid .b && b.regnum == regnum) return tagged Valid b.value; else return tagged Invalid; endmethod endmodule Does bypass solve WAW hazard March 19, 2012 http://csg.csail.mit.edu/6.S078 No! L12-19 Register Read (with bypass) module mkRegRead#(RFile rf, BypassNetwork bypass)(RegRead); method Maybe#(Sources) readRegs(DecBundle decInst); let op1 = decInst.op1; let op2 = decInst.op2; let rfdata1 = rf.rd1(op1); let rfdata2 = rf.rd2(op2); let bypass1 = bypass.consumeBypass1(op1); let bypass2 = bypass.consumeBypass2(op2); let stall = isWAWhazard(scoreboard) || (isBusy(op1,scoreboard) && !isJust(bypass1)) || (isBusy(op2,scoreboard) && !isJust(bypass2))); let s1 = fromMaybe(rfdata1, bypass1); let s2 = fromMaybe(rfdata2, bypass2); if (stall) return tagged Invalid; else return tagged Valid Sources{src1:s1, src2:s2}; endmethod March 19, 2012 http://csg.csail.mit.edu/6.S078 L12-20 Bypass generation in execute rule doExecute; let execData = newExecData(rr.first); let ...; if (! epochChange) begin execData.execInst = exec.exec(decInst,src1,src2); if (execInst.cond) ... if ( decInst.i_type == ALU ) bypass.generateBypass(decInst.r_dst, execInst.data); end else begin execData.poisoned = True; end If have data destined xr.enq(execData); for a register bypass it rr.deq(); back to register read endrule Does bypass solve all RAW delays? March 19, 2012 http://csg.csail.mit.edu/6.S078 No – not for ld/st! L12-21 Full bypassing p c F f r D d r R r r X x r M m r W Every stage that generates register writes can generate bypass data March 19, 2012 http://csg.csail.mit.edu/6.S078 L12-22 Multi-cycle execute R r r X x r M M1 m 1 M2 m 2 M3 Incorporating a multi-cycle operation into a stage March 19, 2012 http://csg.csail.mit.edu/6.S078 L12-23 Multi-cycle function unit module mkMultistage(Multistage); State1 m1 <- mkReg(); State2 m2 <- mkReg(); method Action request(Operand operand); m1.enq(doM1(operand)); endmethod rule s1; m2.enq(doM2(m1.first)); m1.deq(); endrule method ActionValue Result response(); return doM3(m2.first); m2.deq(); endmethod endmodule March 19, 2012 http://csg.csail.mit.edu/6.S078 Perform first stage of pipeline Perform middle stage(s) of pipeline Perform last stage of pipeline and return result L12-24 Single/Multi-cycle pipe stage rule doExec; If not busy ... initialization start multicycle operation, or… let done = False; if (! waiting) begin if(isMulticycle(...)) begin multicycle.request(...); waiting <= True; end else begin result = singlecycle.exec(...); done = True; end end else begin …perform single cycle operation result <- multicycle.response(); done = True; waiting <= False; end if (done) If busy, wait Finish up if ...finish up for response have a result endrule March 19, 2012 http://csg.csail.mit.edu/6.S078 L12-25