Computer Architecture: A Constructive Approach
Branch Direction Prediction – Six Stage Pipeline

Joel Emer
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
April 23, 2012
http://csg.csail.mit.edu/6.S078

NA pred with decode feedback

[Figure: six-stage pipeline (Fetch, Decode, Reg Read, Execute, Memory, Writeback) with a Next Address Prediction block and feedback from Decode and Execute to Fetch.]

Decode detected mispredicts

• Non-branch: when nextPC != PC+4 => use PC+4
• Unconditional, target known at decode: when nextPC != known target => use known target
• Conditional branch: when nextPC is neither PC+4 nor the decoded target => use PC+4

Can we do better than PC+4?

Dynamic Branch Prediction

Branch direction prediction: learn and predict the direction a branch will go.

Standard prediction principles:
• Temporal correlation: the way a branch resolves may be a good predictor of the way it will resolve at the next execution.
• Spatial correlation: several branches may resolve in a highly correlated manner (a preferred path of execution).

One-bit predictor

Predict that a branch will go the same direction it went last time.

[Figure: k bits of the fetch PC (above the two low-order 00 bits) form the BHT index into a 2^k-entry BHT with 1 bit per entry, which yields Taken/¬Taken. In parallel, the I-Cache supplies the instruction; its opcode answers "Branch?" and its offset is added to the PC to form the target PC.]
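The indexing shown in the one-bit predictor figure can be modelled in a few lines of software. This is only a behavioural sketch in Python, not the hardware description used in the course: the class name `OneBitPredictor`, the default `k=6`, and the initial "not taken" state are assumptions made for illustration.

```python
class OneBitPredictor:
    """Software model of a 2**k-entry branch history table, 1 bit per entry."""

    def __init__(self, k=6):
        self.k = k
        # Assumed initial state: every entry predicts "not taken".
        self.table = [False] * (1 << k)

    def index(self, pc):
        # Drop the two low-order bits (00 for 4-byte instructions),
        # then keep the next k bits as the table index.
        return (pc >> 2) & ((1 << self.k) - 1)

    def predict(self, pc):
        return self.table[self.index(pc)]

    def train(self, pc, taken):
        # Remember only the most recent outcome for this entry.
        self.table[self.index(pc)] = taken


bp = OneBitPredictor(k=6)
bp.train(0x1000, True)
same_branch = bp.predict(0x1000)           # predicts what the branch did last time
aliased = bp.predict(0x1000 + 64 * 4)      # different PC, same table entry
other_entry = bp.predict(0x1004)           # neighbouring entry, still untrained
print(same_branch, aliased, other_entry)
```

The second lookup illustrates aliasing: two branches whose PCs differ by a multiple of 2^k instructions share one prediction bit.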
One-bit predictor

    // Interface
    interface DirectionPred;
      method ActionValue#(Tuple2#(Bool, DirInfo)) predict(Addr addr);
      method Action train(DirInfo dirInfo, Bool taken);
    endinterface

    // Feedback information
    typedef 64 BPRows;
    typedef Bit#(TLog#(BPRows)) DirLineIndex;
    typedef DirLineIndex DirInfo;

One-bit predictor (continued)

    module mkDirectionPredictor(DirectionPred);
      // Array of prediction bits
      RegFile#(DirLineIndex, Bool) dirArray <- mkRegFileFull();

      method ActionValue#(Tuple2#(Bool, DirInfo)) predict(Addr addr);
        DirLineIndex index = truncate(addr >> 2);
        // Return prediction saved in array
        return tuple2(dirArray.sub(index), index);
      endmethod

      method Action train(DirInfo dirInfo, Bool taken);
        DirLineIndex index = dirInfo;
        // Update array with last actual behavior
        dirArray.upd(index, taken);
      endmethod
    endmodule

When should we train?

Two-bit Predictor (Smith, 1981)

How well does the one-bit predictor do on short trip count loops?

• Assume 2 direction prediction bits per instruction

    Counter   State
    1 1       Strongly taken
    1 0       Weakly taken
    0 1       Weakly ¬taken
    0 0       Strongly ¬taken

On taken, move one state toward strongly taken; on ¬taken, move one state toward strongly ¬taken.

Implement using a saturating counter.

Saturating Counter

    typedef Bit#(2) Counter;

    // Increment, but stay at the maximum value instead of wrapping to 0
    function Counter saturatingInc(Counter counter);
      let plusOne = counter + 1;
      return (plusOne == 0) ? counter : plusOne;
    endfunction

    // Decrement, but stay at 0 instead of wrapping to the maximum value
    function Counter saturatingDec(Counter counter);
      return (counter == 0) ? 0 : counter - 1;
    endfunction

    function Counter updateCounter(Bool dir, Counter counter);
      return dir ? saturatingInc(counter)
                 : saturatingDec(counter);
    endfunction

How do we determine prediction from counter?

Two-bit predictor

[Figure: k bits of the fetch PC (above the two low-order 00 bits) form the BHT index into a 2^k-entry BHT with 2 bits per entry, which yields Taken/¬Taken.]
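To put numbers on the short-trip-count question, the sketch below runs a one-bit predictor and a two-bit saturating counter over the same branch trace. It is a Python behavioural model under stated assumptions: a single loop-back branch with trip count 4 (taken three times, then not taken), both predictors starting in their "not taken" state, and training applied immediately after each prediction. The function names are illustrative only.

```python
def run_one_bit(outcomes):
    """Count mispredictions for a predictor that repeats the last outcome."""
    last = False  # assumed initial state: not taken
    misses = 0
    for taken in outcomes:
        if last != taken:
            misses += 1
        last = taken
    return misses


def run_two_bit(outcomes):
    """Count mispredictions for a 2-bit saturating counter (0..3)."""
    counter = 0   # assumed initial state: strongly not taken
    misses = 0
    for taken in outcomes:
        prediction = counter >= 2  # high bit of the counter
        if prediction != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


loop = [True, True, True, False]   # loop-back branch, trip count 4
trace = loop * 10                  # the loop is entered 10 times
one_bit_misses = run_one_bit(trace)
two_bit_misses = run_two_bit(trace)
print(one_bit_misses, two_bit_misses, len(trace))
```

After warm-up, the one-bit model mispredicts twice per loop execution (at the exit and again on re-entry), while the two-bit counter only drops to "weakly taken" at the exit and so mispredicts once.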
Two-bit predictor

    typedef 64 BPRows;
    typedef Bit#(TLog#(BPRows)) DirLineIndex;

    // DirInfo data: feedback state for training
    typedef struct {
      DirLineIndex index;
      Counter counter;
    } DirInfo deriving(Bits, Eq);

    module mkDirectionPredictor(DirectionPred);
      // Direction predictor state
      RegFile#(DirLineIndex, Counter) cntArray <- mkRegFileFull();

Two-bit predictor (continued)

      method ActionValue#(Tuple2#(Bool, DirInfo)) predict(Addr addr);
        // Training information is index and counter
        DirInfo info = ?;
        info.index = truncate(addr >> 2);
        info.counter = cntArray.sub(info.index);
        // Prediction is high bit of counter
        Bool taken = (info.counter[1] == 1);
        return tuple2(taken, info);
      endmethod

      // Train by updating counter
      method Action train(DirInfo info, Bool taken);
        cntArray.upd(info.index, updateCounter(taken, info.counter));
      endmethod
    endmodule

Exploiting Spatial Correlation (Yeh and Patt, 1992)

    if (x[i] < 7) then y += 1;
    if (x[i] < 5) then c -= 4;

If the first condition is false, the second condition is also false.

Also works well for short trip count loops.

Implemented with a history register, ‘hist’, that records the direction of the last N branches executed by the processor.
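The correlation between the two `x[i]` tests can be seen in a small software experiment. The Python sketch below makes several assumptions that are not in the slides: "taken" means the condition is true, counters start at "weakly taken", training is immediate, and counters are keyed by (branch, history) rather than by history alone so that the two branches do not alias in this tiny example. The function name `simulate` and the input values are illustrative.

```python
def simulate(xs, hist_bits=1):
    """Return (count, misses) for branch B in the cases where A was not taken."""
    mask = (1 << hist_bits) - 1
    hist = 0
    counters = {}            # (branch name, history) -> 2-bit counter
    b_total = 0
    b_miss = 0
    for x in xs:
        # Branch A tests x < 7, branch B tests x < 5, in program order.
        for name, taken in (("A", x < 7), ("B", x < 5)):
            key = (name, hist)
            counter = counters.get(key, 2)   # assumed start: weakly taken
            prediction = counter >= 2
            # Low history bit is A's outcome when B is being predicted.
            if name == "B" and (hist & 1) == 0:
                b_total += 1
                if prediction != taken:
                    b_miss += 1
            counters[key] = min(counter + 1, 3) if taken else max(counter - 1, 0)
            hist = ((hist << 1) | int(taken)) & mask
    return b_total, b_miss


xs = [3, 8, 6, 9, 1, 7, 4, 10] * 4
b_total, b_miss = simulate(xs)
print(b_total, b_miss)
```

In this model, once the entry for "B after A not taken" has seen a single outcome, every later occurrence is predicted correctly, because that history always leads to the same direction.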
Ghist predictor

    typedef 64 BPRows;
    typedef Bit#(TLog#(BPRows)) DirLineIndex;
    typedef Bit#(2) Counter;

    // DirInfo data
    typedef struct {
      DirLineIndex hist;
      Counter counter;
    } DirInfo deriving(Bits, Eq);

    module mkDirectionPredictor(DirectionPred);
      // Direction predictor state
      Reg#(DirLineIndex) hist <- mkReg(0);
      RegFile#(DirLineIndex, Counter) cntArray <- mkRegFileFull();

Global history predictor

      method ActionValue#(Tuple2#(Bool, DirInfo)) predict(Addr addr);
        // Calculate feedback information
        DirInfo info = ?;
        info.hist = hist;
        info.counter = cntArray.sub(hist);
        Bit#(1) pred = truncate(info.counter >> 1);
        // Shift new prediction into history register
        hist <= truncate((hist << 1) | zeroExtend(pred));
        return tuple2((pred == 1), info);
      endmethod

How good are predictions while waiting for training?

Global history predictor

      method Action train(DirInfo info, Bool taken);
        cntArray.upd(info.hist, updateCounter(taken, info.counter));
      endmethod

      // Restore history to the state it would be in after the desired prediction.
      // Note: the DirectionPred interface shown earlier declares only predict and
      // train; repair is assumed to be added to it.
      method Action repair(DirInfo info, Bool taken);
        hist <= truncate((info.hist << 1) | zeroExtend(pack(taken)));
      endmethod
    endmodule

What is the state of ‘hist’ after redirects from decode and execute?
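The interplay of speculative history update, training, and repair can be traced with a software model. The Python sketch below mirrors the three methods above under stated assumptions: the class name `GlobalHistoryPredictor` is hypothetical, counters start at "weakly taken" so that wrong-path predictions visibly change the history, and `info` is a plain (hist, counter) tuple standing in for `DirInfo`.

```python
class GlobalHistoryPredictor:
    """Behavioural model: counters indexed by a global history register."""

    def __init__(self, hist_bits=6):
        self.mask = (1 << hist_bits) - 1
        self.hist = 0
        # Assumed initial state: every counter is 2 (weakly taken).
        self.counters = [2] * (1 << hist_bits)

    def predict(self):
        info = (self.hist, self.counters[self.hist])  # snapshot for feedback
        prediction = info[1] >= 2                     # high bit of the counter
        # Speculatively shift the prediction into the history.
        self.hist = ((self.hist << 1) | int(prediction)) & self.mask
        return prediction, info

    def train(self, info, taken):
        hist, counter = info
        # Update from the snapshot taken at prediction time.
        self.counters[hist] = min(counter + 1, 3) if taken else max(counter - 1, 0)

    def repair(self, info, taken):
        hist, _ = info
        # Rebuild the history as if the branch had been predicted correctly.
        self.hist = ((hist << 1) | int(taken)) & self.mask


gp = GlobalHistoryPredictor(hist_bits=6)
first_pred, first_info = gp.predict()   # oldest branch, actually not taken
gp.predict()                            # two younger, wrong-path predictions
gp.predict()
hist_before = gp.hist
gp.repair(first_info, taken=False)      # discard the wrong-path history bits
gp.train(first_info, taken=False)
print(first_pred, hist_before, gp.hist, gp.counters[0])
```

Because `train` writes back a value derived from the snapshot rather than from the current table entry, several in-flight predictions that read the same entry each update from the same stale counter; this is one way to think about the "waiting for training" question.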
NA pred with decode feedback

[Figure: six-stage pipeline (Fetch, Decode, Reg Read, Execute, Memory, Writeback) with Next Address Prediction and Direction Prediction blocks.]

Direction prediction recipe

Execute
• Send redirects on mispredicts (unchanged)
• Send direction prediction training

Decode
• Check if next address matches direction pred
• Send redirect if different

Fetch
• Generate prediction
• Learn from feedback
• Accept redirects from later stages

Add direction feedback

    typedef struct {
      Bool correct;
      NaInfo naPredInfo;
      Addr nextAddr;
      // Feedback needs information for training direction predictor
      DirInfo dirPredInfo;
      Bool taken;
    } Feedback deriving (Bits, Eq);

    FIFOF#(Tuple3#(Epoch, Epoch, Feedback)) decFeedback <- mkFIFOF;
    FIFOF#(Tuple2#(Epoch, Feedback)) execFeedback <- mkFIFOF;

Execute (branch analysis)

    // after executing instruction...
    let nextEeEpoch = eeEpoch;
    let cond = execData.execInst.cond;
    let nextPc = cond ? execData.execInst.addr : execData.pc + 4;

    // Recall: execData.nextAddrPred may have been set in decode
    if (nextPc != execData.nextAddrPred)
      nextEeEpoch = nextEeEpoch + 1;
    eeEpoch <= nextEeEpoch;

    // Always send feedback
    execFeedback.enq(tuple2(nextEeEpoch,
        Feedback{correct:     (nextPc == execData.nextAddrPred),
                 taken:       cond,
                 dirPredInfo: execData.dirPredInfo,
                 naPredInfo:  execData.naPredInfo,
                 nextAddr:    nextPc}));

    // enqueue instruction to next stage

Decode with mispredict detect

    rule doDecode;
      let decData = newDecData(fr.first);

      // Determine if epoch of incoming instruction is on good path:
      // new exec epoch, or same dec epoch
      let correctPath = (decData.execEpoch != deEpoch)
                     || (decData.decEpoch == ddEpoch);

      let instResp = decData.fInst.instResp;
      let pcPlus4 = decData.pc + 4;

      if (correctPath)
      begin
        decData.decInst = decode(instResp, pcPlus4);
        let target = knownTargetAddr(decData.decInst);
        let brClass = getBrClass(decData.decInst);
        let predTarget = decData.nextAddrPred;
        let predDir = decData.takenPred;

Decode with mispredict detect

        // Calculate target as best as decode can
        let decodedTarget = case (brClass)
            NonBranch:   pcPlus4;
            UncondKnown: target;
            CondBranch:  (predDir ? target : pcPlus4);
            default:     decData.nextAddrPred;
          endcase;

        // Wrong next addr?
        if (decodedTarget != predTarget)
        begin
          // New dec epoch
          decData.decEpoch = decData.decEpoch + 1;
          // Tell exec addr of next instruction!
          decData.nextAddrPred = decodedTarget;
          decFeedback.enq(
            // Send feedback
            tuple3(decData.execEpoch, decData.decEpoch,
                   Feedback{correct:     False,
                            naPredInfo:  decData.naPredInfo,
                            nextAddr:    decodedTarget,
                            dirPredInfo: decData.dirPredInfo,
                            taken:       decData.takenPred}));
        end
        // Enqueue to next stage on correct path
        dr.enq(decData);
      end // of correct path

Decode with mispredict detect

      else
      begin // incorrect path
        // Preserve current epoch if instruction on incorrect path
        decData.decEpoch = ddEpoch;
        decData.execEpoch = deEpoch;
      end

      // decData.*Epoch have been set properly, so we always save them
      ddEpoch <= decData.decEpoch;
      deEpoch <= decData.execEpoch;
      fr.deq;
    endrule

Handling redirect from execute

    if (execFeedback.notEmpty)
    begin
      match {.execEpoch, .fb} = execFeedback.first;
      execFeedback.deq;
      if (!fb.correct)
      begin
        // Train and repair on redirect
        dirPred.repair(fb.dirPredInfo, fb.taken);
        dirPred.train(fb.dirPredInfo, fb.taken);
        naPred.repair(fb.naPredInfo, fb.nextAddr);
        naPred.train(fb.naPredInfo, fb.nextAddr);
        feEpoch <= execEpoch;
        fetchPc <= fb.nextAddr;
      end
      else
      begin
        // Just train on correct prediction
        dirPred.train(fb.dirPredInfo, fb.taken);
        naPred.train(fb.naPredInfo, fb.nextAddr);
        enqInst;
      end
    end

Handling redirect from decode

    else if (decFeedback.notEmpty)
    begin
      decFeedback.deq;
      match {.execEpoch, .decEpoch, .fb} = decFeedback.first;
      if (execEpoch == feEpoch)
      begin // epoch unchanged
        if (!fb.correct)
        begin
          // Just repair; never train on feedback from decode
          fdEpoch <= decEpoch;
          dirPred.repair(fb.dirPredInfo, fb.taken);
          naPred.repair(fb.naPredInfo, fb.nextAddr);
          fetchPc <= fb.nextAddr;
        end
        else // dec feedback on correct prediction
          enqInst;
      end
      else // dec feedback, but fetch is in a new exec epoch
        enqInst;
    end
    else // no feedback
      enqInst;
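The priority between the two feedback paths can be summarised in a small software model. The Python sketch below is only an illustration of the control flow above: `fetch_step`, the dictionary-based state, and the action strings are assumptions, with "fetch" standing in for `enqInst` and at most one feedback message handled per step.

```python
def fetch_step(state, exec_fb=None, dec_fb=None):
    """Apply one fetch-stage step; return the list of actions performed."""
    actions = []
    if exec_fb is not None:
        # Feedback from execute has priority over feedback from decode.
        exec_epoch, fb = exec_fb
        if not fb["correct"]:
            actions += ["repair", "train"]
            state["feEpoch"] = exec_epoch
            state["fetchPc"] = fb["nextAddr"]
        else:
            actions += ["train", "fetch"]
    elif dec_fb is not None:
        exec_epoch, dec_epoch, fb = dec_fb
        if exec_epoch == state["feEpoch"] and not fb["correct"]:
            # Decode redirect: repair only, never train.
            actions += ["repair"]
            state["fdEpoch"] = dec_epoch
            state["fetchPc"] = fb["nextAddr"]
        else:
            # Either a correct prediction or a stale exec epoch: keep fetching.
            actions += ["fetch"]
    else:
        actions += ["fetch"]
    return actions


state = {"feEpoch": 1, "fdEpoch": 0, "fetchPc": 0x100}

stale = (0, 1, {"correct": False, "nextAddr": 0x200})
stale_actions = fetch_step(state, dec_fb=stale)
pc_after_stale = state["fetchPc"]

fresh = (1, 1, {"correct": False, "nextAddr": 0x300})
fresh_actions = fetch_step(state, dec_fb=fresh)
pc_after_fresh = state["fetchPc"]

redirect = (2, {"correct": False, "nextAddr": 0x400})
exec_actions = fetch_step(state, exec_fb=redirect, dec_fb=fresh)
print(stale_actions, fresh_actions, exec_actions, state)
```

The first call shows a decode redirect from an older execute epoch being dropped, and the last call shows execute feedback being handled even when decode feedback is also waiting.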