Csci 136 Computer Architecture II: Branch Hazards, Exceptions
Xiuzhen Cheng
cheng@gwu.edu

Announcement
  Homework assignment #10, due before class, April 12
    Readings: Sections 6.4-6.5
    Problems: 6.17-6.19, 6.21-6.22, 6.33-6.36, 6.39-6.40 (six of them will be graded; your TA will give hints in the lab sections)
  Project #3 is due on April 10, 2005
  Quiz #4: April 12, 2005
  Final: Thursday, May 12, 12:40PM-2:40PM
  Note: you must pass the final to pass this course!

Review on Data Hazards, Forwarding, Stall
  When does a data hazard happen?
    Data dependencies
  Using forwarding to overcome data hazards
    Data is available after the ALU stage
    Forwarding conditions
  Stall the pipeline for load-use instructions
    Data is available after the MEM stage (lw instruction)
    Hazard detection conditions
    Why in the ID stage?

Review on Data Hazards

Review on Data Hazards, Forwarding, Stall
  [Figure: pipelined datapath; labels: PC+4, Sign-extend]

LW and SW
  [Figure: datapath; label: Sign-Ext]
  Sequence 1:
    lw $5, 0($15)
    sw $5, 100($15)
  Sequence 2:
    lw $5, 0($15)
    beq $5, $0, Exit
    sw $5, 100($15)
  Sequence 3:
    lw $5, 0($15)
    add $8, $8, $8
    sw $5, 100($15)

SW is in MEM Stage
  [Figure: datapath with sw in the MEM stage and lw behind it in WB; labels: Sign-Ext, EX/MEM, Data memory]
    lw $5, 0($15)
    sw $5, 100($15)
  Condition:
    MEM/WB.RegWrite and EX/MEM.MemWrite
    and MEM/WB.RegisterRd = EX/MEM.RegisterRd
    and MEM/WB.RegisterRd != 0

SW is in EX Stage
  [Figure: datapath with sw in the EX stage and lw in WB; label: Sign-Ext]
  Condition:
    ID/EX.MemWrite and MEM/WB.RegWrite
    and MEM/WB.RegisterRd = ID/EX.RegisterRt
    and MEM/WB.RegisterRd != 0

More Cases
  lw $15, 0($8)
  sw $5, 100($15)    # load-use, stall pipeline
  R-Type followed by sw?
    The result from the R-type instruction will be saved into memory
    The R-type instruction will overwrite the base register for sw

An Example
  40: lw  $2, 20($1)
  44: and $4, $2, $5
  48: or  $8, $2, $4
  Clock Cycle 1:
  Clock Cycle 2:
  Clock Cycle 3:
  Clock Cycle 4:

Clock 1
  [Figure: pipelined datapath at clock 1; instruction label: lw $2, 20($1)]
Clock 2
  [Figure: pipelined datapath at clock 2; instruction labels: and $4, $2, $5; lw $2, 20($1)]
Clock 3
  [Figure: pipelined datapath at clock 3; instruction labels: or $8, $2, $4; and $4, $2, $5; lw $2, 20($1)]
Clock 4
  [Figure: pipelined datapath at clock 4; instruction labels: or $8, $2, $4; and $4, $2, $5; bubble; lw $2, 20($1)]
Clock 5
  [Figure: pipelined datapath at clock 5; instruction labels: and $4, $2, $5; or $8, $2, $4; bubble; lw $2, 20($1)]

Branch Hazards
  Control hazard: an attempt to make a decision before the condition is evaluated

Branch Hazards
  [Figure: pipeline diagram; the branch decision is made in the MEM stage and the three instructions behind the branch are flushed]

Observations
  The branch decision does not occur until the MEM stage; 3 CCs are wasted (current, non-optimized design).
  Is it possible to reduce the branch delay? YES
    In the EX stage? Two CCs of branch delay
    In the ID stage? One CC of branch delay
    How? For beq $x, $y, label: compute $x xor $y, then OR all the bits. This is much faster than an ALU operation. We also have a separate adder to compute the branch address.
  3 strategies
    Delayed branch; static branch prediction; dynamic branch prediction

Delayed Branch
  The instruction following the branch is always executed.
  Only one instruction (the one in the branch delay slot) is executed this way.
  Done by the compiler or assembler
  About a 50% success rate
  Losing popularity. Why?
    More pipeline stages
    Superscalar

Scheduling the Branch Delay Slot
  An independent instruction is the best choice.
  B is good when the probability that the branch is taken is high.
  It must be OK to execute the sub instruction when the branch goes in the unexpected direction.

Static Branch Prediction
  Assume the branch will not be taken.
  If the prediction is wrong, clear the effects of the sequentially fetched instructions.
  How to discard instructions in the pipeline?
    Branch decision is made at the MEM stage: the instructions in the IF, ID, and EX stages need to be discarded.
    Branch decision is made at the ID stage: only the IF/ID pipeline register needs to be flushed!

Static Branch Prediction
  [Figure: pipeline diagram; the branch decision is made in the MEM stage and the three instructions behind the branch are flushed]

Static Branch Prediction
  [Figure: datapath with the IF.Flush control signal]

Pipelined Branch: An Example
  [Figure: datapath with IF.Flush; instructions at addresses 36, 40, and 44 in the pipeline; branch target 72]

Pipelined Branch: An Example
  [Figure: datapath with IF.Flush; instruction at address 72 in the pipeline]

Dynamic Branch Prediction
  Static branch prediction is crude!
  Take history into consideration
    If a branch was taken last time, fetch the next instruction from the same place as last time.
  Branch prediction buffer: indexed by the lower bits of the branch instruction's address
    This memory contains a bit (or bits) that tells whether the branch was recently taken or not.
    Is the prediction correct? Any bad effect?

1-bit prediction scheme
2-bit prediction scheme
  [Figure: state diagram of the 2-bit prediction scheme; two "Prediction Taken" states and two "Prediction not Taken" states, with transitions labeled taken / not taken]

Observation
  Since we move the branch decision to the ID stage, we need to copy the forwarding-control hardware to the ID stage too!
  beq following lw: the hazard detection unit should still handle this case.

In-Class Exercise
  Consider a loop branch that is taken nine times in a row, then is not taken once. What is the prediction accuracy for this branch, assuming the prediction bit for this branch remains in the prediction buffer?
  With 1-bit prediction?
  With 2-bit prediction?
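The in-class exercise above can be checked with a small simulation. This is a minimal sketch, not the course's reference solution: the function name `simulate`, the saturating-counter encoding, and the warm-up handling are assumptions made for illustration. The 2-bit version uses a common saturating-counter formulation, which may differ from the slide's state diagram in how the weak states transition, although both require two wrong predictions in a row before the prediction flips.

```python
def simulate(outcomes, bits, warmup):
    """Return the prediction accuracy of an n-bit saturating-counter predictor.

    outcomes: list of bool, True means the branch was taken
    bits:     1 (remember last outcome) or 2 (prediction flips after two misses)
    warmup:   number of leading outcomes excluded from the accuracy count
    """
    max_state = (1 << bits) - 1      # highest counter value
    threshold = 1 << (bits - 1)      # predict taken when state >= threshold
    state = 0                        # assumed initial state: predict not taken
    correct = 0
    counted = 0
    for i, taken in enumerate(outcomes):
        predicted_taken = state >= threshold
        if i >= warmup:
            counted += 1
            correct += (predicted_taken == taken)
        # Update the counter with the actual outcome.
        if taken:
            state = min(state + 1, max_state)
        else:
            state = max(state - 1, 0)
    return correct / counted


# Loop branch: taken nine times, then not taken once, repeated many times.
pattern = ([True] * 9 + [False]) * 100

# Skip the first pass through the loop so only steady-state behavior is measured.
acc_1bit = simulate(pattern, bits=1, warmup=10)
acc_2bit = simulate(pattern, bits=2, warmup=10)

print(f"1-bit accuracy: {acc_1bit:.0%}")
print(f"2-bit accuracy: {acc_2bit:.0%}")
```

In steady state the 1-bit predictor misses twice per ten branches (the loop exit, and the first iteration of the next pass), giving 80%. The 2-bit predictor misses only the loop exit, giving 90%.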
  [Figure: state diagram of the 2-bit prediction scheme; two "Prediction Taken" states and two "Prediction not Taken" states, with transitions labeled taken / not taken]

Performance Comparison
  Compare the performance of the single-cycle, multi-cycle, and pipelined datapaths
  200ps for memory access, 100ps for ALU operation, 50ps for register file access
  25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU ops
  For the pipelined datapath, 50% of loads are immediately followed by an instruction that uses the result
  Branch delay on misprediction is 1 clock cycle, and 25% of branches are mispredicted
  Jump delay is 1 clock cycle

Exceptions
  Exceptions: events other than branches or jumps that change the normal flow of instruction execution
    Arithmetic overflow, undefined instruction, etc.
    Internal to the processor
  Interrupts: come from outside the processor (I/O interrupts)
  Use arithmetic overflow as an example
    When an overflow is detected, we need to transfer control to the exception handling routine at location 0x80000180 immediately, because we do not want the invalid value to contaminate other registers or memory locations
  Similar idea to a branch hazard
    Detected in the EX stage
    De-assert all control signals in the EX and ID stages, flush IF/ID

Exceptions
  [Figure: pipelined datapath with exception handling; label: 80000180]

Example
  sub $11, $2, $4
  and $12, $2, $5
  or  $13, $2, $6
  add $1, $2, $1     # overflow occurs
  slt $15, $6, $7
  lw  $16, 50($7)
  Exception handling routine:
  0x80000180  sw $25, 1000($0)
  0x80000184  sw $26, 1004($0)

Example
  [Figure: pipelined datapath at clock 6; label: 80000180]

Example
  [Figure: pipelined datapath at clock 7; label: 80000180]

Questions?
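The Performance Comparison slide above can be worked through numerically. This is a sketch under stated assumptions rather than the official solution: the slide gives the latencies, instruction mix, and stall rates, but the multi-cycle cycle counts per instruction class (5 for loads, 4 for stores and ALU ops, 3 for branches and jumps) are assumed here from the usual multi-cycle MIPS design, and all variable names are made up for illustration.

```python
# Latencies from the slide, in picoseconds.
MEM_PS, ALU_PS, REG_PS = 200, 100, 50

# Instruction mix from the slide.
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}

# Single-cycle: the clock must fit the slowest instruction, lw:
# instruction fetch + register read + ALU + data memory + register write.
single_cycle_ps = MEM_PS + REG_PS + ALU_PS + MEM_PS + REG_PS

# Multi-cycle: the clock fits the slowest step (a memory access).
# ASSUMPTION: cycles per class are not on the slide; these are the usual
# values for the multi-cycle MIPS datapath.
MULTI_CYCLES = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}
multi_clock_ps = max(MEM_PS, ALU_PS, REG_PS)
multi_cpi = sum(MIX[k] * MULTI_CYCLES[k] for k in MIX)
multi_ps = multi_cpi * multi_clock_ps

# Pipelined: ideal CPI of 1 plus the stall cycles given on the slide.
PIPE_CYCLES = {
    "load": 1 + 0.50 * 1,    # 50% of loads cause a 1-cycle load-use stall
    "store": 1,
    "alu": 1,
    "branch": 1 + 0.25 * 1,  # 25% mispredicted, 1-cycle penalty each
    "jump": 1 + 1,           # every jump pays a 1-cycle delay
}
pipe_clock_ps = max(MEM_PS, ALU_PS, REG_PS)
pipe_cpi = sum(MIX[k] * PIPE_CYCLES[k] for k in MIX)
pipe_ps = pipe_cpi * pipe_clock_ps

print(f"single-cycle: {single_cycle_ps} ps per instruction")
print(f"multi-cycle:  CPI {multi_cpi:.2f}, {multi_ps:.1f} ps per instruction")
print(f"pipelined:    CPI {pipe_cpi:.4f}, {pipe_ps:.1f} ps per instruction")
```

Under these assumptions the average time per instruction is 600 ps for single-cycle, 824 ps for multi-cycle (CPI 4.12), and about 234.5 ps for the pipelined datapath (CPI 1.1725).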