COSC 6385 Computer Architecture - Tomasulo’s Algorithm (II)
Edgar Gabriel
Spring 2012

Data fields for reservation stations
• Op: operation to perform on source operands S1 and S2
• Qj, Qk: reservation stations producing the operands
• Vj, Vk: value of each operand
• A: holds information for memory address calculation (immediate field, effective address)
• Busy: indicates occupied functional units/reservation stations
• Qi (register status): number of the reservation station that will produce the data to be stored in this register

Detailed steps
• Let's look at the details for an operation OP rd, rs, rt (e.g. ADD.D F6, F2, F0)
• Assume that
  – The operation has been assigned to reservation station r
  – RS[r] is the data structure holding all the fields for reservation station r, as described in the last lecture
  – RegisterStat[rs] is the data structure holding the status of register rs (e.g. whether a reservation station will write the register)
  – Regs[rs] is the register rs in the register file

Detailed steps (II)
Instruction state: Issue (FP operation)
Wait until: station r empty
Action / bookkeeping:
    if ( RegisterStat[rs].Qi != 0 ) {
        RS[r].Qj = RegisterStat[rs].Qi;
    } else {
        RS[r].Qj = 0; RS[r].Vj = Regs[rs];
    }
    if ( RegisterStat[rt].Qi != 0 ) {
        RS[r].Qk = RegisterStat[rt].Qi;
    } else {
        RS[r].Qk = 0; RS[r].Vk = Regs[rt];
    }
    RS[r].Busy = yes;
    RegisterStat[rd].Qi = r;

Detailed steps (III)
Instruction state: Execute (FP operation)
Wait until: RS[r].Qj==0 && RS[r].Qk==0
Action / bookkeeping:
    /* compute result using Vj and Vk */

Instruction state: Write result (FP operation)
Wait until: execution complete and CDB available
Action / bookkeeping:
    ∀x : if ( RegisterStat[x].Qi == r ) {
        Regs[x] = result; RegisterStat[x].Qi = 0;
    }
    ∀x : if ( RS[x].Qj == r ) {
        RS[x].Vj = result; RS[x].Qj = 0;
    }
    ∀x : if ( RS[x].Qk == r ) {
        RS[x].Vk = result; RS[x].Qk = 0;
    }
    RS[r].Busy = no;
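The issue and write-result bookkeeping for an FP operation can be tried out in a few lines of Python. This is only a minimal sketch under stated assumptions: the `Station` class, the function names `issue` and `write_result`, the use of plain dictionaries for `RS`, `RegisterStat` and `Regs`, and the numbering of stations from 1 (so that 0 can mean "no producer") are all illustrative choices, not part of the lecture.

```python
from dataclasses import dataclass


@dataclass
class Station:
    """One reservation station; field names follow the slides (Qj, Qk, Vj, Vk, Busy)."""
    busy: bool = False
    qj: int = 0      # 0 means the operand value is already in vj
    qk: int = 0      # 0 means the operand value is already in vk
    vj: float = 0.0
    vk: float = 0.0


def issue(r, rd, rs, rt, RS, RegisterStat, Regs):
    """Issue 'OP rd, rs, rt' to station r; returns False if the station is occupied."""
    st = RS[r]
    if st.busy:
        return False
    if RegisterStat[rs] != 0:            # rs is still being produced by a station
        st.qj = RegisterStat[rs]
    else:
        st.qj, st.vj = 0, Regs[rs]
    if RegisterStat[rt] != 0:            # rt is still being produced by a station
        st.qk = RegisterStat[rt]
    else:
        st.qk, st.vk = 0, Regs[rt]
    st.busy = True
    RegisterStat[rd] = r                 # station r will write rd
    return True


def write_result(r, result, RS, RegisterStat, Regs):
    """Broadcast the result of station r to the register file and waiting stations."""
    for x in RegisterStat:
        if RegisterStat[x] == r:
            Regs[x] = result
            RegisterStat[x] = 0
    for st in RS.values():
        if st.qj == r:
            st.vj, st.qj = result, 0
        if st.qk == r:
            st.vk, st.qk = result, 0
    RS[r].busy = False


# Hypothetical register contents, chosen only for the demonstration.
Regs = {"F0": 0.0, "F2": 2.0, "F4": 3.0, "F6": 0.0}
RegisterStat = {name: 0 for name in Regs}
RS = {1: Station(), 2: Station()}

issue(1, "F0", "F2", "F4", RS, RegisterStat, Regs)   # MUL.D F0, F2, F4
issue(2, "F6", "F0", "F2", RS, RegisterStat, Regs)   # ADD.D F6, F0, F2 waits on station 1
write_result(1, RS[1].vj * RS[1].vk, RS, RegisterStat, Regs)
```

After the broadcast, station 2 holds the product in `vj` with `qj` reset to 0, so it satisfies the execute condition `Qj==0 && Qk==0`.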
Detailed steps (IV)
For a LOAD operation, e.g. LD rt, imm(rs)
Instruction state: Issue (Load)
Wait until: buffer r empty
Action / bookkeeping:
    if ( RegisterStat[rs].Qi != 0 ) {
        RS[r].Qj = RegisterStat[rs].Qi;
    } else {
        RS[r].Qj = 0; RS[r].Vj = Regs[rs];
    }
    RS[r].A = imm;
    RS[r].Busy = yes;
    RegisterStat[rt].Qi = r;

Detailed steps (V)
Instruction state: Execute, load step 1
Wait until: RS[r].Qj==0 && r is head of load queue
Action / bookkeeping:
    RS[r].A = RS[r].Vj + RS[r].A

Instruction state: Execute, load step 2
Wait until: load step 1 complete
Action / bookkeeping:
    Read from Mem[RS[r].A]

Instruction state: Write result (Load)
Wait until: execution complete and CDB available
Action / bookkeeping:
    ∀x : if ( RegisterStat[x].Qi == r ) {
        Regs[x] = result; RegisterStat[x].Qi = 0;
    }
    ∀x : if ( RS[x].Qj == r ) {
        RS[x].Vj = result; RS[x].Qj = 0;
    }
    ∀x : if ( RS[x].Qk == r ) {
        RS[x].Vk = result; RS[x].Qk = 0;
    }
    RS[r].Busy = no;

Dynamic branch prediction (I)
• In Tomasulo’s algorithm, no instruction is allowed to initiate execution until all branches preceding the instruction have completed
• Up to now, we used four techniques to avoid branch hazards
  – Stall
  – Predict not taken
  – Predict taken
  – Delayed branch
• All of these methods are static: they do not take the previous behavior of branches into account

Dynamic branch prediction (II)
• Seven techniques for dynamic branch prediction
  – 1bit branch prediction buffer
  – 2bit branch prediction buffer
  – Correlating Branch Prediction Buffer
  – Branch Target Buffer
  – (Integrated Instruction Fetch Units)
  – Return Address Predictors

1bit Branch prediction buffer (I)
• Branch prediction buffer:
  – Small memory area indexed by the lower portion of the address of the branch instruction
  – Records whether the branch was taken the last time or not (1 bit is sufficient)
• Please note:
  – Several branches might share the same buffer entry, since we do not use the full branch instruction address for accessing the branch prediction buffer

1bit Branch Prediction Buffer (II)
• Limitations
  – Even for a regular loop (embedded in another large loop) the 1bit Branch Prediction Buffer will mispredict at least the first and the last iteration
    • 1st iteration: the bit has been set to ‘not-taken’ by the last iteration of the previous execution of the same loop, but the branch will be taken
    • Last iteration: the bit says ‘taken’, but the branch won’t be taken

2bit Branch Prediction Buffer
• A prediction must miss twice before the prediction is changed
  – Can be extended to n bits
[State diagram: four states, 11 and 10 (predict taken) and 01 and 00 (predict not taken), connected by transitions labeled Taken / Not taken]

Correlated branches
• For a (1,1) predictor: each branch has two different branch prediction buffers, written X / Y
  – X: predictor used in case the previous branch in the application has not been taken
  – Y: predictor used in case the previous branch in the application has been taken
• The content of the two branch prediction buffers is determined by the branch to which they belong
• Which of the two branch prediction buffers is used depends on the outcome of the previous branch in the application

Correlated branches - example
    if ( d==0 ) d = 1;
    if ( d==1 ) …

        BNEZ   R1, L1        !branch b1
        DADDIU R1, R0, #1
    L1: DADDIU R3, R1, #-1
        BNEZ   R3, L2        !branch b2
        …
    L2:

Initial value of d  d==0?  b1         Value of d before b2  d==1?  b2
2                   No     Taken      2                     No     Taken
0                   Yes    Not taken  1                     Yes    Not taken
2                   No     Taken      2                     No     Taken
0                   Yes    Not taken  1                     Yes    Not taken

Correlated branches - example (walk-through)
• The branch prediction buffers for the branches b1 and b2 are assumed to hold the prediction ‘Not taken’ for both options (previous branch not taken/taken): NT/NT
First pass, d=2:
→ assuming BPB for b1 uses the ‘previous branch has not been taken’ predictor because the previous branch in the application has not been taken
→ BPB for b1 predicts that b1 will not be taken
→ b1 is taken (see table for d=2)
→ updating the ‘previous branch has not been taken’ part of BPB for b1 to Taken: T/NT
→ because b1 has been taken, the ‘last branch has been taken’ part of BPB b2 will be used
→ BPB b2 predicts that b2 will not be taken
→ b2 is taken (see table for d=2)
→ updating the ‘previous branch has been taken’ part of BPB for b2 to Taken: NT/T
→ because b2 has been taken, the ‘last branch has been taken’ part of BPB b1 will be used
→ BPB b1 predicts that b1 will not be taken
Second pass, d=0:
→ b1 is not taken (see table for d=0)
→ matches prediction!
→ update of BPB b1 does not modify any entry
→ because b1 has not been taken, the ‘last branch has not been taken’ part of BPB b2 will be used
→ BPB b2 predicts that b2 will not be taken

State of the walk-through table at this point:
d=?  BPB b1  b1 act.  BPB b2  b2 act.
2    NT/NT   T        NT/NT   T
0    T/NT    NT       NT/T

Correlated branches
• A (2,1) correlated branch predictor
  – Uses the behavior of the last 2 branches to choose from 2^2 = 4 different predictions
  – Uses a 1 bit predictor for each of the 4 prediction buffers, written A / B / C / D
    • A: predictor used in case the previous 2 branches in the application have both not been taken (00)
    • B: predictor used in case the previous branches have the history: second last branch not taken, last branch taken (01)
    • C: predictor used in case the previous branches have the history: second last branch taken, last branch not taken (10)
    • D: predictor used in case the previous 2 branches in the application have both been taken (11)

Correlated branches
• How do we know which of the four sections of our branch predictor to use?
  – Need to record the behavior of all branches in the application
  – e.g. for the example table above (1 = taken, 0 = not taken): 11001100110011

Correlated branches
• For a (2,n) branch predictor, the last two branches are relevant
  – 2-bit global branch history (implemented using a 2bit shift register)
  – Recorded outcomes as the example proceeds: 11, 110, 1100, 11001, 110011, 1100110, 11001100; at each point the last two bits form the global branch history

Correlated Branches
Idea: taken/not taken of recently executed branches is related to the behavior of the next branch (as well as the history of that branch's behavior)
  – The behavior of recent branches then selects between, say, 4 predictions of the next branch, updating just that prediction
• (2,2) predictor: 2-bit global, 2-bit local
[Figure: the branch address (4 bits) selects a row of 2-bit per-branch local predictors; the 2-bit global branch history (01 = not taken then taken) selects which of them delivers the prediction]
Slide based on a lecture by David A. Patterson, University of California, Berkeley http://www.cs.berkeley.edu/~pattrsn/252S01

Accuracy of Different Schemes
[Bar chart: frequency of mispredictions (axis from 0% to 20%) for three schemes: 4,096 entries with 2 bits per entry; unlimited entries with 2 bits per entry; 1,024 entries (2,2)]
Slide based on a lecture by David A. Patterson, University of California, Berkeley http://www.cs.berkeley.edu/~pattrsn/252S01

Branch Target Buffers
• Branch Target Buffer (BTB): the address of the branch is used as an index to get the prediction AND the branch target address (if taken)
[Figure: the PC of the instruction in FETCH is compared (=?) with the branch PCs stored in the BTB; each entry holds the branch PC, the predicted PC and extra prediction state bits]
  – No: branch not predicted, proceed normally (Next PC = PC+4)
  – Yes: instruction is a branch; use the predicted PC as next PC
Slide based on a lecture by David A. Patterson, University of California, Berkeley http://www.cs.berkeley.edu/~pattrsn/252S01

Need Address at Same Time as Prediction (II)
• Send PC to memory and branch target buffer (BTB)
• Entry found in BTB?
  – No: is the instruction a taken branch?
    • No: normal execution
    • Yes: enter branch address and next PC into BTB
  – Yes: send out predicted PC. Taken branch?
    • No: mispredicted branch, kill fetched instruction
    • Yes: branch correctly predicted

Special Case Return Addresses
• The target address of a register indirect branch is hard to predict
• In SPEC89, 85% of such branches are procedure returns
• Save the return address in a small buffer that acts like a stack: 8 to 16 entries give a small miss rate
Slide based on a lecture by David A. Patterson, University of California, Berkeley http://www.cs.berkeley.edu/~pattrsn/252S01

Hardware based speculation
• Branch prediction reduces direct stalls of branches
• Instructions can be issued using dynamic branch prediction, but could not be executed until the branch outcome was known
• Speculative execution extends the concept of dynamic scheduling
  – Speculates on the outcome of the branch
  – Executes the following instructions
• Requires the ability to undo instructions in case the prediction was wrong.
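Looking back at the branch target buffer flow (Need Address at Same Time as Prediction (II)), the four outcomes of a lookup can be sketched in Python as follows. This is a simplified illustration, not the hardware design: the class and method names are made up for this example, the buffer is modelled as an unbounded dictionary keyed by the full branch PC, only taken branches are stored, and an entry is removed when its branch turns out not to be taken (an assumption that the slide itself does not state).

```python
class BranchTargetBuffer:
    """Toy BTB: maps the PC of a taken branch to its predicted target PC."""

    def __init__(self):
        self.entries = {}          # assumption: unlimited capacity, full PC as key

    def predict_next_pc(self, pc):
        """Fetch stage: use the predicted PC on a hit, otherwise PC + 4."""
        return self.entries.get(pc, pc + 4)

    def resolve(self, pc, taken, target):
        """Called once the instruction at pc is decoded/executed; updates the BTB."""
        hit = pc in self.entries
        if hit and taken:
            correct = self.entries[pc] == target
            self.entries[pc] = target          # keep the most recent target
            return "correctly predicted" if correct else "mispredicted"
        if hit and not taken:
            del self.entries[pc]               # assumption: drop the stale entry
            return "mispredicted"
        if taken:
            self.entries[pc] = target          # not found, but a taken branch
            return "entered into BTB"
        return "normal execution"


btb = BranchTargetBuffer()
first = btb.resolve(100, taken=True, target=40)    # first encounter: entry is created
second = btb.resolve(100, taken=True, target=40)   # second encounter: BTB hit
```

On the second encounter the fetch stage would already have received `btb.predict_next_pc(100) == 40`, so no cycle is lost for the taken branch.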
Hardware based speculation (II)
• Extending Tomasulo’s algorithm to support speculation:
  – Separate the step of bypassing results among instructions from the completion of the instruction
  – Add another step
    • Issue
    • Execute
    • Write result
    • Commit
  – Instructions execute out-of-order but commit in-order
  – Additional set of hardware buffers to hold the results of instructions which have not yet been committed: Reorder buffer (ROB)

Reorder Buffers
• Hold the results of instructions between the time an instruction finishes execution and the time the instruction is committed
• Act as additional reservation stations
  – The ROB can be the source of operands of other instructions
• Each ROB entry contains four fields
  – Instruction type: branch/store/ALU operation
  – Destination: register number or memory address where the result should be written
  – Value: result value of the instruction
  – Ready: has the instruction completed execution?

Four steps of execution (I)
• Issue:
  – Get instruction from instruction queue
  – Issue the instruction if a reservation station is empty and a ROB entry is available
• Execute:
  – If operands are available, execute
  – New: for a store instruction, execution consists only of the calculation of the effective address at this point
• Write result:
  – Write result to CDB
  – Any reservation station/ROB entry waiting for the result should update
  – Register file not modified at this point

Four steps of execution (II)
• Commit:
  – Normal case (prediction was correct):
    • Instruction reaches head of ROB
    • Update register file
    • Remove entry from ROB
  – Store operation:
    • Instruction reaches head of ROB
    • Update of memory location
  – Incorrect prediction:
    • When a branch instruction reaches the head of the ROB and the hardware indicates that the prediction was wrong, the ROB is flushed and execution is restarted.
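The commit rules above can be illustrated with a small Python sketch. It is a toy model under stated assumptions: `ROBEntry`, `commit`, the `mispredicted` flag and the string return values are names invented for this example, the register file and memory are plain dictionaries, and at most one instruction commits per call.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ROBEntry:
    kind: str                    # "alu", "load", "store" or "branch"
    dest: object = None          # register name or memory address
    value: object = None         # result, valid once ready is True
    ready: bool = False          # has the instruction completed execution?
    mispredicted: bool = False   # only meaningful for branches


def commit(rob, regs, mem):
    """Try to commit the instruction at the head of the ROB (in-order commit)."""
    if not rob or not rob[0].ready:
        return "stall"                       # head not finished: nothing may commit
    head = rob.popleft()
    if head.kind == "branch":
        if head.mispredicted:
            rob.clear()                      # discard all speculative entries
            return "flush"
        return "commit"
    if head.kind == "store":
        mem[head.dest] = head.value          # memory is updated only at commit
    else:
        regs[head.dest] = head.value         # register file is updated only at commit
    return "commit"


regs, mem = {}, {}
rob = deque([
    ROBEntry("alu", "F0"),                              # still executing
    ROBEntry("alu", "F8", value=1.5, ready=True),       # finished out of order
    ROBEntry("branch", ready=True, mispredicted=True),
    ROBEntry("alu", "F6", value=9.0, ready=True),       # speculative, after the branch
])

outcomes = [commit(rob, regs, mem)]          # F8 is ready but must wait behind F0
rob[0].value, rob[0].ready = 3.0, True       # F0 writes its result
outcomes += [commit(rob, regs, mem) for _ in range(3)]
```

The entry for F6 never reaches the register file: it is removed when the mispredicted branch arrives at the head of the buffer.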
The same example as for scoreboarding
    L.D    F6, 34(R2)
    L.D    F2, 45(R3)
    MUL.D  F0, F2, F4
    SUB.D  F8, F6, F2
    DIV.D  F10, F0, F6
    ADD.D  F6, F8, F2
Following slides are based on a lecture by Jelena Mirkovic, University of Delaware http://www.cis.udel.edu/~sunshine/courses/F04/CIS662/class12.pdf
Assumptions:
• ADD and SUB take 2 clock cycles
• MULT takes 10 clock cycles
• DIV takes 40 clock cycles
• 2 Load/Store, 3 ADD and 2 Mult reservation stations

In the tables below only busy reservation stations (out of Load1, Load2, Add1, Add2, Add3, Mult1, Mult2), busy registers (out of F0, F2, …, F30) and occupied reorder buffer entries (out of 1 to 6) are listed; ‘-’ marks an empty field.

Time=1: Issue first load
Reservation stations
Name   Busy  Op    Vj                Vk                Qj  Qk  Dest  A
Load1  Yes   Load  Regs[R2]          -                 -   -   #1    34
Register result status
Register  Reorder#  Busy
F6        #1        yes
Reorder buffer
Entry  Busy  Instruction         State              Dest  Value
1      Yes   L.D F6, 34(R2)      Issue              F6    -

Time=2: First load executes, second load issues
Reservation stations
Name   Busy  Op    Vj                Vk                Qj  Qk  Dest  A
Load1  Yes   Load  Regs[R2]          -                 -   -   #1    +34
Load2  Yes   Load  Regs[R3]          -                 -   -   #2    45
Register result status
Register  Reorder#  Busy
F2        #2        yes
F6        #1        yes
Reorder buffer
Entry  Busy  Instruction         State              Dest  Value
1      Yes   L.D F6, 34(R2)      Execute            F6    -
2      Yes   L.D F2, 45(R3)      Issue              F2    -

Time=3: First load executes, second load executes, Mul is issued
Reservation stations
Name   Busy  Op    Vj                Vk                Qj  Qk  Dest  A
Load1  Yes   Load  -                 -                 -   -   #1    Regs[R2]+34
Load2  Yes   Load  Regs[R3]          -                 -   -   #2    +45
Mult1  Yes   Mult  -                 Regs[F4]          #2  -   #3    -
Register result status
Register  Reorder#  Busy
F0        #3        yes
F2        #2        yes
F6        #1        yes
Reorder buffer
Entry  Busy  Instruction         State              Dest  Value
1      Yes   L.D F6, 34(R2)      Execute            F6    -
2      Yes   L.D F2, 45(R3)      Execute            F2    -
3      Yes   MUL.D F0, F2, F4    Issue              F0    -

Time=4: First load writes result, second load executes, Mul stalled, Sub issued
Reservation stations
Name   Busy  Op    Vj                Vk                Qj  Qk  Dest  A
Load2  Yes   Load  -                 -                 -   -   #2    Regs[R3]+45
Add1   Yes   Sub   Mem[34+Regs[R2]]  -                 -   #2  #4    -
Mult1  Yes   Mult  -                 Regs[F4]          #2  -   #3    -
Register result status
Register  Reorder#  Busy
F0        #3        yes
F2        #2        yes
F6        #1        yes
F8        #4        yes
Reorder buffer
Entry  Busy  Instruction         State              Dest  Value
1      Yes   L.D F6, 34(R2)      Write result       F6    Mem[34+Regs[R2]]
2      Yes   L.D F2, 45(R3)      Execute            F2    -
3      Yes   MUL.D F0, F2, F4    Stalled in issue   F0    -
4      Yes   SUB.D F8, F6, F2    Issue              F8    -

Time=5: First load commits, second load writes result, Mul and Sub stalled, Div issued
Reservation stations
Name   Busy  Op    Vj                Vk                Qj  Qk  Dest  A
Add1   Yes   Sub   Mem[34+Regs[R2]]  Mem[45+Regs[R3]]  -   -   #4    -
Mult1  Yes   Mult  Mem[45+Regs[R3]]  Regs[F4]          -   -   #3    -
Mult2  Yes   Div   -                 Mem[34+Regs[R2]]  #3  -   #5    -
Register result status
Register  Reorder#  Busy
F0        #3        yes
F2        #2        yes
F8        #4        yes
F10       #5        yes
Reorder buffer
Entry  Busy  Instruction         State              Dest  Value
1      No    L.D F6, 34(R2)      Commit             F6    Mem[34+Regs[R2]]
2      Yes   L.D F2, 45(R3)      Write result       F2    Mem[45+Regs[R3]]
3      Yes   MUL.D F0, F2, F4    Stalled in issue   F0    -
4      Yes   SUB.D F8, F6, F2    Stalled in issue   F8    -
5      Yes   DIV.D F10, F0, F6   Issue              F10   -

Time=6: Second load commits, Mul (1/10), Sub (1/2), Div stalled, Add issued
Reservation stations
Name   Busy  Op    Vj                Vk                Qj  Qk  Dest  A
Add1   Yes   Sub   Mem[34+Regs[R2]]  Mem[45+Regs[R3]]  -   -   #4    -
Add2   Yes   Add   -                 Mem[45+Regs[R3]]  #4  -   #6    -
Mult1  Yes   Mult  Mem[45+Regs[R3]]  Regs[F4]          -   -   #3    -
Mult2  Yes   Div   -                 Mem[34+Regs[R2]]  #3  -   #5    -
Register result status
Register  Reorder#  Busy
F0        #3        yes
F6        #6        yes
F8        #4        yes
F10       #5        yes
Reorder buffer
Entry  Busy  Instruction         State              Dest  Value
1      No    L.D F6, 34(R2)      Commit             F6    Mem[34+Regs[R2]]
2      No    L.D F2, 45(R3)      Commit             F2    Mem[45+Regs[R3]]
3      Yes   MUL.D F0, F2, F4    Execute            F0    -
4      Yes   SUB.D F8, F6, F2    Execute            F8    -
5      Yes   DIV.D F10, F0, F6   Stalled in issue   F10   -
6      Yes   ADD.D F6, F8, F2    Issue              F6    -

Time=7: Mul (2/10), Sub (2/2), Div stalled, Add stalled
Reservation stations and register result status: unchanged from Time=6
Reorder buffer
Entry  Busy  Instruction         State              Dest  Value
1      No    L.D F6, 34(R2)      Commit             F6    Mem[34+Regs[R2]]
2      No    L.D F2, 45(R3)      Commit             F2    Mem[45+Regs[R3]]
3      Yes   MUL.D F0, F2, F4    Execute            F0    -
4      Yes   SUB.D F8, F6, F2    Execute            F8    -
5      Yes   DIV.D F10, F0, F6   Stalled in issue   F10   -
6      Yes   ADD.D F6, F8, F2    Stalled in issue   F6    -

Time=8: Mul (3/10), Sub writes result, Div stalled, Add stalled
Reservation stations
Name   Busy  Op    Vj                Vk                Qj  Qk  Dest  A
Add2   Yes   Add   X                 Mem[45+Regs[R3]]  -   -   #6    -
Mult1  Yes   Mult  Mem[45+Regs[R3]]  Regs[F4]          -   -   #3    -
Mult2  Yes   Div   -                 Mem[34+Regs[R2]]  #3  -   #5    -
Register result status
Register  Reorder#  Busy
F0        #3        yes
F6        #6        yes
F8        #4        yes
F10       #5        yes
Reorder buffer
Entry  Busy  Instruction         State              Dest  Value
1      No    L.D F6, 34(R2)      Commit             F6    Mem[34+Regs[R2]]
2      No    L.D F2, 45(R3)      Commit             F2    Mem[45+Regs[R3]]
3      Yes   MUL.D F0, F2, F4    Execute            F0    -
4      Yes   SUB.D F8, F6, F2    Write result       F8    X
5      Yes   DIV.D F10, F0, F6   Stalled in issue   F10   -
6      Yes   ADD.D F6, F8, F2    Stalled in issue   F6    -

Time=9: Mul (4/10), Div stalled, Add executes (1/2)
Reservation stations and register result status: unchanged from Time=8
Reorder buffer
Entry  Busy  Instruction         State              Dest  Value
1      No    L.D F6, 34(R2)      Commit             F6    Mem[34+Regs[R2]]
2      No    L.D F2, 45(R3)      Commit             F2    Mem[45+Regs[R3]]
3      Yes   MUL.D F0, F2, F4    Execute            F0    -
4      Yes   SUB.D F8, F6, F2    Waiting to commit  F8    X
5      Yes   DIV.D F10, F0, F6   Stalled in issue   F10   -
6      Yes   ADD.D F6, F8, F2    Execute            F6    -

Time=11: Mul (6/10), Div stalled, Add writes result
Reservation stations
Name   Busy  Op    Vj                Vk                Qj  Qk  Dest  A
Mult1  Yes   Mult  Mem[45+Regs[R3]]  Regs[F4]          -   -   #3    -
Mult2  Yes   Div   -                 Mem[34+Regs[R2]]  #3  -   #5    -
Register result status
Register  Reorder#  Busy
F0        #3        yes
F6        #6        yes
F8        #4        yes
F10       #5        yes
Reorder buffer
Entry  Busy  Instruction         State              Dest  Value
1      No    L.D F6, 34(R2)      Commit             F6    Mem[34+Regs[R2]]
2      No    L.D F2, 45(R3)      Commit             F2    Mem[45+Regs[R3]]
3      Yes   MUL.D F0, F2, F4    Execute            F0    -
4      Yes   SUB.D F8, F6, F2    Waiting to commit  F8    X
5      Yes   DIV.D F10, F0, F6   Stalled in issue   F10   -
6      Yes   ADD.D F6, F8, F2    Write result       F6    Y

Time=12: Mul (7/10), Div stalled
Reservation stations and register result status: unchanged from Time=11
Reorder buffer
Entry  Busy  Instruction         State              Dest  Value
1      No    L.D F6, 34(R2)      Commit             F6    Mem[34+Regs[R2]]
2      No    L.D F2, 45(R3)      Commit             F2    Mem[45+Regs[R3]]
3      Yes   MUL.D F0, F2, F4    Execute            F0    -
4      Yes   SUB.D F8, F6, F2    Waiting to commit  F8    X
5      Yes   DIV.D F10, F0, F6   Stalled in issue   F10   -
6      Yes   ADD.D F6, F8, F2    Waiting to commit  F6    Y

Time=16: Mul writes result, Div stalled
Reservation stations
Name   Busy  Op    Vj                Vk                Qj  Qk  Dest  A
Mult2  Yes   Div   Z                 Mem[34+Regs[R2]]  -   -   #5    -
Register result status
Register  Reorder#  Busy
F0        #3        yes
F6        #6        yes
F8        #4        yes
F10       #5        yes
Reorder buffer
Entry  Busy  Instruction         State              Dest  Value
1      No    L.D F6, 34(R2)      Commit             F6    Mem[34+Regs[R2]]
2      No    L.D F2, 45(R3)      Commit             F2    Mem[45+Regs[R3]]
3      Yes   MUL.D F0, F2, F4    Write result       F0    Z
4      Yes   SUB.D F8, F6, F2    Waiting to commit  F8    X
5      Yes   DIV.D F10, F0, F6   Stalled in issue   F10   -
6      Yes   ADD.D F6, F8, F2    Waiting to commit  F6    Y

Time=17: Mul commits, Div executes (1/40)
Reservation stations: unchanged from Time=16
Register result status
Register  Reorder#  Busy
F6        #6        yes
F8        #4        yes
F10       #5        yes
Reorder buffer
Entry  Busy  Instruction         State              Dest  Value
1      No    L.D F6, 34(R2)      Commit             F6    Mem[34+Regs[R2]]
2      No    L.D F2, 45(R3)      Commit             F2    Mem[45+Regs[R3]]
3      No    MUL.D F0, F2, F4    Commit             F0    Z
4      Yes   SUB.D F8, F6, F2    Waiting to commit  F8    X
5      Yes   DIV.D F10, F0, F6   Execute            F10   -
6      Yes   ADD.D F6, F8, F2    Waiting to commit  F6    Y

Time=18: Sub commits, Div executes (2/40)
Reservation stations: unchanged from Time=16
Register result status
Register  Reorder#  Busy
F6        #6        yes
F10       #5        yes
Reorder buffer
Entry  Busy  Instruction         State              Dest  Value
1      No    L.D F6, 34(R2)      Commit             F6    Mem[34+Regs[R2]]
2      No    L.D F2, 45(R3)      Commit             F2    Mem[45+Regs[R3]]
3      No    MUL.D F0, F2, F4    Commit             F0    Z
4      No    SUB.D F8, F6, F2    Commit             F8    X
5      Yes   DIV.D F10, F0, F6   Execute            F10   -
6      Yes   ADD.D F6, F8, F2    Waiting to commit  F6    Y

… and so on…
• Time 57: DIV writes result
• Time 58: DIV commits
• Time 59: ADD commits
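The cycle numbers of this walk-through, including the final ones (57, 58, 59), can be reproduced with a short Python calculation. This is a simplified timing model, not a simulator: it assumes one instruction issues per cycle in program order, execution starts in the cycle after both the issue and the write-result of the last producer, the result is written in the cycle after execution ends, and at most one instruction commits per cycle in program order. It ignores structural hazards and CDB contention. The function name `schedule` and the load latency of 2 cycles (taken from the two load steps shown at Time=2 to Time=4) are assumptions made for this sketch.

```python
# Execution latencies in cycles; the value for L.D is an assumption (two load steps).
LATENCY = {"L.D": 2, "MUL.D": 10, "SUB.D": 2, "DIV.D": 40, "ADD.D": 2}

# (operation, destination register, source registers) in program order.
PROGRAM = [
    ("L.D",   "F6",  ["R2"]),
    ("L.D",   "F2",  ["R3"]),
    ("MUL.D", "F0",  ["F2", "F4"]),
    ("SUB.D", "F8",  ["F6", "F2"]),
    ("DIV.D", "F10", ["F0", "F6"]),
    ("ADD.D", "F6",  ["F8", "F2"]),
]


def schedule(program, latency):
    """Return issue/start/write/commit cycles for each instruction."""
    write_time = {}      # register -> cycle in which its latest producer writes
    prev_commit = 0
    rows = []
    for issue, (op, dest, srcs) in enumerate(program, start=1):
        # Sources are looked up before dest is updated, which models renaming:
        # an instruction sees the producer that preceded it in program order.
        ready = max(write_time.get(s, 0) for s in srcs)
        start = max(issue, ready) + 1          # first execution cycle
        write = start + latency[op]            # cycle after the last execution cycle
        commit = max(write, prev_commit) + 1   # in order, one commit per cycle
        write_time[dest] = write
        prev_commit = commit
        rows.append({"op": op, "issue": issue, "start": start,
                     "write": write, "commit": commit})
    return rows


for row in schedule(PROGRAM, LATENCY):
    print(row)
```

Under these assumptions SUB.D writes its result at time 8 but cannot commit before time 18, because MUL.D, which precedes it in program order, commits only at time 17.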