EECS 322 Computer Architecture
Pipeline Control, Data Hazards and Branch Hazards
EECS 322 April 10, 2000

Models

Single-cycle model (non-overlapping)
• Each instruction executes in a single cycle
• Every instruction and clock cycle must be stretched to the slowest instruction (p. 438)

Multi-cycle model (non-overlapping)
• Each instruction executes in a variable number of cycles
• The clock cycle must be stretched to the slowest step
• Ability to share functional units within the execution of a single instruction

Pipeline model (overlapping)
• Each instruction executes in several cycles
• The clock cycle must be stretched to the slowest step
• Gains efficiency by overlapping the execution of multiple instructions, increasing hardware utilization (p. 377)

Overhead: Chip Area and Speed

Single-cycle model
• 8 ns clock (125 MHz), non-overlapping
• 1 ALU + 2 adders
• 0 muxes
• 0 datapath register bits (flip-flops)

Multi-cycle model
• 2 ns clock (500 MHz), non-overlapping
• 1 ALU + controller
• 5 muxes
• 160 datapath register bits (flip-flops)

Pipeline model
• 2 ns clock (500 MHz), overlapping
• 2 ALU + controller
• 4 muxes
• 373 datapath + 16 controlpath register bits (flip-flops)

[Figure 6.25: pipelined datapath with the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB. Annotations: datapath registers 160 FFs, + 213 FFs, + 16 FFs.]

Pipeline Control: Controlpath Register bits
[Figure 6.29: control lines carried through the pipeline registers. ID/EX holds 9 control bits (WB, M, EX), EX/MEM holds 5 control bits (WB, M), MEM/WB holds 2 control bits (WB).]

Pipeline Control: Controlpath Register bits

Figure 5.20, single cycle:

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1

Figure 6.28, pipeline:

             ID/EX control lines               EX/MEM control lines       MEM/WB control lines
Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format     1       1       0       0         0       0        0         1         0
lw           0       0       0       1         0       1        0         1         1
sw           X       0       0       1         0       0        1         0         X
beq          X       0       1       0         1       0        0         0         X

Pipeline Hazards

Pipeline hazards
• Solution #1 always works: stall, delay and procrastinate!

Structural hazards (e.g. fetching from the same memory bank)
• Solution #2: partition the architecture

Control hazards (e.g. branching)
• Solution #1: stall, but this decreases throughput
• Solution #2: guess and back-track
• Solution #3: delayed decision: delay the branch and fill the slot

Data hazards (e.g. register dependencies)
• Worst-case situation
• Solution #2: re-order instructions
• Solution #3: forwarding or bypassing: delayed load

Pipeline Datapath and Controlpath

[Figure 6.30: pipelined datapath with the control signals (PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemRead, MemWrite, MemtoReg) connected to the WB, M and EX control fields of the pipeline registers.]
[Figure 6.30, annotated: lw $10,20($1) followed through Clock 0 to Clock 3. Control bits produced in ID: WB=11, M=010, EX=0001. ID/EX: A=C$1, B=X, S=20, T=$10, D=0. EX: ALU=20+C$1 and branch-adder output PC=4+(20<<2). MEM/WB: MDR=Mem[20+C$1], destination D=$10.]
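The way the control bits thin out from 9 to 5 to 2 as an instruction moves down the pipeline can be sketched in a few lines of Python. This is a minimal illustration, not part of the slides: the names `CONTROL`, `CARRIED`, `control_in_register` and `register_width` are hypothetical, and the bit order inside each group is assumed to follow the column order of Figure 6.28.

```python
# Control values per instruction, grouped by the stage that consumes them.
# Assumed field order (taken from the Figure 6.28 column order):
#   EX = RegDst, ALUOp1, ALUOp0, ALUSrc
#   M  = Branch, MemRead, MemWrite
#   WB = RegWrite, MemtoReg
# "X" marks a don't-care. lw uses RegDst=0, matching EX=0001 in the snapshot.
CONTROL = {
    "R-format": {"EX": "1100", "M": "000", "WB": "10"},
    "lw":       {"EX": "0001", "M": "010", "WB": "11"},
    "sw":       {"EX": "X001", "M": "001", "WB": "0X"},
    "beq":      {"EX": "X010", "M": "100", "WB": "0X"},
}

# Groups that each pipeline register still has to carry forward.
CARRIED = {
    "ID/EX":  ("WB", "M", "EX"),
    "EX/MEM": ("WB", "M"),
    "MEM/WB": ("WB",),
}


def control_in_register(instr, register):
    """Return the control groups held in one pipeline register for instr."""
    return {group: CONTROL[instr][group] for group in CARRIED[register]}


def register_width(register):
    """Number of control bits the pipeline register has to store."""
    return sum(len(CONTROL["lw"][group]) for group in CARRIED[register])


if __name__ == "__main__":
    for reg in CARRIED:
        print(reg, control_in_register("lw", reg), register_width(reg), "bits")
```

Each stage consumes its own group and passes the rest on, which is why the three control registers add up to the 16 controlpath flip-flops quoted on the Overhead slide.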
Pipeline Datapath and Controlpath

Contents of register 1 = C$1 = 3; C$2=4; C$3=4; C$4=6; C$5=7; C$10=8; … Memory[23]=9
Formats: add $rd,$rs=A,$rt=B; lw $rt=B,@($rs=A)

Clock  <IF/ID>              <ID/EX>                     <EX/MEM>                         <MEM/WB>
       <PC, IR>             <PC, A, B, S, Rt, Rd>       <PC, Z, ALU, B, R>               <MDR, ALU, R>
0      <0,?>                <?,?,?,?,?,?>               <?,?,?,?,?>                      <?,?,?>
1      <4,lw $10,20($1)>    <0,?,?,?,?,?>               <?,?,?,?,?>                      <?,?,?>
2      <8,sub $11,$2,$3>    <4,C$1=3,C$10=8,20,$10,0>   <0,?,?,?,?>                      <?,?,?>
3      <12,and $12,$4,$5>   <8,C$2=4,C$3=4,X,$3,$11>    <4+(20<<2)=84,0,20+3=23,8,$10>   <?,?,?>
4      <16,or $13,$6,$7>    <12,C$4=6,C$5=7,X,$5,$12>   <X,1,4-4=0,4,$11>                <Mem[23]=9,23,$10>
5      <20,add $14,$8,$9>   <16,C$6,C$7,X,$7,$13>       <X,0,6 AND 7=6,7,$12>            <X,0,$11>

[Figure 6.31a, Clock 1: IF: lw $10,20($1); ID: before<1>; EX: before<2>; MEM: before<3>; WB: before<4>. All control bits in ID/EX, EX/MEM and MEM/WB are 0.]
[Figure 6.31b, Clock 2: IF: sub $11,$2,$3; ID: lw $10,20($1); EX: before<1>; MEM: before<2>; WB: before<3>. Control decoded for lw: WB=11, M=010, EX=0001.]

[Figure 6.32a, Clock 3: IF: and $12,$4,$5; ID: sub $11,$2,$3; EX: lw $10,...; MEM: before<1>; WB: before<2>. EX computes ALU=20+C$1 and the branch-adder output PC=4+(20<<2). Control decoded for sub: WB=10, M=000, EX=1100.]
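The clock-by-clock snapshots in Figures 6.31 and 6.32 follow one rule: with no stalls, every instruction advances exactly one stage per clock. The Python sketch below reproduces the stage occupancy for the five-instruction example. The names `STAGES`, `PROGRAM` and `stage_occupancy` are hypothetical, and the sketch assumes, as the figures do, that the first instruction is fetched in Clock 1 and that nothing stalls.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Instruction sequence used in the clock-by-clock figures.
PROGRAM = [
    "lw $10,20($1)",
    "sub $11,$2,$3",
    "and $12,$4,$5",
    "or $13,$6,$7",
    "add $14,$8,$9",
]


def stage_occupancy(program, clock):
    """Map each stage to the instruction it holds during the given clock.

    Instruction i (0-based) is in IF during clock i+1 and moves one stage
    per clock. None means the stage holds an instruction from before the
    sequence started (shown as before<n> in the figures) or is empty.
    """
    occupancy = {}
    for depth, stage in enumerate(STAGES):
        index = clock - 1 - depth
        occupancy[stage] = program[index] if 0 <= index < len(program) else None
    return occupancy


if __name__ == "__main__":
    for clock in range(1, 6):
        print("Clock", clock, stage_occupancy(PROGRAM, clock))
```

For Clock 4 the sketch places `or` in IF, `and` in ID, `sub` in EX and `lw` in MEM, which is the arrangement drawn in the Clock 4 figure.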
[Figure 6.32b, Clock 4: IF: or $13,$6,$7; ID: and $12,$4,$5; EX: sub $11,...; MEM: lw $10,...; WB: before<1>. EX computes ALU=C$2-C$3; MEM reads MDR=Mem[20+C$1].]

Data Dependencies: that can be resolved by forwarding

Time (in clock cycles):  CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
Value of register $2:    10    10    10    10    10/-20  -20   -20   -20   -20

Program execution order (in instructions):
sub $2, $1, $3
and $12, $2, $5      data hazard, resolved by forwarding
or  $13, $6, $2      data hazard, resolved by forwarding
add $14, $2, $2      at same time: not a hazard
sw  $15, 100($2)     forward in time: not a hazard

[Figure 6.36: each instruction drawn as IM, Reg, ALU, DM, Reg across CC 1 to CC 9, with the data dependencies on $2 marked.]

Data Hazards: arithmetic

Time (in clock cycles):  CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
Value of register $2:    10    10    10    10    10/-20  -20   -20   -20   -20
Value of EX/MEM:         X     X     X     -20   X       X     X     X     X
Value of MEM/WB:         X     X     X     X     -20     X     X     X     X

Program execution order (in instructions):
sub $2, $1, $3
and $12, $2, $5      forwards in time: can be resolved
or  $13, $6, $2      forwards in time: can be resolved
add $14, $2, $2      at same time: not a hazard
sw  $15, 100($2)

[Figure 6.37: the same sequence with the dependencies drawn from the EX/MEM and MEM/WB pipeline registers instead of the register file.]

Data Dependencies: no forwarding

Clock            1   2   3   4   5   6   7   8
sub $2,$1,$3     IF  ID  EX  M   WB
and $12,$2,$5        IF  ID  ID  ID  EX  M   WB

The and instruction stalls in ID for 2 clocks. In clock 5 the sub writes $2 in the first half of the cycle and the and reads it in the second half.

Suppose every instruction is dependent: each takes 1 + 2 stalls = 3 clocks.
MIPS = Clock / CPI = 500 MHz / 3 = 167 MIPS

Data Dependencies: no forwarding

A dependent instruction takes 1 + 2 stalls = 3 clocks.
An independent instruction takes 1 + 0 stalls = 1 clock.
Suppose 10% of the instructions are dependent?
Average instruction time = 10%*3 + 90%*1 = 0.10*3 + 0.90*1 = 1.2 clocks
MIPS = Clock / CPI = 500 MHz / 1.2 = 417 MIPS (10% dependency)
MIPS = Clock / CPI = 500 MHz / 3 = 167 MIPS (100% dependency)
MIPS = Clock / CPI = 500 MHz / 1 = 500 MIPS (0% dependency)

Data Dependencies: with forwarding

Clock            1   2   3   4   5   6
sub $2,$1,$3     IF  ID  EX  M   WB
and $12,$2,$5        IF  ID  EX  M   WB

Detected data hazard 1a: ID/EX.$rs = EX/MEM.$rd
Suppose every instruction is dependent: each takes 1 + 0 stalls = 1 clock.
MIPS = Clock / CPI = 500 MHz / 1 = 500 MIPS

Data Dependencies: Hazard Conditions

A data hazard occurs whenever a data source needs a previous result that is not yet available because its destination register has not been written.

Example:
sub $2, $1, $3       sub $rd, $rs, $rt
and $12, $2, $5      and $rd, $rs, $rt

Data hazard detection always compares a destination with a source.

Hazard type  Destination       Source
1a           EX/MEM.$rdest  =  ID/EX.$rs
1b           EX/MEM.$rdest  =  ID/EX.$rt
2a           MEM/WB.$rdest  =  ID/EX.$rs
2b           MEM/WB.$rdest  =  ID/EX.$rt

Data Dependencies: Hazard Conditions

1a data hazard: EX/MEM.$rd = ID/EX.$rs
sub $2, $1, $3       sub $rd, $rs, $rt
and $12, $2, $5      and $rd, $rs, $rt

1b data hazard: EX/MEM.$rd = ID/EX.$rt
sub $2, $1, $3       sub $rd, $rs, $rt
and $12, $1, $2      and $rd, $rs, $rt

2a data hazard: MEM/WB.$rd = ID/EX.$rs
sub $2, $1, $3       sub $rd, $rs, $rt
and $12, $1, $5      and $rd, $rs, $rt
or  $13, $2, $1      or  $rd, $rs, $rt

2b data hazard: MEM/WB.$rd = ID/EX.$rt
sub $2, $1, $3       sub $rd, $rs, $rt
and $12, $1, $5      and $rd, $rs, $rt
or  $13, $6, $2      or  $rd, $rs, $rt

Data Dependencies: Worst case

Data hazard:
sub $2, $1, $3       sub $rd, $rs, $rt
and $12, $2, $2      and $rd, $rs, $rt
or  $13, $2, $2      or  $rd, $rs, $rt

Data hazard 1a: EX/MEM.$rd = ID/EX.$rs
Data hazard 1b: EX/MEM.$rd = ID/EX.$rt
Data hazard 2a: MEM/WB.$rd = ID/EX.$rs
Data hazard 2b: MEM/WB.$rd = ID/EX.$rt

Data Dependencies: Hazard Conditions

Hazard types 1a, 1b, 2a and 2b compare the sources ID/EX.$rs and ID/EX.$rt against these destinations:
Destinations: EX/MEM.$rdest for types 1a and 1b; MEM/WB.$rdest for types 2a and 2b.
Pipeline registers: ID/EX holds $rs, $rt, $rd; EX/MEM holds $rd; MEM/WB holds $rd.

[Figure 6.38: (a) no forwarding: ID/EX, Registers, ALU, EX/MEM, Data memory, MEM/WB. (b) with forwarding: muxes controlled by ForwardA and ForwardB select the ALU inputs; the forwarding unit receives Rs, Rt, EX/MEM.RegisterRd and MEM/WB.RegisterRd.]

Data Hazards: Loads

Program execution order (in instructions):
lw  $2, 20($1)
and $4, $2, $5       backwards in time: cannot be resolved
or  $8, $2, $6       forwards in time: can be resolved
add $9, $4, $2       at same time: not a hazard
slt $1, $6, $7

[Figure 6.44: the sequence above drawn as IM, Reg, ALU, DM, Reg across CC 1 to CC 9.]

Data Hazards: load stalling

[Figure 6.45: the same sequence across CC 1 to CC 10 with a stall. A bubble is inserted after lw $2,20($1), so and $4,$2,$5 and the instructions after it are delayed by one clock.]

Data Hazards: Hazard detection unit (page 490)

Stall condition:
Source                        Destination
IF/ID.$rs or IF/ID.$rt    =   ID/EX.$rt, with ID/EX.MemRead=1

Stall example:
lw  $2, 20($1)       lw  $rt, addr($rs)
and $4, $2, $5       and $rd, $rs, $rt

No-stall example (only need to look at the next instruction):
lw  $2, 20($1)       lw  $rt, addr($rs)
and $4, $1, $5       and $rd, $rs, $rt
or  $8, $2, $6       or  $rd, $rs, $rt

Example load: assume half of the loads are immediately followed by an instruction that uses the loaded value. What is the average number of clocks for the load?
load instruction time: 50%*(1 clock) + 50%*(2 clocks) = 1.5 clocks

Hazard Detection Unit: when to stall

[Figure 6.46: pipelined datapath with the hazard detection unit and the forwarding unit. Labeled signals: ID/EX.MemRead, PCWrite, IF/IDWrite, IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, ID/EX.RegisterRt, EX/MEM.RegisterRd, MEM/WB.RegisterRd.]

Data Dependency Units

Forwarding condition:
Source                        Destination
ID/EX.$rs or ID/EX.$rt    =   EX/MEM.$rd
ID/EX.$rs or ID/EX.$rt    =   MEM/WB.$rd

Stall condition:
Source                        Destination
IF/ID.$rs or IF/ID.$rt    =   ID/EX.$rt, with ID/EX.MemRead=1

Data Dependency Units

Pipeline registers:
IF/ID    $rs, $rt, $rd    stalling comparisons
ID/EX    $rs, $rt, $rd    stalling comparisons
EX/MEM   $rd              forwarding comparisons
MEM/WB   $rd              forwarding comparisons

Stall condition:
Source                        Destination
IF/ID.$rs or IF/ID.$rt    =   ID/EX.$rt, with ID/EX.MemRead=1

Branch Hazards: Soln #1, Stall until Decision made (fig.
6.4)

@3C: add $4, $5, $6
@40: beq $1, $3, 7
@44: and $12, $2, $5
@48: or  $13, $6, $2
@4C: add $14, $2, $2
@50: lw  $4, 50($7)

Soln #1: Stall until the decision is made

[Figure 6.4: pipeline timing for add $4,$5,$6; beq $1,$2,40; lw $3,300($0) with 2 ns stages (instruction fetch, Reg, ALU, data access, Reg). The lw fetch starts 4 ns after the beq fetch: a 2 ns stall while the decision is made in the ID stage, then the load proceeds.]

Branch Hazards: Soln #2, Predict until Decision made

Clock            1   2   3   4   5   6   7   8
beq $1,$3,7      IF  ID  EX  M   WB
and $12,$2,$5        IF  ID  EX  M   WB
lw  $4,50($7)            IF  ID  EX  M   WB

Predict false branch. Decision made in ID stage: discard the "and $12,$2,$5" instruction and branch.

Branch Hazards: Soln #3, Delayed Decision

Clock            1   2   3   4   5   6   7   8
beq $1,$3,7      IF  ID  EX  M   WB
add $4,$5,$6         IF  ID  EX  M   WB
lw  $4,50($7)            IF  ID  EX  M   WB

Move the instruction that was before the branch into the slot after it; it does not need to be discarded. Decision made in ID stage: branch.

Branch Hazards: Soln #3, Delayed Decision

Clock            1   2   3   4   5   6   7   8
beq $1,$3,7      IF  ID  EX  M   WB
and $12,$2,$5        IF  ID  EX  M   WB
lw  $4,50($7)            IF  ID  EX  M   WB

Decision made in ID stage: do branch.

Branch Hazards: Decision made in the ID stage (figure 6.4)

Clock            1   2   3   4   5   6   7   8
beq $1,$3,7      IF  ID  EX  M   WB
nop                  IF  ID  EX  M   WB
lw  $4,50($7)            IF  ID  EX  M   WB

No decision yet: insert a nop. Decision: do load.

Branch Hazards: Soln #2, Predict until Decision made

Branch decision made in MEM stage: discard values when the prediction is wrong. Predict false branch.

Program execution order (in instructions):
40 beq $1, $3, 7
44 and $12, $2, $5
48 or  $13, $6, $2
52 add $14, $2, $2
72 lw  $4, 50($7)

Same effect as 3 stalls.

[Figure 6.50: the sequence above drawn as IM, Reg, ALU, DM, Reg across CC 1 to CC 9.]

[Figure 6.51: datapath with IF.Flush and early branch comparison, together with the hazard detection unit.]
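The forwarding unit and the hazard detection unit that appear in the datapath figures apply the register comparisons listed on the Data Dependency Units slides. The Python sketch below is a minimal model of those comparisons, not the slides' own implementation. The function names are hypothetical, register numbers are plain integers, and two details are assumptions added here rather than stated on the slides: a producer only forwards when its RegWrite bit is set, and register $0 is never forwarded.

```python
def forward_source(src, ex_mem_rd, mem_wb_rd,
                   ex_mem_regwrite=True, mem_wb_regwrite=True):
    """Pick where one ALU operand should come from.

    Returns "EX/MEM" for hazard type 1, "MEM/WB" for type 2 and "REG"
    when the register file value can be used. Type 1 is checked first
    because EX/MEM holds the most recent result (the worst-case slide).
    """
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
        return "EX/MEM"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
        return "MEM/WB"
    return "REG"


def forwarding_unit(id_ex_rs, id_ex_rt, ex_mem_rd, mem_wb_rd):
    """Return the (ForwardA, ForwardB) selections for the $rs and $rt operands."""
    return (forward_source(id_ex_rs, ex_mem_rd, mem_wb_rd),
            forward_source(id_ex_rt, ex_mem_rd, mem_wb_rd))


def must_stall(if_id_rs, if_id_rt, id_ex_rt, id_ex_memread):
    """Load-use stall: the instruction in ID reads the register a load in EX will write."""
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)


if __name__ == "__main__":
    # sub $2,$1,$3 followed by and $12,$2,$5: hazard 1a on the $rs operand.
    print(forwarding_unit(id_ex_rs=2, id_ex_rt=5, ex_mem_rd=2, mem_wb_rd=0))
    # lw $2,20($1) followed by and $4,$2,$5: the and must wait one clock.
    print(must_stall(if_id_rs=2, if_id_rt=5, id_ex_rt=2, id_ex_memread=True))
```

Returning the source name instead of a mux select code keeps the sketch independent of any particular encoding of ForwardA and ForwardB.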
Flush (Figure 6.51): if the prediction is wrong, add nops.

Performance

Load: assume half of the loads are immediately followed by an instruction that uses the loaded value (i.e. a data dependency).
load instruction time = 50%*(1 clock) + 50%*(2 clocks) = 1.5 clocks

Jump: assume that jumps always pay 1 full clock cycle of delay (stall).
jump instruction time = 2 clocks

Branch: assume the delay of a misprediction is 1 clock cycle and that 25% of the branches are mispredicted.
branch time = 75%*(1 clock) + 25%*(2 clocks) = 1.25 clocks

Performance, page 504

Instruction   Single-cycle     Multi-cycle      Pipeline         Instruction
              cycles           clocks           cycles           mix
loads         1                5                1.5              23%
stores        1                4                1                13%
arithmetic    1                4                1                43%
branches      1                3                1.25             19%
jumps         1                3                2                2%
Clock speed   125 MHz, 8 ns    500 MHz, 2 ns    500 MHz, 2 ns
CPI           1                4.02             1.18             = sum of Cycles*Mix
MIPS          125 MIPS         124 MIPS         424 MIPS         = Clock/CPI

Homework for Wednesday April 5, 2000

1. Convert the RISCEE 1 architecture into a pipeline architecture (like Figure 6.30), showing the number of data and control bits.
2. Build the control line table (like Figure 6.28) for the RISCEE3 pipeline architecture for the RISCEE1 instructions (addi, subi, load, store, beq, jmp, jal).
3. Homework 6.23, p. 534
4. Homework 6.24, p. 534
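The CPI and MIPS rows of the performance table (page 504) are weighted averages, and they can be checked with a short Python sketch. The instruction mix and cycle counts below are copied from the table; the function and variable names are hypothetical.

```python
# Instruction mix and per-class cycle counts from the performance table.
MIX = {"loads": 0.23, "stores": 0.13, "arithmetic": 0.43,
       "branches": 0.19, "jumps": 0.02}
SINGLE_CYCLE = {"loads": 1, "stores": 1, "arithmetic": 1, "branches": 1, "jumps": 1}
MULTI_CYCLE = {"loads": 5, "stores": 4, "arithmetic": 4, "branches": 3, "jumps": 3}
PIPELINE = {"loads": 1.5, "stores": 1, "arithmetic": 1, "branches": 1.25, "jumps": 2}


def expected_clocks(base, penalty, fraction):
    """Average clocks when a fraction of the instructions pays an extra penalty."""
    return (1 - fraction) * base + fraction * (base + penalty)


def cpi(cycles, mix=MIX):
    """Cycles per instruction: sum of cycles * mix over all instruction classes."""
    return sum(mix[kind] * cycles[kind] for kind in mix)


def mips(clock_mhz, cpi_value):
    """Millions of instructions per second = clock rate in MHz / CPI."""
    return clock_mhz / cpi_value


if __name__ == "__main__":
    print("load time  ", expected_clocks(1, 1, 0.50))
    print("branch time", expected_clocks(1, 1, 0.25))
    for name, cycles, clock in (("single-cycle", SINGLE_CYCLE, 125),
                                ("multi-cycle", MULTI_CYCLE, 500),
                                ("pipeline", PIPELINE, 500)):
        value = cpi(cycles)
        print(name, "CPI", round(value, 4), "MIPS", round(mips(clock, value), 1))
```

The same `expected_clocks` helper covers the stall examples earlier in the lecture: `expected_clocks(1, 2, 0.10)` gives the 1.2 clocks quoted for a 10% dependency rate without forwarding. The unrounded pipeline CPI is 1.1825, which the table rounds to 1.18 before dividing.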