Major CPU Design Steps

Datapath
1. Analyze instruction set operations using independent RTN: ISA => RTN => datapath requirements.
   – This provides the required datapath components and how they are connected to meet ISA requirements.
2. Select the required datapath components and connections, and establish the clock methodology (e.g., clock edge-triggered).
   + Determine the number of cycles per instruction and the operations in each cycle.
3. Assemble a datapath meeting the requirements.

Control
4. Identify and define the function of all control points or signals needed by the datapath.
   – Analyze the implementation of each instruction to determine the setting of the control points that affect its operations and register transfers, for each cycle of the instruction.
5. Design & assemble the control logic.
   – Hard-wired: finite-state machine implementation.
   – Microprogrammed: i.e., using a control program.

3rd Edition Chapter 5.5 – See Handout – Not in 4th Edition

EECC550 - Shaaban #1 Lec # 5 Winter 2012 12-18-2012

Single Cycle MIPS Datapath
[Figure: single-cycle MIPS datapath with PC, instruction memory, 32-bit register file (busA = R[rs], busB = R[rt], busW), extender, main ALU with ALU control driven by the function field, PC + 4 adder, branch-target adder, and data memory. Control signals shown: PCSrc, Branch, RegDst, RegWr, ExtOp, ALUSrc, ALUop (2 bits), MemWr, MemtoReg.]
• CPI = 1, long clock cycle.
• Jump not included.
• Includes ORI (not in the book version).
• T = I x CPI x C

Single Cycle MIPS Datapath Extended To Handle Jump with Control Unit Added
[Figure: the single-cycle datapath extended with a jump path (Instruction [25–0] shifted left 2 to 28 bits, concatenated with PC + 4 [31–28] to form Jump address [31–0]) and a control unit that decodes the opcode, Instruction [31–26], into RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite.]
[Figure, continued: instruction memory, register file (Read register 1 = rs, Read register 2 = rt, Write register = rt or rd selected by a mux), sign extend of imm16, ALU with ALU control driven by the function field Instruction [5–0], and data memory.]
(4th Edition Figure 4.24 page 329; 3rd Edition Figure 5.24 page 314)
• In this book version, ORI is not supported, so no zero extension of the immediate is needed.
• ALUOp (2 bits): 00 = add, 01 = subtract, 10 = R-Type

Drawbacks of Single-Cycle Processor
1. Long cycle time (CPI = 1):
   – All instructions must take as much time as the slowest:
     • The cycle time required by load is longer than needed for all other instructions.
   – Real memory is not as well-behaved as idealized memory:
     • It cannot always complete a data access in one (short) cycle.
2. Impossible to implement complex, variable-length instructions and complex addressing modes in a single cycle:
   • e.g., indirect memory addressing.
3. High and duplicated hardware resource requirements:
   – No hardware functional unit can be used more than once in a single cycle (e.g., ALUs).
4. Cannot pipeline (overlap) the processing of one instruction with the previous instructions.
   – (Instruction pipelining: 4th Edition Chapter 4, 3rd Edition Chapter 6.)

Abstract View of Single Cycle CPU
[Figure: main control (input: op) and ALU control (input: fun) driving the steps Instruction Fetch, Register Fetch, Ext, ALU, Mem Access, and Result Store, with Next PC logic handling Branch and Jump. Critical path = C = 8 ns (LW).]
• One CPU clock cycle duration: C = 8 ns
• One instruction per cycle: CPI = 1
Assuming the following datapath/control hardware component delays:
  Memory Units:    2 ns
  ALU and adders:  2 ns
  Register File:   1 ns
  Control Unit:    < 1 ns

Single Cycle Instruction Timing
Instruction class       Units on the path
Arithmetic & Logical    PC, Inst Memory, Reg File, mux, ALU, mux, setup
Load                    PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup   (critical path)
Store                   PC, Inst Memory, Reg File, mux, ALU, Data Mem
Branch                  PC, Inst Memory, Reg File, cmp, mux
• Load delays: Inst Memory 2 ns + Reg File 1 ns + ALU 2 ns + Data Mem 2 ns + register write 1 ns = 8 ns.
• Critical path: Load (LW). It determines the CPU clock cycle, C (e.g., C = 8 ns).

Clock Cycle Time & Critical Path
• One CPU clock cycle duration: C = 8 ns here.
• Critical path: the slowest path (i.e., the longest delay) between any two storage devices; LW in this case.
• Clock cycle time is a function of the critical path, and must be greater than:
  – Clock-to-Q + longest delay path through the combinational logic + setup + clock skew

Reducing Cycle Time: Multi-Cycle Design
• Cut the combinational dependency graph by inserting registers / latches.
• The same work is done in two or more shorter cycles, rather than one long cycle.
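The clock-cycle figures above can be reproduced with a short calculation. The sketch below is only an illustration under the stated component delays: it ignores mux, setup, and control delays, treats the branch compare as an ALU operation, and the names `DELAY_NS` and `PATHS` are made up for this example.

```python
# Component delays from the slides, in ns (mux, setup, and control delays ignored).
DELAY_NS = {"inst_mem": 2, "reg_read": 1, "alu": 2, "data_mem": 2, "reg_write": 1}

# Units each instruction class passes through (assumed simplification of the
# timing diagram; the branch compare is counted as one ALU operation).
PATHS = {
    "arith_logic": ["inst_mem", "reg_read", "alu", "reg_write"],
    "load": ["inst_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "store": ["inst_mem", "reg_read", "alu", "data_mem"],
    "branch": ["inst_mem", "reg_read", "alu"],
}


def path_delay_ns(instr_class):
    """Total delay of one instruction class through the single-cycle datapath."""
    return sum(DELAY_NS[unit] for unit in PATHS[instr_class])


def single_cycle_clock_ns():
    """The single-cycle clock must cover the slowest instruction class."""
    return max(path_delay_ns(c) for c in PATHS)


def multi_cycle_clock_ns():
    """With a register after every unit, the clock only covers the slowest unit."""
    return max(DELAY_NS.values())


if __name__ == "__main__":
    for name in PATHS:
        print(name, path_delay_ns(name), "ns")
    print("single-cycle C =", single_cycle_clock_ns(), "ns")
    print("multi-cycle  C =", multi_cycle_clock_ns(), "ns")
```

Under these assumptions the load path is the longest, which is why it sets the single-cycle clock, while cutting the path at every unit leaves the slowest unit as the limit.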
[Figure: one long cycle (e.g., CPI = 1): storage element, acyclic combinational logic, storage element. Split into two shorter cycles (e.g., CPI = 2): storage element, acyclic combinational logic (A) in cycle 1, storage element, acyclic combinational logic (B) in cycle 2, storage element. Storage element: register or memory.]
Place registers to:
• Get a balanced clock cycle length
• Save any results needed for the remaining cycles

Basic MIPS Instruction Processing Steps
Instruction Fetch     Obtain instruction from program storage: Instruction ← Mem[PC]
Next Instruction      Update program counter to address of next instruction: PC ← PC + 4
Instruction Decode    Determine instruction type (done by the Control Unit); obtain operands from registers
Execute               Compute result value or status
Result Store          Store result in register/memory if needed (usually called Write Back)
• These steps are common to all instructions.
• T = I x CPI x C

Partitioning The Single Cycle Datapath
Add registers between steps to break the datapath into cycles:
1. Instruction Fetch Cycle (IF): 2 ns
2. Instruction Decode Cycle (ID): 1 ns
3. Execution Cycle (EX): 2 ns
4. Data Memory Access Cycle (MEM): 2 ns
5. Write Back Cycle (WB): 1 ns
Place registers to:
• Get a balanced clock cycle length
• Save any results needed for the remaining cycles
Thus: C = 2 ns, f = 500 MHz
[Figure: the single-cycle datapath cut into Instruction Fetch, Operand Fetch, Exec, Mem Access, and Result Store, with Next PC logic for Branch and Jump. Control signals: ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr; the instruction goes to the control unit.]
Example Multi-cycle Datapath
[Figure: multi-cycle datapath in five steps: 1. Instruction Fetch (IF), 2 ns; 2. Instruction Decode (ID), 1 ns; 3. Execution (EX), 2 ns; 4. Memory (MEM), 2 ns; 5. Write Back (WB), 1 ns. Control signals: ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr; Equal and the instruction go to the control unit. All registers are clock-edge triggered (register write enable control lines not shown).]
Registers added:
• IR: Instruction register
• A, B: Two registers to hold operands read from the register file, i.e., R[rs], R[rt]
• R (or ALUOut): holds the output of the main ALU (ALU result)
• M (or Memory Data Register, MDR): holds data read from data memory
CPU clock cycle time: worst cycle delay = C = 2 ns
Thus clock rate: f = 1 / 2 ns = 500 MHz (ignoring MUX and CLK-to-Q delays)
(Component delays as before: memory units 2 ns, ALU and adders 2 ns, register file 1 ns, control unit < 1 ns.)

Operations (Dependent RTN) for Each Cycle
IF   Instruction Fetch    All instructions:   IR ← Mem[PC]
ID   Instruction Decode   All instructions:   A ← R[rs]; B ← R[rt]
EX   Execution            R-Type:             R ← A funct B
                          Logic Immediate:    R ← A OR ZeroExt[imm16]
                          Load:               R ← A + SignEx(Im16)
                          Store:              R ← A + SignEx(Im16)
                          Branch:             Zero ← A - B
                                              If Zero = 1: PC ← PC + 4 + (SignExt(imm16) x4)
                                              else (i.e., Zero = 0): PC ← PC + 4
MEM  Memory               Load:               M ← Mem[R]
                          Store:              Mem[R] ← B; PC ← PC + 4
WB   Write Back           R-Type:             R[rd] ← R; PC ← PC + 4
                          Logic Immediate:    R[rt] ← R; PC ← PC + 4
                          Load:               R[rt] ← M; PC ← PC + 4
The Instruction Fetch (IF) and Instruction Decode (ID) cycles are common to all instructions.

MIPS Multi-Cycle Datapath: Five Cycles of Load (CPI = 5)
Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
IF        ID        EX        MEM       WB
1- Instruction Fetch (IF): Fetch the instruction from instruction Memory.
2- Instruction Decode (ID): Operand Register Fetch and Instruction Decode.
3- Execute (EX): Calculate the effective memory address.
4- Memory (MEM): Read the data from the Data Memory.
5- Write Back (WB): Write the loaded data to the register file.
Update PC.

Multi-cycle Datapath Instruction CPI
• R-Type/Immediate: require four cycles, CPI = 4
  – IF, ID, EX, WB
• Loads: require five cycles, CPI = 5
  – IF, ID, EX, MEM, WB
• Stores: require four cycles, CPI = 4
  – IF, ID, EX, MEM
• Branches/Jumps: require three cycles, CPI = 3
  – IF, ID, EX
• Average or effective program CPI: 3 ≤ CPI ≤ 5, depending on program profile (instruction mix).
C = 2 ns, f = 500 MHz

Single Cycle Vs. Multi-Cycle CPU
[Figure: timing diagram. Single-cycle implementation, 8 ns clock (125 MHz): Load takes cycle 1 and Store takes cycle 2, with part of the Store cycle wasted. Multiple-cycle implementation, 2 ns clock (500 MHz), cycles 1 to 10: Load (IF, ID, EX, MEM, WB), Store (IF, ID, EX, MEM), then R-type (IF, ...). 1 ≤ CPI ≤ 5.]
T = I x CPI x C
Single-Cycle CPU: CPI = 1, C = 8 ns, f = 125 MHz
  One million instructions take I x CPI x C = 10^6 x 1 x 8x10^-9 s = 8 msec
Multi-Cycle CPU: CPI = 3 to 5, C = 2 ns, f = 500 MHz
  One million instructions take from 10^6 x 3 x 2x10^-9 s = 6 msec to 10^6 x 5 x 2x10^-9 s = 10 msec, depending on the instruction mix used.
(Component delays as before: memory units 2 ns, ALU and adders 2 ns, register file 1 ns, control unit < 1 ns.)

Control Unit Design: Finite State Machine (FSM) Control Model (AKA Hardwired Control)
• State specifies control points (outputs) for Register Transfer.
• Control points (outputs) are assumed to depend only on the current state and not on the inputs (i.e., a Moore finite state machine).
• Transfer (register/memory writes) and state transition occur upon exiting the state, on the falling edge of the clock.
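The single-cycle versus multi-cycle comparison above is a direct application of T = I x CPI x C. A minimal sketch of that arithmetic follows; the function and variable names are hypothetical, and the break-even CPI is derived here from the two clock periods rather than quoted from the slides.

```python
def execution_time_s(instruction_count, cpi, cycle_time_s):
    """CPU execution time T = I x CPI x C, in seconds."""
    return instruction_count * cpi * cycle_time_s


INSTRUCTIONS = 10**6
SINGLE_CYCLE_C = 8e-9   # 8 ns clock, CPI = 1
MULTI_CYCLE_C = 2e-9    # 2 ns clock, CPI between 3 and 5

single = execution_time_s(INSTRUCTIONS, 1, SINGLE_CYCLE_C)
multi_best = execution_time_s(INSTRUCTIONS, 3, MULTI_CYCLE_C)
multi_worst = execution_time_s(INSTRUCTIONS, 5, MULTI_CYCLE_C)

# Average CPI at which the multi-cycle design takes as long as the single-cycle one.
break_even_cpi = SINGLE_CYCLE_C / MULTI_CYCLE_C

if __name__ == "__main__":
    print(f"single-cycle: {single * 1e3:.1f} ms")
    print(f"multi-cycle:  {multi_best * 1e3:.1f} ms to {multi_worst * 1e3:.1f} ms")
    print(f"break-even average CPI: {break_even_cpi:.1f}")
```

With these clock periods the multi-cycle design is faster only when the average CPI of the instruction mix stays below the break-even value.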
[Figure: Moore finite state machine. The inputs (opcode, conditions) and the current state feed the next-state logic; the current state is held in a state register (e.g., flip-flops); the output logic derives the outputs (control points, to datapath) from the current state. State X holds the register transfer control points; the state transition from the last state to the next state depends on the inputs.]
Moore Finite State Machine vs. Mealy?

Control Specification For Multi-cycle CPU
Finite State Machine (FSM) - State Transition Diagram
• “instruction fetch” (start state): IR ← MEM[PC]
• “decode / operand fetch”: A ← R[rs]; B ← R[rt]
• Execute:
  – R-type: R ← A fun B
  – ORi: R ← A or ZX
  – LW: R ← A + SX
  – SW: R ← A + SX
  – BEQ & Zero: PC ← PC + 4 + SX || 00 (then to instruction fetch)
  – BEQ & ~Zero: PC ← PC + 4 (then to instruction fetch)
• Memory:
  – LW: M ← MEM[R]
  – SW: MEM[R] ← B; PC ← PC + 4 (then to instruction fetch)
• Write-back:
  – R-type: R[rd] ← R; PC ← PC + 4 (then to instruction fetch)
  – ORi: R[rt] ← R; PC ← PC + 4 (then to instruction fetch)
  – LW: R[rt] ← M; PC ← PC + 4 (then to instruction fetch)
13 states: 4 state flip-flops needed.

Traditional FSM Controller
[Figure: next-state logic and output logic around a state register of 4 flip-flops. The state transition table has 11 input bits: the 6-bit opcode and the Equal condition from the datapath, plus the 4-bit current state. It produces the next state; the output logic produces the control points sent to the datapath.]

Traditional FSM Controller
datapath + state diagram => control
• Translate RTN statements into control points.
• Assign states.
• Implement the controller.
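The state transition diagram above can be written down as a small Moore machine: the RTN performed depends only on the current state, and the next state depends on the state plus the inputs (opcode, Zero). The following sketch is one possible encoding; the state names (`R_EX`, `LW_MEM`, and so on) are invented for readability and are not the binary state assignments used by the controller.

```python
# RTN performed in each state (Moore outputs: a function of the state only).
RTN = {
    "IF": "IR <- MEM[PC]",
    "ID": "A <- R[rs]; B <- R[rt]",
    "R_EX": "R <- A fun B",
    "R_WB": "R[rd] <- R; PC <- PC + 4",
    "ORI_EX": "R <- A or ZX",
    "ORI_WB": "R[rt] <- R; PC <- PC + 4",
    "LW_EX": "R <- A + SX",
    "LW_MEM": "M <- MEM[R]",
    "LW_WB": "R[rt] <- M; PC <- PC + 4",
    "SW_EX": "R <- A + SX",
    "SW_MEM": "MEM[R] <- B; PC <- PC + 4",
    "BEQ_TAKEN": "PC <- PC + 4 + SX || 00",
    "BEQ_NOT_TAKEN": "PC <- PC + 4",
}

# Dispatch out of decode depends on the opcode (and on Zero for BEQ).
DISPATCH = {"R-type": "R_EX", "ORI": "ORI_EX", "LW": "LW_EX", "SW": "SW_EX"}

# Unconditional successors; any state not listed returns to instruction fetch.
SEQUENCE = {
    "R_EX": "R_WB",
    "ORI_EX": "ORI_WB",
    "LW_EX": "LW_MEM",
    "LW_MEM": "LW_WB",
    "SW_EX": "SW_MEM",
}


def next_state(state, opcode, zero=False):
    """Next-state logic: inputs are the current state, the opcode, and Zero."""
    if state == "IF":
        return "ID"
    if state == "ID":
        if opcode == "BEQ":
            return "BEQ_TAKEN" if zero else "BEQ_NOT_TAKEN"
        return DISPATCH[opcode]
    return SEQUENCE.get(state, "IF")


def trace(opcode, zero=False):
    """States visited by one instruction, starting at instruction fetch."""
    states = ["IF"]
    while True:
        following = next_state(states[-1], opcode, zero)
        if following == "IF":
            return states
        states.append(following)


if __name__ == "__main__":
    for op in ("R-type", "ORI", "LW", "SW", "BEQ"):
        print(op, [(s, RTN[s]) for s in trace(op)])
```

The length of each trace is the number of cycles the instruction takes, so the same table that drives the controller also gives the per-instruction CPI.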
More on FSM controller implementation in Appendix C

Mapping RTNs To Control Points Examples & State Assignments
State        Code   RTN                                 Control points (examples)
 0  IF       0000   IR ← MEM[PC]                        imem_rd, IRen
 1  ID       0001   A ← R[rs]; B ← R[rt]                Aen, Ben
 2  BEQ      0010   PC ← PC + 4 + SX || 00              (BEQ & Zero)
 3  BEQ      0011   PC ← PC + 4                         (BEQ & ~Zero)
 4  R-type   0100   R ← A fun B                         ALUfun, Sen
 5  R-type   0101   R[rd] ← R; PC ← PC + 4              RegDst, RegWr, PCen
 6  ORi      0110   R ← A or ZX
 7  ORi      0111   R[rt] ← R; PC ← PC + 4
 8  LW       1000   R ← A + SX
 9  LW       1001   M ← MEM[R]
10  LW       1010   R[rt] ← M; PC ← PC + 4
11  SW       1011   R ← A + SX
12  SW       1100   MEM[R] ← B; PC ← PC + 4
States 2, 3, 5, 7, 10, and 12 return to the instruction fetch state 0000.
13 states: 4 state flip-flops needed.

Detailed Control Specification – (Partial) State Transition Table
State   Current   Op field   Z   Next   IR en   PC en   PC sel   Ops A B
IF      0000      ??????     ?   0001   1
ID      0001      BEQ        0   0011                            1 1
ID      0001      BEQ        1   0010                            1 1
ID      0001      R-type     x   0100                            1 1
ID      0001      orI        x   0110                            1 1
ID      0001      LW         x   1000                            1 1
ID      0001      SW         x   1011                            1 1
BEQ     0010      xxxxxx     x   0000           1       1
BEQ     0011      xxxxxx     x   0000           1       0
R       0100      xxxxxx     x   0101
R       0101      xxxxxx     x   0000           1       0
ORI     0110      xxxxxx     x   0111
ORI     0111      xxxxxx     x   0000           1       0
LW      1000      xxxxxx     x   1001
LW      1001      xxxxxx     x   1010
LW      1010      xxxxxx     x   0000           1       0
SW      1011      xxxxxx     x   1100
SW      1100      xxxxxx     x   0000           1       0
[Remaining output columns of the table: Exec (Ex, Sr, ALU, S), Mem (R, W, M), and Write-Back (M-R, Wr, Dst). The ALU column takes the values fun, or, and add in the execute states; the other per-state values did not survive extraction. Annotation in the table: “Can be combined in one state”.]
More on FSM controller implementation in Appendix C

Alternative Multiple Cycle Datapath (In Textbook)
• Minimizes hardware: 1 memory, 1 ALU
[Figure: multi-cycle datapath with PC, a single ideal memory (Address, Din, Dout), Instruction Reg, Mem Data Reg, register file (Ra, Rb, Rw, busA, busB, busW), registers A and B, extender for Imm 16, shift left 2, a single ALU with ALU control, and ALU Out. Control signals: PCWr, PCWrCond, PCSrc, IorD, MemRd, MemWr, IRWr, RegDst, RegWr, ALUSrcA, ALUSrcB, ALUOp, MemtoReg; the ALU Zero output feeds PCWrCond.]
3rd Edition Chapter 5.5 (see handout) – Not in 4th Edition

Alternative Multiple Cycle Datapath (In Textbook)
• Shared instruction/data memory unit
• A single ALU shared among instructions
• Shared units require additional or widened multiplexors
• Temporary registers to hold data between clock cycles of the instruction:
  – Additional registers: Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut
(Figure 5.27 page 322)

Alternative Multiple Cycle Datapath With Control Lines (Fig 5.28 In Textbook)
[Figure: the datapath above with its control lines; labels include PC, PC + 4, Branch Target, rs, rt, rd, and imm16.]
(ORI not supported, Jump supported) (Figure 5.28 page 323)

The Effect of The 1-bit Control Signals
RegDst
  Deasserted (=0): The register destination number for the write register comes from the rt field (instruction bits 20:16).
  Asserted (=1): The register destination number for the write register comes from the rd field (instruction bits 15:11).
RegWrite
  Deasserted (=0): None.
  Asserted (=1):
  The register on the write register input is written with the value on the Write data input.
ALUSrcA
  Deasserted (=0): The first ALU operand is the PC.
  Asserted (=1): The first ALU operand is register A (i.e., R[rs]).
MemRead
  Deasserted (=0): None.
  Asserted (=1): The content of memory specified by the address input is put on the memory data output.
MemWrite
  Deasserted (=0): None.
  Asserted (=1): The memory content specified by the address input is replaced by the value on the Write data input.
MemtoReg
  Deasserted (=0): The value fed to the register write data input comes from the ALUOut register.
  Asserted (=1): The value fed to the register write data input comes from the memory data register (MDR).
IorD
  Deasserted (=0): The PC is used to supply the address to the memory unit.
  Asserted (=1): The ALUOut register is used to supply the address to the memory unit.
IRWrite
  Deasserted (=0): None.
  Asserted (=1): The output of the memory is written into the Instruction Register (IR).
PCWrite
  Deasserted (=0): None.
  Asserted (=1): The PC is written; the source is controlled by PCSource.
PCWriteCond (i.e., Branch)
  Deasserted (=0): None.
  Asserted (=1): The PC is written if the Zero output of the ALU is also active.
(Figure 5.29 page 324)

The Effect of The 2-bit Control Signals
Signal Name   Value (Binary)   Effect
ALUOp         00               The ALU performs an add operation.
              01               The ALU performs a subtract operation.
              10               The funct field of the instruction determines the ALU operation (R-Type).
ALUSrcB       00               The second input of the ALU comes from register B (i.e., R[rt]).
              01               The second input of the ALU is the constant 4.
              10               The second input of the ALU is the sign-extended 16-bit immediate (imm16) field of the instruction in IR.
              11               The second input of the ALU is the sign-extended 16-bit immediate field of IR shifted left 2 bits (for branches).
PCSource      00               The output of the ALU (PC + 4) is sent to the PC for writing.
              01               The content of ALUOut (the branch target address) is sent to the PC for writing.
              10               The jump target address (IR[25:0] shifted left 2 bits and concatenated with PC + 4 [31:28]), i.e., the jump address, is sent to the PC for writing.
(Figure 5.29 page 324)

Operations (Dependent RTN) for Each
Cycle
IF   Instruction Fetch    All instructions:   IR ← Mem[PC]; PC ← PC + 4
ID   Instruction Decode   All instructions:   A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x4)
EX   Execution            R-Type:             ALUout ← A funct B
                          Load:               ALUout ← A + SignEx(Imm16)
                          Store:              ALUout ← A + SignEx(Imm16)
                          Branch:             Zero ← A - B; Zero: PC ← ALUout
                          Jump:               PC ← Jump Address
MEM  Memory               Load:               MDR ← Mem[ALUout]
                          Store:              Mem[ALUout] ← B
WB   Write Back           R-Type:             R[rd] ← ALUout
                          Load:               R[rt] ← MDR
The Instruction Fetch (IF) and Instruction Decode (ID) cycles are common to all instructions.

High-Level View of Finite State Machine Control
[Figure: instruction fetch and decode, states 0-1 (Figure 5.32), followed by one of: memory access instructions, states 2-5 (Figure 5.33); R-type instructions, states 6-7 (Figure 5.34); branch instruction, state 8 (Figure 5.35); jump instruction, state 9 (Figure 5.36).]
• The first steps are independent of the instruction class.
• Then comes a series of sequences that depend on the instruction opcode.
• Then the control returns to fetch a new instruction.
• Each box above represents one or several states.
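The high-level view above maps onto a compact next-state function. The sketch below assumes the state numbering used for this FSM (0 = IF, 1 = ID, 2 = load/store address calculation, 3 and 4 = load MEM and WB, 5 = store MEM, 6 and 7 = R-type EX and WB, 8 = branch, 9 = jump); the lowercase opcode strings are labels chosen for this example only.

```python
# Dispatch out of state 1 (instruction decode) by instruction class.
DECODE_DISPATCH = {"lw": 2, "sw": 2, "r-type": 6, "beq": 8, "j": 9}


def next_state(state, opcode):
    """Next state of the 10-state controller; only states 1 and 2 look at the opcode."""
    if state == 0:
        return 1
    if state == 1:
        return DECODE_DISPATCH[opcode]
    if state == 2:
        return 3 if opcode == "lw" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7, 8, and 9 return to instruction fetch


def cycles(opcode):
    """Clock cycles for one instruction: number of states visited from state 0."""
    state, count = 0, 1
    while True:
        state = next_state(state, opcode)
        if state == 0:
            return count
        count += 1


if __name__ == "__main__":
    for op in DECODE_DISPATCH:
        print(op, cycles(op))
```

Counting the states visited per opcode reproduces the per-class cycle counts (five for loads, four for stores and R-type, three for branches and jumps).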
(Figure 5.31 page 332)

FSM State Transition Diagram (From Book)
(Figure 5.38 page 339)
IF:   IR ← Mem[PC]; PC ← PC + 4
ID:   A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x4)
EX:   Load/Store: ALUout ← A + SignEx(Imm16)
      R-Type:     ALUout ← A func B
      Branch:     Zero ← A - B; Zero: PC ← ALUout
      Jump:       PC ← Jump Address
MEM:  Load:  MDR ← Mem[ALUout]
      Store: Mem[ALUout] ← B
WB:   R-Type: R[rd] ← ALUout
      Load:   R[rt] ← MDR
Total: 10 states
More on FSM controller implementation in Appendix C

Instruction Fetch (IF) and Decode (ID) FSM States
(Figure 5.32 page 333)
IF:  IR ← Mem[PC]; PC ← PC + 4
ID:  A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x4)
From ID, control continues to the states of Figure 5.33, Figure 5.34, Figure 5.35, or Figure 5.36.

Instruction Fetch (IF) Cycle (State 0)
IR ← Mem[PC]; PC ← PC + 4
MemRead = 1, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00 (add), IorD = 0, PCWrite = 1, IRWrite = 1, PCSource = 00
(Control values refer to the datapath of Figure 5.28 page 323; ORI not supported, Jump supported.)

Instruction Decode (ID) Cycle (State 1)
A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x4) (calculate branch target)
ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 (add)

Load/Store Instructions FSM States
(Figure 5.33 page 334)
(From Instruction Decode)
EX:   ALUout ← A + SignEx(Imm16), i.e., effective address calculation
MEM:  Load:  MDR ← Mem[ALUout]
      Store: Mem[ALUout] ← B
WB:   Load:  R[rt] ← MDR
Then to Instruction Fetch (Figure 5.32).

Load/Store Execution (EX) Cycle (State 2)
Effective address calculation: ALUout ← A + SignEx(Imm16)
ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00 (add)
(ORI not supported, Jump supported) (Figure 5.28 page
323)

Load Memory (MEM) Cycle (State 3)
MDR ← Mem[ALUout]
MemRead = 1, IorD = 1

Load Write Back (WB) Cycle (State 4)
R[rt] ← MDR
RegWrite = 1, MemtoReg = 1, RegDst = 0

Store Memory (MEM) Cycle (State 5)
Mem[ALUout] ← B
MemWrite = 1, IorD = 1

R-Type Instructions FSM States
(Figure 5.34 page 335)
(From Instruction Decode)
EX:  ALUout ← A funct B
WB:  R[rd] ← ALUout
Then to State 0 (Instruction Fetch) (Figure 5.32).

R-Type Execution (EX) Cycle (State 6)
ALUout ← A funct B
ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10 (R-Type)

R-Type Write Back (WB) Cycle (State 7)
R[rd] ← ALUout
RegWrite = 1, MemtoReg = 0, RegDst = 1

Branch Instruction Single EX State / Jump Instruction Single EX State
(Figures 5.35, 5.36 page 337)
Branch (from Instruction Decode), EX: Zero ← A - B; Zero: PC ← ALUout. Then to State 0 (Instruction Fetch) (Figure 5.32).
Jump (from Instruction Decode), EX: PC ← Jump Address. Then to State 0 (Instruction Fetch) (Figure 5.32).
(Control values on these slides refer to the datapath of Figure 5.28 page 323; ORI not supported, Jump supported.)

Branch Execution (EX) Cycle (State 8)
Zero ← A - B
Zero: PC ← ALUout
ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01 (subtract), PCWriteCond = 1, PCSource = 01
(ORI not supported, Jump supported) (Figure 5.28 page 323)

Jump Execution (EX) Cycle (State 9)
PC ← Jump Address
PCWrite = 1, PCSource = 10

MIPS Multi-cycle Datapath Performance Evaluation
• What is the average CPI?
  – The state diagram gives the CPI for each instruction type.
  – The workload (program) below gives the frequency of each type.

Type          CPIi for type   Frequency   CPIi x freqi
Arith/Logic   4               40%         1.6
Load          5               30%         1.5
Store         4               10%         0.4
Branch        3               20%         0.6
Average CPI:                              4.1

Better than CPI = 5, which would result if all instructions took the same number of clock cycles (5).
1 ≤ CPI ≤ 5, C = 2 ns, f = 500 MHz, T = I x CPI x C

Adding Support for swap to Multi Cycle Datapath
• You are to add support for a new instruction, swap, that exchanges the values of two registers, to the MIPS multicycle datapath of Figure 5.28 on page 323:
    swap $rs, $rt      R[rt] ← R[rs]; R[rs] ← R[rt]
• swap uses the R-Type format with the value of field rs = the value of field rd.
• Add any necessary datapaths and control signals to the multicycle datapath. Find a solution that minimizes the number of clock cycles required for the new instruction without modifying the register file (i.e., no additional register write ports). Justify the need for the modifications, if any.
• Show the necessary modifications to the multicycle control finite state machine of Figure 5.38 on page 339 when adding the swap instruction. For each new state added, provide the dependent RTN and active control signal values.
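The average CPI in the performance evaluation table above is a frequency-weighted sum of the per-type CPIs. A minimal sketch of that calculation follows; the dictionary layout and names are illustrative, and the execution time for one million instructions is derived here from T = I x CPI x C rather than quoted from the slides.

```python
# Workload from the table: instruction type -> (CPI for the type, frequency).
MIX = {
    "arith_logic": (4, 0.40),
    "load": (5, 0.30),
    "store": (4, 0.10),
    "branch": (3, 0.20),
}


def average_cpi(mix):
    """Effective CPI = sum over instruction types of CPI_i x frequency_i."""
    return sum(cpi * freq for cpi, freq in mix.values())


CYCLE_TIME_S = 2e-9  # C = 2 ns (f = 500 MHz)

avg_cpi = average_cpi(MIX)
time_ms = 10**6 * avg_cpi * CYCLE_TIME_S * 1e3  # one million instructions

if __name__ == "__main__":
    print(f"average CPI = {avg_cpi:.1f}")
    print(f"time for 10^6 instructions = {time_ms:.1f} ms")
```

For this particular mix the product I x CPI x C comes out slightly above the 8 msec of the single-cycle design, which illustrates how strongly the result depends on the instruction mix.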
Adding swap Instruction Support to Multi Cycle Datapath
swap $rs, $rt      R[rt] ← R[rs]; R[rs] ← R[rt]
We assume here rs = rd in the instruction encoding:
  op [31-26]   rs [25-21]   rt [20-16]   rd [15-11]
[Figure: datapath of Figure 5.28 with the outputs of A (R[rs]) and B (R[rt]) added as inputs 2 and 3 of the multiplexor controlled by MemtoReg.]
• The outputs of A and B should be connected to the multiplexor controlled by MemtoReg, since one of the two fields (rs and rd) contains the name of one of the registers being swapped. The other register is specified by rt.
• The MemtoReg control signal becomes two bits.

Adding swap Instruction Support to Multi Cycle Datapath
FSM with the new swap states added after ID:
IF:   IR ← Mem[PC]; PC ← PC + 4
ID:   A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x4)
WB1:  R[rd] ← B (rd = rs)
WB2:  R[rt] ← A (A has R[rs])
The existing EX, MEM, and WB states are unchanged:
EX:   ALUout ← A + SignEx(Imm16) | ALUout ← A func B | Zero ← A - B; Zero: PC ← ALUout
WB:   R[rd] ← ALUout
Swap takes 4 cycles.

Adding Support for add3 to Multi Cycle Datapath
• You are to add support for a new instruction, add3, that adds the values of three registers, to the MIPS multicycle datapath of Figure 5.28 on page 323. For example:
    add3 $s0, $s1, $s2, $s3
  Register $s0 gets the sum of $s1, $s2 and $s3.
  The instruction encoding uses a modified R-format, with an additional register specifier rx replacing the five low bits of the “funct” field:

  Bits      [31-26]   [25-21]   [20-16]   [15-11]   [10-5]     [4-0]
  Width     6 bits    5 bits    5 bits    5 bits    6 bits     5 bits
  Field     OP        rs        rt        rd        Not used   rx
  Example   add3      $s1       $s2       $s0                  $s3

• Add necessary datapath components, connections, and control signals to the multicycle datapath without modifying the register bank or adding additional ALUs. Find a solution that minimizes the number of clock cycles required for the new instruction. Justify the need for the modifications, if any.
• Show the necessary modifications to the multicycle control finite state machine of Figure 5.38 on page 339 when adding the add3 instruction.
  For each new state added, provide the dependent RTN and active control signal values.

add3 instruction support to Multi Cycle Datapath
add3 $rd, $rs, $rt, $rx      R[rd] ← R[rs] + R[rt] + R[rx]
• rx is a new register specifier in field [4-0] of the instruction.
• No additional register read ports or ALUs allowed.
Modified R-Format:
  op [31-26]   rs [25-21]   rt [20-16]   rd [15-11]   rx [4-0]
[Figure: datapath of Figure 5.28 with the new control signals WriteB and ReadSrc.]
1. ALUout is added as an extra input to the first ALU operand MUX, so that the previous ALU result can be used as an input for the second addition.
2. A multiplexor should be added to select between rt and the new field rx, which contains the register number of the 3rd operand (bits 4-0 of the instruction), as the input for Read Register 2. This multiplexor is controlled by a new one-bit control signal called ReadSrc.
3. A WriteB control line is added to enable writing R[rx] to B.

add3 instruction support to Multi Cycle Datapath
FSM with the new add3 states added after ID:
IF:   IR ← Mem[PC]; PC ← PC + 4
ID:   A ← R[rs]; B ← R[rt] (WriteB); ALUout ← PC + (SignExt(imm16) x4)
EX1:  ALUout ← A + B; B ← R[rx] (WriteB)
EX2:  ALUout ← ALUout + B
WB:   R[rd] ← ALUout
The existing EX, MEM, and WB states are unchanged:
EX:   ALUout ← A + SignEx(Im16) | ALUout ← A func B | Zero ← A - B; Zero: PC ← ALUout
Add3 takes 5 cycles.
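The state sequences for swap and add3 can be checked with a small register-transfer trace. The sketch below is a behavioural illustration only: it models the temporary registers A, B, and ALUout as local variables, uses a dictionary as a stand-in register file, and wraps sums to 32 bits; the function names are hypothetical and no control signal values are modelled.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def run_swap(regs, rs, rt):
    """swap $rs, $rt via states IF, ID, WB1, WB2 (rd is encoded equal to rs)."""
    regs = dict(regs)
    rd = rs
    a, b = regs[rs], regs[rt]   # ID:  A <- R[rs]; B <- R[rt]
    regs[rd] = b                # WB1: R[rd] <- B
    regs[rt] = a                # WB2: R[rt] <- A (A still holds the old R[rs])
    return regs, 4              # cycles: IF, ID, WB1, WB2


def run_add3(regs, rd, rs, rt, rx):
    """add3 $rd, $rs, $rt, $rx via states IF, ID, EX1, EX2, WB."""
    regs = dict(regs)
    a, b = regs[rs], regs[rt]        # ID:  A <- R[rs]; B <- R[rt]
    alu_out = (a + b) & MASK32       # EX1: ALUout <- A + B ...
    b = regs[rx]                     #      ... and B <- R[rx] in the same cycle
    alu_out = (alu_out + b) & MASK32 # EX2: ALUout <- ALUout + B
    regs[rd] = alu_out               # WB:  R[rd] <- ALUout
    return regs, 5                   # cycles: IF, ID, EX1, EX2, WB


if __name__ == "__main__":
    registers = {"$s0": 0, "$s1": 10, "$s2": 20, "$s3": 30}
    print(run_swap(registers, "$s1", "$s2"))
    print(run_add3(registers, "$s0", "$s1", "$s2", "$s3"))
```

The swap trace works without extra write ports because A and B latch both old values in ID before either register is overwritten, and the add3 trace needs only one ALU because the first sum is fed back from ALUout.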