Computer Architecture, Chapter 5
Fall 2005
Department of Computer Science, Kent State University

The Processor: Datapath & Control
• Our implementation of the MIPS is simplified
  – memory-reference instructions: lw, sw
  – arithmetic-logical instructions: add, sub, and, or, slt
  – control flow instructions: beq, j
• Generic implementation
  – use the program counter (PC) to supply the instruction address and fetch the instruction from memory (and update the PC)
  – decode the instruction (and read registers)
  – execute the instruction
[Figure: Fetch (PC = PC+4), Decode, Exec cycle]
• All instructions (except j) use the ALU after reading the registers. How?
  – memory-reference?
  – arithmetic?
  – control flow?

Abstract / Simplified View
[Figure: abstract datapath with PC, instruction memory, registers, ALU, data memory, and two adders]
• Two types of functional units:
  – elements that operate on data values (combinational)
  – elements that contain state (sequential)

More Implementation Details

Fetching Instructions
• Fetching instructions involves
  – reading the instruction from the Instruction Memory
  – updating the PC to hold the address of the next instruction
[Figure: PC, Instruction Memory (Read Address, Instruction), and an adder computing PC + 4]
  – PC is updated every cycle, so it does not need an explicit write control signal
  – Instruction Memory is read every cycle, so it doesn’t need an explicit read control signal

Fetch-Decode-Execute
• In order to execute an instruction we must
  – Fetch the instruction from memory
  – Determine what the instruction is (decode)
  – Execute it
• Fetch and decode are the same for all instructions
• Execute depends on the type of instruction

Instruction Formats
  R-type:  op [31:26]  rs [25:21]  rt [20:16]  rd [15:11]  shamt [10:6]  funct [5:0]
  I-type:  op [31:26]  rs [25:21]  rt [20:16]  immed [15:0]
  J-type:  op [31:26]  addr [25:0]

Decoding Instructions
• Decoding instructions involves
  – sending the fetched instruction’s opcode and function field bits to the control unit
[Figure: Control Unit and Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2)]
  – reading two values from the Register File
    • Register File addresses are contained in the instruction

Executing Load and Store
• Load
  – Fetch operand (base address) from register
  – Compute effective address
  – Read data from memory
  – Write result back to register
• Store
  – Fetch operands from registers
  – Compute effective address
  – Write data to memory

Executing Arithmetic/Logic
• Arithmetic/logic (add, sub, and, or, slt)
  – Fetch operands from registers
  – Perform operation
  – Write result back to register

Executing Branch and Jump
• Conditional branch (beq)
  – Fetch operands from registers
  – Compare operands
  – If equal add displacement to PC
• Jump (j)
  – Write new value to PC

ALU Instructions
• Components
  – Register File
  – ALU
• Operation
  – Use instruction fields to select registers
  – Read source registers and send them to ALU
  – Send ALU result to destination register

Components for ALU Instrs
[Figure: (a) Registers: three 5-bit register numbers, Write data, Read data 1, Read data 2, RegWrite; (b) ALU: ALU control, Zero, ALU result]

ALU Datapath
[Figure: instruction fields select registers; Read data 1 and Read data 2 feed the ALU; ALU result returns to Write data; control signals ALU operation and RegWrite]

Memory Access
• Components
  – Register File
  – ALU
  – Data Memory
  – Sign-Extension Unit
• Operation
  – ALU adds base register and sign-extended immediate
  – Send ALU result to memory as the address
  – Read the value from memory into the destination register (lw) or write the value from the source register into memory (sw)

Components for Mem Access
[Figure: (a) Data memory unit: Address, Write data, Read data, MemWrite, MemRead; (b) Sign-extension unit: 16 bits in, 32 bits out]

Memory Access Datapath
[Figure: register file, sign-extension unit, ALU, and data memory; control signals ALU operation, MemWrite, MemRead, RegWrite]

Branches
• Components
  – Register File
  – ALU
  – Program Counter (PC)
  – Adder
  – Sign-Extension Unit
• Operation
  – Send source register values to ALU for comparison
  – Adder computes branch target address
  – Control logic decides whether branch is taken or not

Branch Datapath
[Figure: PC + 4 from the instruction datapath is added to the sign-extended offset shifted left 2 to form the branch target; the ALU Zero output goes to the branch control logic]

Putting It All Together
[Figure: combined single-cycle datapath with control signals PCSrc, RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg]

Control Unit
• Control unit takes an instruction as input and produces control signals as output
• Types of control signals
  – Multiplexor selector signals
  – Write enables for state elements
  – Control signals for other blocks (ALU, etc.)
• In a single-cycle datapath the control unit is simple: it just looks up the instruction in a table

Control Signals
• RegDst: Selects either rd or rt as the destination register
• RegWrite: When asserted, the value on the write data port is written into the register specified by the write register input
• ALUOp: Selects ALU operation
• ALUSrc: Selects the second ALU input to be either the second register output or the sign-extended immediate value

Control Signals (cont'd)
• PCSrc: Selects new PC as either PC + 4 or the output of the branch target adder
  – This signal is derived from the Branch control signal and the ALU's Zero output
• MemRead/MemWrite: Causes data memory to perform a read/write operation when asserted
• MemToReg: Selects either the ALU output or the data memory output as the data input into the register file

ALU Control
• In order to simplify design of the control unit we give the ALU its own control logic
• The ALU control block takes a 2-bit input from the control unit (ALUOp) and the funct field from the instruction and produces the ALU control signals

ALU Control Signals
  Instruction  ALUOp  funct Field  ALU Function      ALU Control Inputs
  lw           00     (not used)   Add               0010
  sw           00     (not used)   Add               0010
  beq          01     (not used)   Subtract          0110
  add          10     100000       Add               0010
  sub          10     100010       Subtract          0110
  and          10     100100       AND               0000
  or           10     100101       OR                0001
  slt          10     101010       Set on less than  0111

Operation of Control Unit
  Signal     ALU (R-type)  lw  sw  beq
  ALUOp      10            00  00  01
  ALUSrc     0             1   1   0
  Branch     0             0   0   1
  MemRead    0             1   0   0
  MemWrite   0             0   1   0
  MemToReg   0             1   x   x
  RegDst     1             0   x   x
  RegWrite   1             1   0   0

Datapath with Control Unit
[Figure: single-cycle datapath with the control unit driven by Instruction [31-26]; control outputs RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite; PCSrc selects the next PC]
Jump Instructions
• The unconditional branch instruction (j) computes its branch target differently from the conditional branch instruction (beq)
• Branch target address is the concatenation of:
  – Top 4 bits of PC + 4
  – 26-bit immediate value
  – Two zero bits

Datapath with Jump
[Figure: single-cycle datapath extended with a Jump control signal; Instruction [25-0] is shifted left 2 to 28 bits and combined with PC+4 [31-28] to form Jump address [31-0]]

Performance
• The single-cycle datapath executes each instruction in just one cycle
• CPI is 1.0, which is optimal
• However, minimum clock cycle time is determined by the slowest instruction
• In practice the execution time can vary considerably between instructions, making a single-cycle implementation a poor choice

Using Multiple Cycles
• A multi-cycle datapath splits instruction execution into multiple steps, where each step takes one cycle
• If an instruction doesn't need a step it skips it, so different instructions run for different numbers of cycles
• Slow instructions don't slow down the entire processor
• Control unit becomes more complicated
• Hardware can be shared between steps

Multicycle Datapath (1)
[Figure: PC, a single memory for instructions and data, instruction register, memory data register, register file, A and B registers, ALU, ALUOut]

Multicycle Differences
• A functional unit can be used more than once in the execution of an instruction, so long as those uses occur in different steps
  – Instruction memory and data memory are combined into a single unit
  – ALU takes over for the two separate adders
• Additional registers are needed to save information between steps

Multicycle Registers
• Instruction register (IR): holds the instruction during its execution
• Memory data register (MDR): holds the data read from memory for one cycle
• A: holds a source register value for one cycle
• B: holds a source register value for one cycle
• ALUOut: holds the ALU output for one cycle

Multicycle Datapath (2)
[Figure: multicycle datapath with multiplexors added at the memory address, register file write ports, and both ALU inputs]

Multicycle Datapath (3)
[Figure: multicycle datapath with control signals IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, MemtoReg]

New Control Signals
• ALUSrcA: selects first ALU operand to be either the PC or the A register
• ALUSrcB: selects second ALU operand from: B register, constant 4, sign-extended immediate, sign-extended and shifted immediate
• MemtoReg: selects register file write data as coming from either ALUOut or MDR
• IorD: selects the memory address as coming from either PC or ALUOut

New Control Signals (cont'd)
• IRWrite: If asserted, the memory output is written to IR
• PCSource: Selects the new value for the PC from: ALU, ALUOut, jump target address
• PCWrite: If asserted, the PC is written
• PCWriteCond: If asserted and the zero output from the ALU is 1, then the PC is written
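As a rough illustration of how the new control signals steer the multicycle datapath, the sketch below models the two ALU input multiplexors and the PC write enable in Python. The function names (`sign_extend16`, `alu_operands`, `pc_write_enable`) and the use of plain integers for 32-bit values are assumptions made for this example, not part of the slides. The select encodings (ALUSrcA: 0 = PC, 1 = A; ALUSrcB: 00 = B, 01 = constant 4, 10 = sign-extended immediate, 11 = sign-extended and shifted immediate) follow the signal descriptions and state diagrams in these notes.

```python
MASK32 = 0xFFFFFFFF  # keep values within 32 bits


def sign_extend16(value):
    """Sign-extend a 16-bit field to a 32-bit two's complement pattern."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def alu_operands(alu_src_a, alu_src_b, pc, a, b, imm16):
    """Pick the two ALU operands the way the ALUSrcA / ALUSrcB muxes would."""
    first = a if alu_src_a == 1 else pc      # ALUSrcA: 0 = PC, 1 = A register
    ext = sign_extend16(imm16)
    second_choices = {
        0b00: b,                     # B register
        0b01: 4,                     # constant 4 (used for PC + 4)
        0b10: ext,                   # sign-extended immediate
        0b11: (ext << 2) & MASK32,   # sign-extended immediate shifted left 2
    }
    return first & MASK32, second_choices[alu_src_b] & MASK32


def pc_write_enable(pc_write, pc_write_cond, zero):
    """PC is written if PCWrite is asserted, or PCWriteCond and Zero both are."""
    return bool(pc_write or (pc_write_cond and zero))
```

For instance, the instruction fetch step uses ALUSrcA = 0 and ALUSrcB = 01, so `alu_operands(0, 0b01, pc, a, b, imm16)` yields the PC and the constant 4, which the ALU then adds to form PC + 4.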
Instruction Execution Steps
• Instruction fetch
• Instruction decode and register fetch
• Execution, memory address computation, or branch completion
• Memory access or R-type completion
• Memory read completion

Instruction Fetch
• Fetch instruction from memory
  – IR ← Memory[PC]
• Increment the PC
  – PC ← PC + 4

Instruction Decode
• Fetch operands from register file
  – A ← Reg[IR[25:21]]
  – B ← Reg[IR[20:16]]
• Compute branch target address
  – ALUOut ← PC + (sign-extend(IR[15:0]) << 2)

Execute
• Load/store: Compute memory address
  – ALUOut ← A + sign-extend(IR[15:0])
• R-type: Perform operation specified by instruction
  – ALUOut ← A op B
• Branch: Compare registers and set PC if equal
  – if (A == B) PC ← ALUOut
• Jump: Set PC to jump target address
  – PC ← {PC[31:28], (IR[25:0] << 2)}

Memory Access
• Load: Read memory word into MDR
  – MDR ← Memory[ALUOut]
• Store: Write B into memory
  – Memory[ALUOut] ← B
• R-type: Write result to destination register
  – Reg[IR[15:11]] ← ALUOut

Memory Read Completion
• Load: Write result to destination register
  – Reg[IR[20:16]] ← MDR

Multicycle Datapath (4)
[Figure: complete multicycle datapath with the control unit driven by Instruction [31-26]; control outputs PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst; jump address formed from PC [31-28] and Instruction [25-0] shifted left 2]

State Machine
• A state machine is a sequential logic device with:
  – Set of states
  – Next-state function which determines the next state from the current state and the inputs
  – Output function which determines the outputs from the current state and possibly the inputs
• In a Moore machine the output depends only on the state; in a Mealy machine the output depends on the state and the inputs

Control with a State Machine
• The control unit for our multicycle datapath will be a state machine
• The only input is the op field of the instruction; the outputs are the control signals
• Each step may have multiple states if control signals depend on the instruction

Fetch and Decode States
• State 0, Instruction fetch (Start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
• State 1, Instruction decode/Register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
• From state 1:
  – (Op = 'LW') or (Op = 'SW'): Memory reference FSM (Figure 5.38)
  – (Op = R-type): R-type FSM (Figure 5.39)
  – (Op = 'BEQ'): Branch FSM (Figure 5.40)
  – (Op = 'JMP'): Jump FSM (Figure 5.41)

Load and Store States
• From state 1 when (Op = 'LW') or (Op = 'SW')
• State 2, Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
• State 3, Memory access (Op = 'LW'): MemRead, IorD = 1
• State 4, Write-back step: RegWrite, MemtoReg = 1, RegDst = 0
• State 5, Memory access (Op = 'SW'): MemWrite, IorD = 1
• States 4 and 5 return to state 0 (Figure 5.37)

R-Type States
• From state 1 when (Op = R-type)
• State 6, Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
• State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0
• State 7 returns to state 0 (Figure 5.37)

Branch State
• From state 1 when (Op = 'BEQ')
• State 8, Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
• State 8 returns to state 0 (Figure 5.37)

Jump State
• From state 1 when (Op = 'J')
• State 9, Jump completion: PCWrite, PCSource = 10
• State 9 returns to state 0 (Figure 5.37)

Complete State Machine
[Figure: all ten states (0 through 9) combined into one state diagram]
Single Cycle Datapath with Control Unit
[Figure: single-cycle datapath with Control Unit driven by Instr[31-26]; control outputs RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite; PCSrc selects the next PC; ALU outputs ovf and zero]

R-type Instruction Data/Control Flow
[Figure: same datapath, data and control flow for an R-type instruction]

Load Word Instruction Data/Control Flow
[Figure: same datapath, data and control flow for a load word instruction]

Branch Instruction Data/Control Flow
[Figure: same datapath, data and control flow for a branch instruction]
Executing R Format Operations
• R format operations (add, sub, slt, and, or)
  R-type:  op [31-26]  rs [25-21]  rt [20-16]  rd [15-11]  shamt [10-6]  funct [5-0]
  – perform the (op and funct) operation on values in rs and rt
  – store the result back into the Register File (into location rd)
[Figure: Register File and ALU with RegWrite and ALU control; ALU outputs overflow and zero]
• The Register File is not written every cycle (e.g. sw), so we need an explicit write control signal for the Register File
CSE431 L05 Basic MIPS Architecture.60 Irwin, PSU, 2005

Executing Load and Store Operations
• Load and store operations involve
  – computing the memory address by adding the base register (read from the Register File during decode) to the 16-bit sign-extended offset field in the instruction
  – for a store, writing the value (read from the Register File during decode) to the Data Memory
  – for a load, writing the value read from the Data Memory to the Register File
[Figure: Register File, Sign Extend (16 to 32 bits), ALU, and Data Memory with RegWrite, ALU control, MemWrite, MemRead]

Executing Branch Operations
• Branch operations involve
  – comparing the operands read from the Register File during decode for equality (zero ALU output)
  – computing the branch target address by adding the updated PC to the 16-bit sign-extended offset field in the instruction, shifted left by 2 bits
[Figure: adder computing PC + 4, second adder computing the branch target address from the shifted offset, ALU zero output to branch control logic]

Executing Jump Operations
• Jump operation involves
  – replacing the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits
[Figure: PC, Instruction Memory, adder computing PC + 4, and Shift left 2 turning the 26-bit instruction field into the 28-bit jump address]
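The branch and jump target calculations described on these slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper names (`sign_extend16`, `branch_target`, `jump_target`) are made up for this example, and addresses are modeled as plain 32-bit integers.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def sign_extend16(value):
    """Sign-extend a 16-bit offset field to a 32-bit two's complement pattern."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def branch_target(pc_plus_4, offset16):
    """Updated PC plus the sign-extended offset shifted left by 2 bits."""
    return (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK32


def jump_target(pc_plus_4, instruction):
    """Upper 4 bits of the updated PC joined with Instr[25-0] shifted left 2."""
    low28 = (instruction & 0x03FFFFFF) << 2   # 26-bit field becomes 28 bits
    return (pc_plus_4 & 0xF0000000) | low28
```

Note that an offset of 0xFFFF (that is, -1) gives a branch target 4 bytes before the updated PC, which is the address of the branch instruction itself.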
Creating a Single Datapath from the Parts
• Assemble the datapath segments and add control lines and multiplexors as needed
• Single cycle design – fetch, decode and execute each instruction in one clock cycle
  – no datapath resource can be used more than once per instruction, so some must be duplicated (e.g., separate Instruction Memory and Data Memory, several adders)
  – multiplexors needed at the input of shared elements with control lines to do the selection
  – write signals to control writing to the Register File and Data Memory
• Cycle time is determined by length of the longest path

Fetch, R, and Memory Access Portions
[Figure: fetch, R-type, and memory access portions combined: PC, Instruction Memory, adder, Register File, Sign Extend, ALU, Data Memory; control signals RegWrite, ALUSrc, ALU control, MemWrite, MemRead, MemtoReg]

Adding the Control
• Selecting the operations to perform (ALU, Register File and Memory read/write)
• Controlling the flow of data (multiplexor inputs)
  R-type:  op [31-26]  rs [25-21]  rt [20-16]  rd [15-11]  shamt [10-6]  funct [5-0]
  I-type:  op [31-26]  rs [25-21]  rt [20-16]  address offset [15-0]
  J-type:  op [31-26]  target address [25-0]
• Observations
  – op field always in bits 31-26
  – addresses of the registers to be read are always specified by the rs field (bits 25-21) and rt field (bits 20-16); for lw and sw, rs is the base register
  – address of the register to be written is in one of two places: in rt (bits 20-16) for lw; in rd (bits 15-11) for R-type instructions
  – offset for beq, lw, and sw always in bits 15-0

Single Cycle Datapath with Control Unit
[Figure: single-cycle datapath with Control Unit driven by Instr[31-26]; control outputs RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite; PCSrc selects the next PC]

R-type Instruction Data/Control Flow
[Figure: same datapath, data and control flow for an R-type instruction]

Load Word Instruction Data/Control Flow
[Figure: same datapath, data and control flow for a load word instruction]

Branch Instruction Data/Control Flow
[Figure: same datapath, data and control flow for a branch instruction]
Adding the Jump Operation
[Figure: single-cycle datapath extended with a Jump control signal and a multiplexor that selects the jump address, formed from PC+4[31-28] and Instr[25-0] shifted left 2]

Single Cycle Disadvantages & Advantages
• Uses the clock cycle inefficiently – the clock cycle must be timed to accommodate the slowest instruction
  – especially problematic for more complex instructions like floating point multiply
[Figure: timing diagram: lw fills Cycle 1; sw finishes before the end of Cycle 2, and the remainder is marked Waste]
• May be wasteful of area since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
but
• Is simple and easy to understand

Multicycle Datapath Approach
• Let an instruction take more than 1 clock cycle to complete
  – Break up instructions into steps where each step takes a cycle while trying to
    - balance the amount of work to be done in each step
    - restrict each cycle to use only one major functional unit
  – Not every instruction takes the same number of clock cycles
• In addition to faster clock rates, multicycle allows functional units to be used more than once per instruction as long as they are used on different clock cycles; as a result
  – only need one memory – but only one memory access per cycle
  – need only one ALU/adder – but only one ALU operation per cycle
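To make the single-cycle versus multicycle trade-off concrete, the following Python sketch computes an average CPI and a total execution time. The cycle counts per instruction class (lw 5, sw 4, R-type 4, beq 3, j 3) follow the step breakdown used for the multicycle design in these notes; the function names, the instruction mix, and the clock periods in the usage note are hypothetical values chosen only for illustration.

```python
# Cycles per instruction class in the multicycle design (3 to 5 cycles each).
MULTICYCLE_CYCLES = {"lw": 5, "sw": 4, "rtype": 4, "beq": 3, "j": 3}


def average_cpi(mix, cycles=MULTICYCLE_CYCLES):
    """Weighted average cycles per instruction for a mix of instruction classes.

    `mix` maps each class name to its fraction of executed instructions;
    the fractions must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction mix fractions must sum to 1")
    return sum(fraction * cycles[name] for name, fraction in mix.items())


def execution_time_ns(instruction_count, cpi, clock_period_ns):
    """Execution time = instruction count x CPI x clock period."""
    return instruction_count * cpi * clock_period_ns
```

With a hypothetical mix of 25% lw, 10% sw, 50% R-type, 10% beq, and 5% j, `average_cpi` gives about 4.1. Whether the multicycle design then wins depends on how much shorter its clock period is than the single-cycle period, which the single-cycle design must stretch to fit its slowest instruction.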
Multicycle Datapath Approach, cont'd
• At the end of a cycle
  – Store values needed in a later cycle by the current instruction in an internal register (not visible to the programmer). All (except IR) hold data only between a pair of adjacent clock cycles (no write control signal needed)
[Figure: multicycle datapath with PC, Memory (Instr. or Data), IR, MDR, Register File, A, B, ALU, ALUout]
    IR – Instruction Register
    MDR – Memory Data Register
    A, B – regfile read data registers
    ALUout – ALU output register
  – Data used by subsequent instructions are stored in programmer visible registers (i.e., register file, PC, or memory)

The Multicycle Datapath with Control Signals
[Figure: multicycle datapath with Control driven by Instr[31-26]; control signals PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst]

Multicycle Control Unit
• Multicycle datapath control signals are not determined solely by the bits in the instruction
  – e.g., op code bits tell what operation the ALU should be doing, but not what instruction cycle is to be done next
• Must use a finite state machine (FSM) for control
  – a set of states (current state stored in State Register)
  – next state function (determined by current state and the input)
  – output function (determined by current state and the input)
[Figure: combinational control logic with inputs Inst Opcode and State Reg, and outputs Datapath control points and Next State]
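The next-state function of such an FSM can be sketched in Python as follows. This is only an illustrative model: it uses the ten-state numbering (states 0 through 9) from the state diagrams earlier in these notes, the standard MIPS opcode values for the five instruction classes, and hypothetical function names (`next_state`, `cycles_for`).

```python
# MIPS opcode values (bits 31-26 of the instruction).
OP_RTYPE, OP_J, OP_BEQ, OP_LW, OP_SW = 0x00, 0x02, 0x04, 0x23, 0x2B


def next_state(state, opcode):
    """Next-state function: depends on the current state and the opcode."""
    if state == 0:                       # instruction fetch
        return 1
    if state == 1:                       # decode / register fetch
        if opcode in (OP_LW, OP_SW):
            return 2                     # memory address computation
        if opcode == OP_RTYPE:
            return 6                     # R-type execution
        if opcode == OP_BEQ:
            return 8                     # branch completion
        if opcode == OP_J:
            return 9                     # jump completion
        raise ValueError("opcode not in the supported subset")
    if state == 2:                       # only reached for lw or sw
        return 3 if opcode == OP_LW else 5
    if state == 3:                       # lw memory access, then write-back
        return 4
    if state == 6:                       # R-type execution, then completion
        return 7
    if state in (4, 5, 7, 8, 9):         # final state of each instruction
        return 0
    raise ValueError("unknown state")


def cycles_for(opcode):
    """Count the cycles one instruction takes by walking the FSM from state 0."""
    state, count = 0, 0
    while True:
        count += 1
        state = next_state(state, opcode)
        if state == 0:
            return count
```

Walking the machine this way reproduces the cycle counts discussed in the lecture: 5 for lw, 4 for sw and R-type instructions, and 3 for beq and j.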
The Five Steps of the Load Instruction
         Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
  lw     IFetch   Dec      Exec     Mem      WB
• IFetch: Instruction Fetch and Update PC
• Dec: Instruction Decode, Register Read, Sign Extend Offset
• Exec: Execute R-type; Calculate Memory Address; Branch Comparison; Branch and Jump Completion
• Mem: Memory Read; Memory Write Completion; R-type Completion (RegFile write)
• WB: Memory Read Completion (RegFile write)
• INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!

Multicycle Advantages & Disadvantages
• Uses the clock cycle efficiently – the clock cycle is timed to accommodate the slowest instruction step
[Figure: timing diagram over Cycles 1 to 10: lw (IFetch, Dec, Exec, Mem, WB), then sw (IFetch, Dec, Exec, Mem), then R-type (IFetch, ...)]
• Multicycle implementations allow functional units to be used more than once per instruction as long as they are used on different clock cycles
but
• Requires additional internal state registers, more muxes, and more complicated (FSM) control

Single Cycle vs. Multiple Cycle Timing
• Single Cycle Implementation:
[Figure: lw fills Cycle 1; sw finishes before the end of Cycle 2, and the remainder is marked Waste]
• Multiple Cycle Implementation:
[Figure: timing diagram over Cycles 1 to 10: lw (IFetch, Dec, Exec, Mem, WB), then sw (IFetch, Dec, Exec, Mem), then R-type (IFetch, ...)]
• The multicycle clock period is longer than 1/5th of the single-cycle clock period, due to state register overhead

Next Lecture and Reminders
• Next lecture
  – MIPS pipelined datapath review
    - Reading assignment – PH, Chapter 6.1-6.3
• Reminders
  – HW2 due September 27th
  – Evening midterm exam scheduled
    - Tuesday, October 18th, 20:15 to 22:15, Location 113 IST
    - You should have let me know by now if you have a conflict!
MIPS Subset
• Memory access instructions
  – lw, sw
• Arithmetic and logic instructions
  – add, sub, and, or, slt
• Branch instructions
  – beq, j

Instruction Fetch
• Components
  – Instruction Memory
  – Program Counter (PC)
  – Adder
• Operation
  – Fetch the instruction whose address is in the PC
  – Increment the PC by 4

Components for Instr Fetch
[Figure: (a) Instruction memory: Instruction address in, Instruction out; (b) Program counter; (c) Adder: Add, Sum]

Instruction Fetch Datapath
[Figure: PC drives the Read address of the instruction memory; an adder computes PC + 4]
ALU

ALU Datapath
[Figure: ALU datapath. Labels: Instruction; Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2, RegWrite); ALU (ALU operation, Zero, ALU result).]

Memory Access
• Components
– Register File
– ALU
– Data Memory
– Sign-Extension Unit
• Operation
– ALU adds base register and sign-extended immediate
– Send ALU result to memory as the address
– Read the value from memory into the destination register (lw) or write the value from the source register into memory (sw)

Components for Mem Access
[Figure: a. Data memory unit (Address, Write data, Read data, MemRead, MemWrite). b. Sign-extension unit (16-bit input, 32-bit output).]

Memory Access Datapath
[Figure: memory access datapath. Labels: Instruction; Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2, RegWrite); ALU (ALU operation, Zero, ALU result); Sign extend (16 to 32); Data memory (Address, Write data, Read data, MemRead, MemWrite).]

Branches
• Components
– Register File
– ALU
– Program Counter (PC)
– Adder
– Sign-Extension Unit
• Operation
– Send source register values to ALU for comparison
– Adder computes branch target address
– Control logic decides whether branch is taken or not

Branch Datapath
[Figure: branch datapath. Labels: PC + 4 from instruction datapath; Add (Sum, Branch target); Shift left 2; Sign extend (16 to 32); Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2, RegWrite); ALU (ALU operation, Zero, to branch control logic).]

Putting It All Together
[Figure: complete single-cycle datapath. Labels: PC, Add (PC + 4), Shift left 2, Add (ALU result), Instruction memory (Read address, Instruction [31–0]), instruction fields [25–21], [20–16], [15–11], [15–0], [5–0], Registers, Sign extend, ALU, ALU control, Data memory, and multiplexors controlled by PCSrc, RegDst, ALUSrc, and MemtoReg; other control signals RegWrite, MemRead, MemWrite, ALUOp.]

Control Unit
• Control unit takes an instruction as input and produces control signals as output
• Types of control signals
– Multiplexor selector signals
– Write enables for state elements
– Control signals for other blocks (ALU, etc.)
• In a single-cycle datapath the control unit is simple: it just looks up the instruction in a table

Control Signals
• RegDst: Selects either rd or rt as the destination register
• RegWrite: When asserted, the value on the write data port is written into the register specified by the write register input
• ALUOp: Selects ALU operation
• ALUSrc: Selects the second ALU input to be either the second register output or the sign-extended immediate value

Control Signals (cont'd)
• PCSrc: Selects new PC as either PC + 4 or the output of the branch target adder
– This signal is derived from the Branch control signal and the ALU's Zero output
• MemRead/MemWrite: Causes data memory to perform a read/write operation when asserted
• MemToReg: Selects either the ALU output or the data memory output as the data input into the register file

ALU Control
• In order to simplify design of the control unit we give the ALU its own control logic
• The ALU control block takes a 2-bit input from the control unit (ALUOp) and the funct field from the instruction and produces the ALU control signals

ALU Control Signals
Instruction  ALUOp  funct Field  ALU Function      ALU Control Input
lw           00     -            Add               0010
sw           00     -            Add               0010
beq          01     -            Subtract          0110
add          10     100000       Add               0010
sub          10     100010       Subtract          0110
and          10     100100       AND               0000
or           10     100101       OR                0001
slt          10     101010       Set on less than  0111

Operation of Control Unit
Signal    ALU (R-type)  lw  sw  beq
ALUOp     10            00  00  01
ALUSrc    0             1   1   0
Branch    0             0   0   1
MemRead   0             1   0   0
MemWrite  0             0   1   0
MemToReg  0             1   x   x
RegDst    1             0   x   x
RegWrite  1             1   0   0

Datapath with Control Unit
[Figure: single-cycle datapath with the Control block added. Control takes Instruction [31–26] and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; PCSrc selects the next PC.]
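The two control tables above lend themselves to a table lookup, as the Control Unit slide suggests. The following is a minimal Python sketch of that idea, not the slides' implementation: the names MAIN_CONTROL, ALU_CONTROL_BY_FUNCT, alu_control, and pc_src are hypothetical, the "R-type" key stands for the column the table labels "ALU", and PCSrc is assumed to be the logical AND of Branch and Zero.

```python
# Main control outputs per instruction class, copied from the
# "Operation of Control Unit" table. None marks a don't-care (x) entry.
MAIN_CONTROL = {
    "R-type": {"ALUOp": 0b10, "ALUSrc": 0, "Branch": 0, "MemRead": 0,
               "MemWrite": 0, "MemToReg": 0, "RegDst": 1, "RegWrite": 1},
    "lw":     {"ALUOp": 0b00, "ALUSrc": 1, "Branch": 0, "MemRead": 1,
               "MemWrite": 0, "MemToReg": 1, "RegDst": 0, "RegWrite": 1},
    "sw":     {"ALUOp": 0b00, "ALUSrc": 1, "Branch": 0, "MemRead": 0,
               "MemWrite": 1, "MemToReg": None, "RegDst": None, "RegWrite": 0},
    "beq":    {"ALUOp": 0b01, "ALUSrc": 0, "Branch": 1, "MemRead": 0,
               "MemWrite": 0, "MemToReg": None, "RegDst": None, "RegWrite": 0},
}

# funct field -> 4-bit ALU control input, from the "ALU Control Signals" table.
ALU_CONTROL_BY_FUNCT = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub
    0b100100: 0b0000,  # and
    0b100101: 0b0001,  # or
    0b101010: 0b0111,  # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Return the 4-bit ALU control input for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:      # lw / sw: add base and offset
        return 0b0010
    if alu_op == 0b01:      # beq: subtract to compare
        return 0b0110
    if alu_op == 0b10:      # R-type: the funct field decides
        return ALU_CONTROL_BY_FUNCT[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")


def pc_src(branch: int, zero: int) -> int:
    """PCSrc, assumed here to be Branch AND Zero."""
    return branch & zero
```

For example, alu_control(0b10, 0b101010) selects set on less than, and pc_src only returns 1 for a beq whose operands compared equal.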
Jump Instructions
• The unconditional branch instruction (j) computes its branch target differently from the conditional branch instruction (beq)
• Branch target address is the concatenation of:
– Top 4 bits of PC + 4
– 26-bit immediate value
– Two zero bits

Datapath with Jump
[Figure: single-cycle datapath with jump support. Instruction [25–0] is shifted left 2 (26 to 28 bits) and combined with PC+4 [31–28] to form Jump address [31–0]; an extra multiplexor controlled by Jump selects it as the next PC. Control outputs: RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite.]

Adding the Jump Operation
[Figure: the same datapath redrawn. Labels: Instr[25-0], Shift left 2 (26 to 28), PC+4[31-28], Jump, PCSrc, Branch, Control Unit (Instr[31-26]), Instruction Memory, Register File, Sign Extend, ALU (zero, ovf), ALU control (Instr[5-0]), Data Memory.]

Performance
• The single-cycle datapath executes each instruction in just one cycle
• CPI is 1.0, which is optimal
• However, minimum clock cycle time is determined by the slowest instruction
• In practice the execution time can vary considerably between instructions, making a single-cycle implementation a poor choice

Using Multiple Cycles
• A multi-cycle datapath splits instruction execution into multiple steps, where each step takes one cycle
• If an instruction doesn't need a step it skips it, so different instructions run for different numbers of cycles
• Slow instructions don't slow down the entire processor
• Control unit becomes more complicated
• Hardware can be shared between steps

Multicycle Datapath (1)
[Figure: high-level multicycle datapath. Labels: PC; Memory (Address, Data; instruction or data); Instruction register; Memory data register; Registers (Register #, Data); A; B; ALU; ALUOut.]

Multicycle Differences
• A functional unit can be used more than once in the execution of an instruction, so long as those uses occur in different steps
– Instruction memory and data memory are combined into a single unit
– ALU takes over for the two separate adders
• Additional registers are needed to save information between steps

Multicycle Datapath (2)
[Figure: multicycle datapath with multiplexors. Labels: PC; Memory (Address, MemData, Write data); Instruction register (Instruction [25–21], [20–16], [15–11], [15–0]); Memory data register; Registers; A; B; constant 4; Sign extend (16 to 32); Shift left 2; ALU (Zero, ALU result); ALUOut.]

Multicycle Datapath (3)
[Figure: the same datapath with control signals added: IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, MemtoReg, and the ALU control block driven by Instruction [5–0].]

Multicycle Datapath (4)
[Figure: the complete multicycle datapath with the Control block. Control takes Op [5–0] from Instruction [31-26] and drives PCWriteCond, PCWrite, PCSource, IorD, MemRead, MemWrite, MemtoReg, IRWrite, ALUOp, ALUSrcA, ALUSrcB, RegWrite, and RegDst; Instruction [25–0] shifted left 2 and PC [31-28] form Jump address [31-0].]
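The two next-PC computations described on the Branches and Jump Instructions slides can be written out as bit manipulation. This is a minimal Python sketch under stated assumptions: the function names and the MASK32 constant are hypothetical, instructions are passed as 32-bit integers, and addresses wrap at 32 bits.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc: int, instr: int) -> int:
    """beq target: (PC + 4) + (sign-extended immediate shifted left 2)."""
    pc_plus_4 = (pc + 4) & MASK32
    offset = sign_extend16(instr & 0xFFFF) << 2
    return (pc_plus_4 + offset) & MASK32


def jump_target(pc: int, instr: int) -> int:
    """j target: top 4 bits of PC + 4, then the 26-bit field, then two zero bits."""
    pc_plus_4 = (pc + 4) & MASK32
    return (pc_plus_4 & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
```

The branch target is relative to PC + 4 and can move backward because the immediate is sign-extended, while the jump target replaces the low 28 bits of PC + 4 outright.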
Multicycle Registers
• Instruction register (IR): holds the instruction during its execution
• Memory data register (MDR): holds the data read from memory for one cycle
• A: holds the first source register value, Reg[IR[25:21]], for one cycle
• B: holds the second source register value, Reg[IR[20:16]], for one cycle
• ALUOut: holds the ALU output for one cycle

New Control Signals
• ALUSrcA: selects first ALU operand to be either the PC or the A register
• ALUSrcB: selects second ALU operand from: B register, constant 4, sign-extended immediate, sign-extended and shifted immediate
• MemtoReg: selects register file write data as coming from either ALUOut or MDR
• IorD: selects the memory address as coming from either PC or ALUOut

New Control Signals (cont'd)
• IRWrite: If asserted the memory output is written to IR
• PCSource: Selects the new value for the PC from: ALU, ALUOut, jump target address
• PCWrite: If asserted the PC is written
• PCWriteCond: If asserted and the zero output from the ALU is 1 then the PC is written

Instruction Execution Steps
• Instruction fetch
• Instruction decode and register fetch
• Execution, memory address computation, or branch completion
• Memory access or R-type completion
• Memory read completion

Instruction Fetch
• Fetch instruction from memory
– IR ← Memory[PC]
• Increment the PC
– PC ← PC + 4

Instruction Decode
• Fetch operands from register file
– A ← Reg[IR[25:21]]
– B ← Reg[IR[20:16]]
• Compute branch target address
– ALUOut ← PC + (sign-extend(IR[15:0]) << 2)

Execute
• Load/store: Compute memory address
– ALUOut ← A + sign-extend(IR[15:0])
• R-type: Perform operation specified by instruction
– ALUOut ← A op B
• Branch: Compare registers and set PC if equal
– if (A == B) PC ← ALUOut
• Jump: Set PC to jump target address
– PC ← {PC[31:28], (IR[25:0] << 2)}

Memory Access
• Load: Read memory word into MDR
– MDR ← Memory[ALUOut]
• Store: Write B into memory
– Memory[ALUOut] ← B
• R-type: Write result to destination register
– Reg[IR[15:11]] ← ALUOut

Memory Read Completion
• Load: Write result to destination register
– Reg[IR[20:16]] ← MDR

State Machine
• A state machine is a sequential logic device with:
– Set of states
– Next-state function which determines the next state from the current state and the inputs
– Output function which determines the outputs from the current state and possibly the inputs
• In a Moore machine the output depends only on the state; in a Mealy machine the output depends on the state and the inputs

Control with a State Machine
• The control unit for our multicycle datapath will be a state machine
• The only input is the op field of the instruction; the outputs are the control signals
• Each step may have multiple states if control signals depend on the instruction

Fetch and Decode States
• State 0, Instruction fetch (Start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
• State 1, Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
• Transitions out of state 1
– (Op = 'LW') or (Op = 'SW'): Memory reference FSM (Figure 5.38)
– (Op = R-type): R-type FSM (Figure 5.39)
– (Op = 'BEQ'): Branch FSM (Figure 5.40)
– (Op = 'JMP'): Jump FSM (Figure 5.41)

Load and Store States
• From state 1 when (Op = 'LW') or (Op = 'SW')
• State 2, Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
• State 3, Memory access (Op = 'LW'): MemRead, IorD = 1
• State 4, Write-back step: RegWrite, MemtoReg = 1, RegDst = 0; then to state 0 (Figure 5.37)
• State 5, Memory access (Op = 'SW'): MemWrite, IorD = 1; then to state 0 (Figure 5.37)

R-Type States
• From state 1 when (Op = R-type)
• State 6, Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
• State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0; then to state 0 (Figure 5.37)

Branch State
• From state 1 when (Op = 'BEQ')
• State 8, Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01,
PCWriteCond, PCSource = 01; then to state 0 (Figure 5.37)

Jump State
• From state 1 when (Op = 'J')
• State 9, Jump completion: PCWrite, PCSource = 10; then to state 0 (Figure 5.37)

Complete State Machine
[Figure: the complete state machine, combining state 0 (instruction fetch), state 1 (instruction decode/register fetch), state 2 (memory address computation), states 3 and 5 (memory access), state 4 (write-back step), state 6 (execution), state 7 (R-type completion), state 8 (branch completion), and state 9 (jump completion), with the control signals listed on the preceding slides.]

Exceptions
• An exception is an event that causes an unscheduled transfer of control
• Also known as interrupts and traps
• Typically an interrupt is caused externally while an exception or trap is caused internally
• Arithmetic overflow is an example of an exception; an I/O device request is an example of an interrupt

Handling Exceptions
• When hardware detects an exception it transfers control to a software routine called an exception handler, which is typically part of the operating system
• The hardware saves the value of the PC in the exception PC (EPC) register so it can return there after the exception is handled

Determining the Cause
• The hardware must tell the exception handler what the cause of the exception was
• One way to do this is to store a value in a special Cause register (as MIPS does)
• Another way is to use vectored interrupts, where control is transferred to a different address depending on the cause

Exceptions to Implement
• Undefined instruction: occurs when the op field of an instruction indicates an undefined or unimplemented instruction
• Arithmetic overflow: occurs when the ALU indicates that overflow has occurred during an R-type instruction

Adding Exceptions
• The EPC register saves the old PC; it is written when EPCWrite is asserted
• The Cause register records the cause of the exception; it is written when CauseWrite is asserted
• The IntCause signal indicates the cause of the exception
• Control is always transferred to 0x80000180

Changes to the Datapath
[Figure: the multicycle datapath extended for exceptions. Adds the EPC and Cause registers and the control outputs CauseWrite, IntCause, and EPCWrite alongside the existing control signals.]

Changes to the Control Unit
[Figure: the state machine of states 0 through 9 with two exception states added.]
• State 11, entered on Overflow: IntCause = 1, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11
• State 10, entered when (Op = other): IntCause = 0, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01,
EPCWrite, PCWrite, PCSource = 11

Microprogramming
• An alternative to state machines for control is microprogramming
• Each instruction corresponds to a sequence of microinstructions (a microprogram)
• The opcode bits specify the starting address of the microprogram within the microcode ROM
• A microinstruction contains values for all of the control signals plus some sequencing control bits
• Microprogramming makes it easier to change the control unit or to implement complex instructions

Microprogramming (cont'd)
[Figure: microprogrammed control. Microcode storage produces the datapath control outputs and the sequencing control bits; address select logic takes the sequencing control, the microprogram counter plus 1 (Adder), and inputs from the instruction register opcode field.]

Multicycle Performance
• The multicycle datapath has a much shorter clock cycle time than the single-cycle datapath
• However, it also has a larger CPI
• Is the multicycle datapath really faster?
• It depends on the instruction mix
• Can we still do better?
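To make the "depends on the instruction mix" point concrete, the state machine from the earlier slides can be walked to count cycles per instruction class, and the counts weighted by a mix. This is a rough Python sketch: next_state follows the state numbers on the slides (exception states omitted), but the names next_state, cycles, cpi, and ASSUMED_MIX are hypothetical, and the mix itself is invented for illustration rather than measured.

```python
def next_state(state: int, op: str) -> int:
    """Next-state function for states 0-9 of the multicycle control FSM."""
    if state == 0:                       # instruction fetch
        return 1
    if state == 1:                       # decode: dispatch on the op field
        return {"lw": 2, "sw": 2, "R-type": 6, "beq": 8, "j": 9}[op]
    if state == 2:                       # memory address computation
        return 3 if op == "lw" else 5
    if state == 3:                       # load memory access -> write-back
        return 4
    if state == 6:                       # R-type execution -> completion
        return 7
    return 0                             # states 4, 5, 7, 8, 9 return to fetch


def cycles(op: str) -> int:
    """Count the states visited (one per cycle) until control returns to state 0."""
    state, count = 0, 0
    while True:
        state = next_state(state, op)
        count += 1
        if state == 0:
            return count


# Hypothetical instruction mix (fractions sum to 1); not taken from the slides.
ASSUMED_MIX = {"R-type": 0.45, "lw": 0.25, "sw": 0.10, "beq": 0.15, "j": 0.05}


def cpi(mix: dict) -> float:
    """Average cycles per instruction for the given mix."""
    return sum(fraction * cycles(op) for op, fraction in mix.items())


if __name__ == "__main__":
    for op in ASSUMED_MIX:
        print(op, cycles(op))
    print("CPI for the assumed mix:", cpi(ASSUMED_MIX))
```

Average time per instruction is CPI times the clock cycle time, so the multicycle design wins only if its shorter cycle outweighs the higher CPI. A mix heavier in loads raises the CPI toward 5, while one heavier in branches and jumps lowers it toward 3.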