55:035 Computer Architecture and Organization
Lecture 9

Outline
  Building a CPU
  Basic Components
  MIPS Instructions
  Basic 5 Steps for CPU
  Single-Cycle Design
  Multi-cycle Design
  Comparison of Single and Multi-cycle Designs

Overview
  Brief look
    Digital logic
    CPU Datapath
    MIPS Example

Digital Logic
  D-type Flip-flop
  Multiplexer
  D-type Flip-flop with Enable
  [Figure: D-type flip-flop (D, Q, edge-triggered Clock); multiplexer (inputs A on 0 and B on 1, select input S, output F); D-type flip-flop with enable input EN]

Digital Logic
  Registers
  [Figure: 1-bit, 4-bit (D3..D0, Q3..Q0) and N-bit registers, each with D, Q, EN and an edge-triggered Clock]

Digital Logic
  Tri-state Driver (Buffer)
  [Figure: buffer with inputs "in" and "drive", output "out"]

    In  Drive  Out
    0   0      Z
    1   0      Z
    0   1      0
    1   1      1

  What is Z?

Digital Logic
  Adder/Subtractor or ALU
  [Figure: inputs A and B, Carry-in, control Add/sub or ALUop; outputs F and Carry-out]

Overview
  Brief look
    Digital logic
    How to Design a CPU
    Datapath
    MIPS Example

Designing a CPU: 5 Steps
  1. Analyze the instruction set → datapath requirements
       MIPS: ADD, SUB, ORI, LW, SW, BR
       Meaning of each instruction given by RTL (register transfers)
       2 types of registers: CPU/ISA registers, temporary registers
  2. Datapath requirements → select the datapath components
       ALU, register file, adder, data memory, etc.
  3. Assemble the datapath
       Datapath must support planned register transfers
       Ensure all instructions are supported
  4. Analyze datapath control required for each instruction
  5. Assemble the control logic

Step 1a: Analyze ISA
  All MIPS instructions are 32 bits long.
  Three instruction formats:

    R-type   op [31:26]   rs [25:21]   rt [20:16]   rd [15:11]   shamt [10:6]   funct [5:0]
             6 bits       5 bits       5 bits       5 bits       5 bits         6 bits

    I-type   op [31:26]   rs [25:21]   rt [20:16]   immediate [15:0]
             6 bits       5 bits       5 bits       16 bits

    J-type   op [31:26]   target address [25:0]
             6 bits       26 bits

  R: registers, I: immediate, J: jumps
  These formats were intentionally chosen to simplify design

Step 1b: Analyze ISA
  [Figure: the R-type, I-type and J-type formats, repeated]
  Meaning of the fields:
    op: operation of the instruction
    rs, rt, rd: the source and destination register specifiers
      Destination is either rd (R-type), or rt (I-type)
    shamt: shift amount
    funct: selects the variant of the operation in the “op” field
    immediate: address offset or immediate value
    target address: target address of the jump instruction

MIPS ISA: subset for today
  ADD and SUB (R-type format)
    addU rd, rs, rt
    subU rd, rs, rt
  OR Immediate (I-type format)
    ori rt, rs, imm16
  LOAD and STORE Word (I-type format)
    lw rt, rs, imm16
    sw rt, rs, imm16
  BRANCH (I-type format)
    beq rs, rt, imm16

Step 2: Datapath Requirements
  REGISTER FILE
    MIPS ISA requires 32 registers, 32b each
    Called a register file
      Contains 32 entries
      Each entry is 32b
  AddU rd,rs,rt or SubU rd,rs,rt
    Read two sources: rs, rt
    Operation: rs + rt or rs – rt
    Write destination: rd ← rs+/-rt
  Register Numbers (5 bits ea)
  How to implement?
  [Figure: REGFILE with ports RdReg1, RdReg2, WrReg, WrData, RdData1, RdData2 and control RegWrite]
  Requirements
    Read two registers (rs, rt)
    Perform ALU operation
    Write a third register (rd)
  [Figure, continued: ALU with control ALUop and outputs Result and Zero?]

Step 3: Datapath Assembly
  ADDU rd, rs, rt
  SUBU rd, rs, rt
  Need an ALU
    Hook it up to REGISTER FILE
    REGFILE has 2 read ports (rs, rt), 1 write port (rd)
  Parameters come from instruction fields (rs, rt, rd)
  Control signals depend upon instruction fields
    E.g.: ALUop = f(Instruction) = f(op, funct)
  [Figure: rs, rt, rd drive RdReg1, RdReg2, WrReg of REGFILE; RdData1 and RdData2 feed the ALU; ALU Result returns to WrData; controls RegWrite and ALUop; ALU output Zero?]

Steps 2 and 3: ORI Instruction
  ORI rt, rs, Imm16
  Need new ALUop for ‘OR’ function, hook up to REGFILE
    1 read port (rs), 1 write port (rt), 1 const value (Imm16)
  Control signals depend upon instruction fields
    E.g.: ALUsrc = f(Instruction) = f(op, funct)
  [Figure: rs drives RdReg1 and rt drives WrReg (rd crossed out); Imm16 passes through a 16-bit ZERO EXTEND block and the ALUsrc multiplexer selects between RdData2 (0) and the extended immediate (1)]

Steps 2 and 3: Destination Register
  Must select proper destination, rd or rt
  Depends on instruction type
    R-type may write rd
    I-type may write rt
  [Figure: RegDst multiplexer selects rd (1) or rt (0) as WrReg; rest of the datapath as before]

Steps 2 and 3: Load Word
  LW rt, rs, Imm16
  Need Data Memory: data ← Mem[Addr]
    Addr is rs+Imm16, Imm16 is signed, use ALU for +
  Store in rt: rt ← Mem[rs+Imm16]
  [Figure: ALU Result drives DATAMEM Addr; MemtoReg multiplexer selects between ALU Result and DATAMEM RdData for REGFILE WrData; SIGN/ZERO EXTEND block controlled by ExtOp; other controls RegDst, RegWrite, ALUsrc, ALUop]

Steps 2 and 3: Store Word
  SW rt, rs, Imm16
  Need Data Memory: Mem[Addr] ← data
    Addr is rs+Imm16, Imm16 is signed, use ALU for +
  Store in Mem: Mem[rs+Imm16] ← rt
  [Figure: store-word datapath; rs and rt drive RdReg1 and RdReg2 of REGFILE, RegDst multiplexer on WrReg, ALU with Zero? output]
  [Figure, continued: sign/zero-extended Imm16 selected by ALUsrc; ALU Result drives DATAMEM Addr; DATAMEM has RdData and WrData ports; control signals RegWrite, ALUop, MemWrite, ExtOp, MemtoReg]

Writes: Need to Control Timing
  Problem: write to data memory
    Data can come anytime
    Addr must come first
    MemWrite must come after Addr
      Else? Writes to wrong Addr!
  Solution: use ideal data memory
    Assume everything works ok
    How to fix this for real?
      One solution: synchronous memory
      Another solution: delay MemWr to come late
  Problems?: write to register file
    Does RegWrite signal come after WrReg number?
    When does the write to a register happen?
    Read from same register as being written?

Missing Pieces: Instruction Fetching
  Where does the Instruction come from?
    From instruction memory, of course!
    Recall: stored-program concept
    Alternatives? How about hard-coding wires and switches…? This is how ENIAC was programmed!
  How to branch?
    BEQ rs, rt, Imm16

Instruction Processing
  Fetch instruction
  Execute instruction
  Fetch next instruction
  Execute next instruction
  Fetch next instruction
  Execute next instruction
  Etc…
  How to maintain sequence? Use a counter!
  Branches (out of sequence)? Load the counter!
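
The fetch/execute sequence above can be sketched in a few lines of Python. This is only an illustration of the "use a counter, load the counter on a branch" idea, not the lecture's datapath: the instruction names ("work", "branch", "halt"), the tuple encoding and the dictionary used as instruction memory are all assumptions made for the example. The counter steps by 4 because every MIPS instruction is 32 bits (4 bytes) long.

```python
# Toy fetch/execute loop: a counter sequences the instructions and a
# branch simply loads a new value into that counter.
# The instruction names and tuple encoding are illustrative assumptions.

def run(program, max_steps=20):
    """Run `program` (dict: byte address -> instruction tuple).

    Returns the list of addresses that were fetched, in order.
    """
    pc = 0
    trace = []
    for _ in range(max_steps):
        if pc not in program:
            break
        instr = program[pc]        # fetch
        trace.append(pc)
        if instr[0] == "branch":   # execute: load the counter
            pc = instr[1]
        elif instr[0] == "halt":   # execute: stop the toy machine
            break
        else:                      # execute: anything else, then count up
            pc = pc + 4
    return trace


program = {
    0: ("work",),
    4: ("branch", 12),   # out-of-sequence: next fetch is address 12
    8: ("work",),        # never fetched
    12: ("work",),
    16: ("halt",),
}

print(run(program))  # [0, 4, 12, 16]
```

The instruction at address 8 never appears in the trace because the branch at address 4 replaced the counter contents instead of letting it count up.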

Instruction Processing
  Program Counter
    Points to current instruction
    Address to instruction memory
      Instr ← InstrMem[PC]
  Next instruction: counts up by 4
    Remember: memory is byte-addressable, instructions are 4 bytes
    PC ← PC + 4
  Branch instruction: replace PC contents

Step 1: Analyze Instructions
  Register Transfer Language…
    op | rs | rt | rd | shamt | funct = InstrMem[ PC ]
    op | rs | rt | Imm16 = InstrMem[ PC ]

    Instr   Register Transfers
    ADDU    R[rd] ← R[rs] + R[rt]; PC ← PC + 4
    SUBU    R[rd] ← R[rs] – R[rt]; PC ← PC + 4
    ORI     R[rt] ← R[rs] | zero_ext(Imm16); PC ← PC + 4
    LOAD    R[rt] ← MEM[ R[rs] + sign_ext(Imm16) ]; PC ← PC + 4
    STORE   MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
    BEQ     if ( R[rs] == R[rt] ) then PC ← PC + 4 + { sign_ext(Imm16) || b’00’ }
            else PC ← PC + 4

Steps 2 and 3: Datapath & Assembly
  PC: a register
    Counter, counts by +4
    Provides address to Instruction Memory
  [Figure: PC drives the Instruction Memory read address; an adder computes PC + 4; Instruction Memory outputs Instruction[31:0]]

Steps 2 and 3: Datapath & Assembly
  PC: a register
    Counter, counts by +4
    Sometimes, must add SignExtend{Imm16||b’00’} for branch instructions
  Note: the sign-extender for Imm16 is already in the datapath (everything else is new)
  [Figure: PCSrc multiplexer selects between PC + 4 (0) and the branch adder result (1); the branch adder sums PC + 4 and the extended Imm16 after Shift Left 2; instruction fields Instruction[25:21], [20:16], [15:11], [15:0]; Sign/Zero Extend (16 to 32 bits) controlled by ExtOp]

Steps 2 and 3: Add Previous Datapath
  [Figure: fetch logic joined to the earlier datapath; Instruction[25:21] and Instruction[20:16] drive Read reg. 1 and Read reg. 2; RegDst multiplexer selects Instruction[20:16] (0) or Instruction[15:11] (1) as Write reg.]
  [Figure, continued: Register File (Read data 1, Read data 2, Write data, RegWrite); Sign/Zero Extend of Imm16 with ExtOp; ALUSrc multiplexer; ALU with Zero and ALU result; ALU Control fed by Instruction[5:0] (funct) and ALUOp; Data Memory (Address, Write data, Read data, MemWrite); MemtoReg multiplexer; Shift Left 2, branch adder and PCSrc multiplexer]

What have we done?
  Created a simple CPU datapath
    Control still missing (next slide)
  Single-cycle CPU
    Every instruction takes 1 clock cycle
    Clocking?

One Clock Cycle
  Clock Locations
    PC, REGFILE have clocks
  Operation
    On rising edge, PC will get new value
      Maybe REGFILE will have one value updated as well
    After rising edge
      PC and REGFILE can’t change
      New value out of PC
      Instruction out of INSTRMEM
      Instruction selects registers to read from REGFILE
      Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc
      ALU does its work
      DataMem may be read (depending on instruction)
      Result value goes back to REGFILE
      New PC value goes back to PC
      Await next clock edge
  Lots to do in only 1 clock cycle!!

Missing Steps?
  Control is missing (Steps 4 and 5 we mentioned earlier)
    Generate the green signals
      ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc
    These are all f(Instruction), where f() is a logic expression
    Will look at control strategies in upcoming lecture
  Implementation Details
    How to implement REGFILE?
      Read port: tristate buffers? Multiplexer? Memory?
      Two read ports: two of above?
      Write port: how to write only 1 register?
    How to control writes to memory? To register file?
  More instructions
    Shift instructions
    Jump instruction
    Etc

1-Cycle CPU Datapath
  [Figure: the complete single-cycle datapath; PC, PC + 4 adder, Instruction Memory, instruction fields Instruction[25:21], [20:16], [15:11], [15:0], RegDst multiplexer, Register File]
  [Figure, continued: Sign/Zero Extend with ExtOp, ALUSrc multiplexer, ALU with Zero and ALU result, ALU Control fed by Instruction[5:0] (funct) and ALUOp, Data Memory with MemWrite, MemtoReg multiplexer, Shift Left 2, branch adder and PCSrc multiplexer]

1-cycle CPU Datapath + Control
  [Figure: the single-cycle datapath with a Control block; Control reads Instruction[31:26] and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite; ALU control reads Instruction[5:0]; PCSrc selects the next PC]

1-cycle CPU Control – Lookup Table

    Input or Output   Signal Name   R-format   Lw   Sw   Beq
    Inputs            Op5           0          1    1    0
                      Op4           0          0    0    0
                      Op3           0          0    1    0
                      Op2           0          0    0    1
                      Op1           0          1    1    0
                      Op0           0          1    1    0
    Outputs           RegDst        1          0    X    X
                      ALUSrc        0          1    1    0
                      MemtoReg      0          1    X    X
                      RegWrite      1          1    0    0
                      MemRead       0          1    0    0
                      MemWrite      0          0    1    0
                      Branch        0          0    0    1
                      ALUOp1        1          0    0    0
                      ALUOp0        0          0    0    1

  Also: I-type instructions (ORI) & ExtOp (sign-extend control), etc.

1-cycle CPU + Jump Instruction
  [Figure: single-cycle datapath with a jump path; Jump address [31..0] is formed from PC + 4 [31..28] and Instruction[25:0]; instruction fields Instruction[31:26], [25:21], [20:16], [15:11], [15:0], [5:0]]

1-cycle CPU Problems?
  Every instruction 1 cycle
  Some instructions “do more work”
    Eg, lw must read from DATAMEM
  All instructions must have same clock period…
    Many instructions run slower than necessary
  Tricky timing on MemWrite, RegWrite(?) signals
    Write signal must come *after* address is stable
  Need extra resources…
    PC+4 adder, ALU for BEQ instruction, DATAMEM+INSTRMEM

Performance!
  Single-Cycle CPU Performance
    Execute one instruction per clock cycle (CPI=1)
    Not every instruction uses all resources (eg, DATAMEM read)
    Can we change clock period for each instruction?
    Clock cycle time? Note dataflow includes:
      INSTRMEM read
      REGFILE access
      Sign extension
      ALU operation
      DATAMEM read
      REGFILE/PC write
    A different clock period for each instruction? No! (Why not?)
      One clock period: the worst case!
  This is why a single-cycle CPU is not good for performance

1-cycle CPU Datapath + Controller
  [Figure: single-cycle datapath with controller and jump path; Jump address [31..0] formed from PC + 4 [31..28] and Instruction[25:0]; instruction fields Instruction[31:26], [25:21], [20:16], [15:11], [15:0], [5:0]]

1-cycle CPU Summary
  Operation
    1 cycle per instruction
    Control signals held fixed during entire cycle (except BRANCH)
    Only 2 registers
      PC, updated every clock cycle
      REGFILE, updated when required
    During clock cycle, data flows from register-outputs to register-inputs
    Fixed clock frequency / period
  Performance
    1 instruction per cycle
    Slowest instruction determines clock frequency
  Outstanding issue: MemWrite timing
    Assume this signal writes to memory at end of clock cycle

Multi-cycle CPU Goals
  Improve performance
    Break each instruction into smaller steps / multiple cycles
      LW instruction → 5 cycles
      SW instruction → 4 cycles
      R-type instruction → 4 cycles
      Branch, Jump → 3 cycles
    Aim for 5x clock frequency
      Complex instructions (eg, LW) → 5 cycles → same performance as before
      Simple instructions (eg, ADD) → fewer cycles → faster
  Save resources (gates/transistors)
    Re-use ALU over multiple cycles
    Put INSTR + DATA in same memory
  MemWrite timing solved?
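
As a rough worked example of these goals, the sketch below computes the average cycles per instruction (CPI) for a multi-cycle CPU and the resulting speedup over a single-cycle CPU. The cycle counts come from the slide above; the instruction mix is hypothetical, and the 5x clock ratio is the slide's ideal target rather than a measured figure.

```python
# Cycle counts per instruction class, taken from the goals listed above.
CYCLES = {"lw": 5, "sw": 4, "rtype": 4, "beq": 3, "j": 3}


def average_cpi(mix):
    """Weighted average CPI for an instruction mix (class -> count or %)."""
    total = sum(mix.values())
    return sum(CYCLES[name] * weight for name, weight in mix.items()) / total


def speedup_vs_single_cycle(mix, clock_ratio=5.0):
    """Speedup of multi-cycle over single-cycle.

    Single-cycle: CPI = 1 at clock period T.
    Multi-cycle:  CPI = average_cpi(mix) at period T / clock_ratio
    (clock_ratio = 5 is the ideal "5x clock frequency" goal).
    """
    return clock_ratio / average_cpi(mix)


# Hypothetical instruction mix in percent; not taken from the lecture.
mix = {"lw": 25, "sw": 10, "rtype": 45, "beq": 15, "j": 5}

print(average_cpi(mix))                        # 4.05
print(round(speedup_vs_single_cycle(mix), 3))  # 1.235
```

A program made only of LW instructions gives a speedup of exactly 1.0 under these assumptions, which matches the "same performance as before" entry for complex instructions.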

Multi-cycle CPU Datapath
  Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)
  Move signal paths (+4, Shift Left 2)
  [Figure: multi-cycle datapath with PC, a single Memory (Address, MemData, Write data), Registers (RdReg1, RdReg2, Write reg, Write data, RdData1, RdData2), Sign Extend, Shift Left 2, one ALU (Zero, ALU result) and the new multiplexers]

Multi-cycle CPU Datapath
  Add registers + control signals (IR, MDR, A, B, ALUOut)
  Registers with no control signal load value every clock cycle (eg, PC)
  [Figure: the same datapath with the Instruction Register, Memory Data Register, A, B and ALU Out registers added]

Instruction Execution Example
  Execute a “Load Word” instruction
    LW rt, 0(rs)
  5 Steps
    1. Fetch instruction
    2. Read registers
    3. Compute address
    4. Read data
    5. Write registers

Load Word Instruction Sequence
  1. Fetch Instruction
     InstructionRegister ← Mem[PC]
  [Figure: multi-cycle datapath with the instruction-fetch path highlighted]

Load Word Instruction Sequence
  [Figure: multi-cycle datapath with the register-read path highlighted]
  2.
     Read Registers
     A ← Registers[Rs]

Load Word Instruction Sequence
  3. Compute Address
     ALUOut ← A + {SignExt(Imm16)}
  [Figure: multi-cycle datapath with the address-computation path highlighted]

Load Word Instruction Sequence
  4. Read Data
     MDR ← Memory[ALUOut]
  [Figure: multi-cycle datapath with the data-read path highlighted]

Load Word Instruction Sequence
  5. Write Registers
     Registers[Rt] ← MDR
  [Figure: multi-cycle datapath with the register-write path highlighted]

Load Word Instruction Sequence
  All 5 Steps Shown
  [Figure: multi-cycle datapath with all five paths highlighted]

Multi-cycle Load Word: Recap
  1. Fetch Instruction     InstructionRegister ← Mem[PC]
  2. Read Registers        A ← Registers[Rs]
  3. Compute Address       ALUOut ← A + {SignExt(Imm16)}
  4. Read Data             MDR ← Memory[ALUOut]
  5. Write Registers       Registers[Rt] ← MDR
  Missing Steps?
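
The five register transfers in the recap can be traced with a small Python sketch. It mirrors only the five steps as listed, one statement per step; it is not a model of the real datapath. The helper `encode_lw`, the dictionary standing in for the single shared memory, and the example register and memory contents are assumptions made for illustration.

```python
def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def encode_lw(rt, rs, imm16):
    """Hypothetical helper: pack an I-type LW word (op = 100011)."""
    return (0b100011 << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)


def lw_five_steps(pc, regs, mem):
    """Apply the five recap steps for an LW stored at address `pc`.

    `mem` is one dict (byte address -> 32-bit word) holding both the
    instruction and the data, as in the multi-cycle design.
    """
    ir = mem[pc]                                           # 1. IR <- Mem[PC]
    rs = (ir >> 21) & 0x1F                                 #    Instruction[25:21]
    rt = (ir >> 16) & 0x1F                                 #    Instruction[20:16]
    a = regs[rs]                                           # 2. A <- Registers[Rs]
    alu_out = (a + sign_ext16(ir & 0xFFFF)) & 0xFFFFFFFF   # 3. ALUOut <- A + SignExt(Imm16)
    mdr = mem[alu_out]                                     # 4. MDR <- Memory[ALUOut]
    regs[rt] = mdr                                         # 5. Registers[Rt] <- MDR
    return regs


regs = [0] * 32
regs[2] = 0x104                                   # base address held in rs
mem = {
    0x000: encode_lw(rt=1, rs=2, imm16=-4),       # LW with offset -4
    0x100: 0xDEADBEEF,                            # the data word to load
}
lw_five_steps(0x000, regs, mem)
print(hex(regs[1]))  # 0xdeadbeef
```

The negative offset shows why the immediate is sign-extended and added without any shift: the effective address is 0x104 - 4 = 0x100.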

Multi-cycle Load Word: Recap
  1. Fetch Instruction     InstructionRegister ← Mem[PC]; PC ← PC + 4
  2. Read Registers        A ← Registers[Rs]
  3. Compute Address       ALUOut ← A + {SignExt(Imm16)}
  4. Read Data             MDR ← Memory[ALUOut]
  5. Write Registers       Registers[Rt] ← MDR
  Missing Steps?
    Must increment the PC
    Do it as part of the instruction fetch (in step 1)
    Need PCWrite control signal

Multi-cycle R-Type Instruction
  1. Fetch Instruction     InstructionRegister ← Mem[PC]; PC ← PC + 4
  2. Read Registers        A ← Registers[Rs]; B ← Registers[Rt]
  3. Compute Value         ALUOut ← A op B
  4. Write Registers       Registers[Rd] ← ALUOut
  RTL describes data flow action in each clock cycle
    Control signals determine precise data flow
    Each step implies unique control values

Multi-cycle R-Type Instruction: Control Signal Values
  1. Fetch Instruction     InstructionRegister ← Mem[PC]; PC ← PC + 4
       MemRead=1, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUop=00, PCWrite, PCSource=00
  2. Read Registers        A ← Registers[Rs]; B ← Registers[Rt]
       ALUSrcA=0, ALUSrcB=11, ALUop=00
  3. Compute Value         ALUOut ← A op B
       ALUSrcA=1, ALUSrcB=00, ALUop=10
  4. Write Registers       Registers[Rd] ← ALUOut
       RegDst=1, RegWrite, MemtoReg=0
  Each step implies unique control values
    Fixed for entire cycle
    “Default value” implied if unspecified

Check Your Work – Is RTL Valid?
  1. Datapath check
     Within one cycle…
       Each cycle has valid data flow path (path exists)
       Each register gets only one new value
     Across multiple cycles…
       A register value must be defined in a previous (earlier in time) clock cycle before it is used
         Eg, “A ← 3” must occur before “B ← A”
       Make sure register value doesn’t disappear if set >1 cycle earlier
  2.
     Control signal check
     Each cycle, RTL describing the datapath flow implies a value for each control signal
       0 or 1 or default or don’t care
     Each control signal gets only one fixed value the entire cycle
  3. Overall check
     Does the sequence of steps work?

Multi-cycle BEQ Instruction
  1. Fetch Instruction
       InstructionRegister ← Mem[PC]; PC ← PC + 4
  2. Read Registers, Precompute Target
       A ← Registers[Rs]; B ← Registers[Rt]; ALUOut ← PC + {SignExt{Imm16},b’00’}
  3. Compare Registers, Conditional Branch
       if( (A – B) == 0 ) PC ← ALUOut
  Green shows PC calculation flow (in parallel with other operations)

Multi-cycle Datapath with Control Signals
  [Figure: multi-cycle datapath annotated with control signals PCSrc, PCWrite, IRWrite, IorD, MemRead, MemWrite, MemtoReg, RegWrite, RegDst, ALUSrcA, ALUSrcB, ALUOp; ALU Control fed by Instruction[5:0]; Jump address [31..0] formed from PC[31..28] and Instr[25:0]]

Multi-cycle Datapath with Controller
  [Figure: the same datapath with a controller driven by Instr[31:26]]
  [Figure: multi-cycle datapath with controller, repeated]

Multi-cycle CPU Control: Overview
  General approach: Finite State Machine (FSM)
    Need details in each branch of control…
    Precise outputs for each state (Mealy depends on inputs, Moore does not)
    Precise “next state” for each state (can depend on inputs)
  [Figure: control overview, labelled “Control Signal Outputs”]

How to Implement FSM?
  Manually with logic gates + FFs
    Bubble diagram, next-state table, state assignment
    Karnaugh map for each state bit, each output bit (painful!)
  High-level language description (eg, Verilog, VHDL)
    Describe FSM bubble diagram (next-states, output values)
    Automatically synthesized into gates + FFs
  Microcode (µ-code) description
    Sequence through many µ-ops for each CPU instruction
      One µ-op (µ-instruction) sends correct control signal for 1 cycle
      µ-op similar to one bubble in FSM
    Acts like a mini-CPU within a CPU
      µPC: microcode program counter
      Microcode storage memory contains µ-ops
    Can look similar to RTL or some new “assembly language”

FSM Specification: Bubble Diagram
  Can build this by examining RTL
  It is possible to automatically convert RTL into this form!
  [Figure: FSM bubble diagram]

FSM: Gates + FFs Implementation
  [Figure: FSM High-level Organization]

FSM: Microcode Implementation
  [Figure: Microcode Storage (memory) with outputs for datapath control and sequencing control; Microprogram Counter; adder (+1); Address Select Logic with inputs from the instruction register opcode field]

Multi-cycle CPU with Control FSM
  [Figure: multi-cycle datapath driven by the control FSM; labels “Conditional Branch” and “FSM Control Outputs”]
  [Figure, continued: controller input Instr[31:26]; Jump address [31..0] formed from PC[31..28] and Instr[25:0]]

Control FSM: Overview
  General approach: Finite State Machine (FSM)
    Need details in each branch of control…

Detailed FSM
  [Figure: complete FSM state diagram]

Detailed FSM
  [Figure: FSM state diagram with regions labelled Instruction Fetch, Memory Reference, R-Type, Branch, Jump]

Detailed FSM: Instruction Fetch
  [Figure: instruction-fetch states]

Detailed FSM: Memory Reference
  [Figure: memory-reference states, with separate LW and SW paths]

Detailed FSM: R-Type Instruction
  [Figure: R-type states]

Detailed FSM: Branch Instruction
  [Figure: branch state]

Detailed FSM: Jump Instruction
  [Figure: jump state]

Performance Comparison
  Single-cycle CPU vs Multi-cycle CPU

Simple Comparison
  Single-cycle CPU   1 clock cycle    All
  Multi-cycle CPU    5 clock cycles   LW
  Multi-cycle CPU    4 clock cycles   SW, R-type
  Multi-cycle CPU    3 clock cycles   BEQ, J

What’s really happening?
  ( Load Word Instruction )
  [Figure: ideal timelines for the single-cycle and multi-cycle CPUs, each divided into Fetch, Decode, Calc Addr, Memory, Write]

In practice, steps differ in speeds…
  Load Word Instruction
  [Figure: single-cycle and multi-cycle timelines for Fetch, Decode, Calc Addr, Memory, Write with unequal step lengths; multi-cycle timeline annotated “Wasted time!” and “Violation!”]

Single-cycle vs Multi-cycle
  LW instruction faster for single-cycle
  [Figure: same timelines with a longer multi-cycle clock period, annotated “Violation fixed!” and “Now wasted time is larger!”]

Single-cycle vs Multi-cycle
  SW instruction ~ same speed
  [Figure: single-cycle timeline for Fetch, Decode, Calc Addr, Memory, annotated “Speed diff” and “Wasted time!”]
  [Figure, continued: multi-cycle timeline for SW: Fetch, Decode, Calc Addr, Memory]

Single-cycle vs Multi-cycle
  BEQ, J instruction faster for multi-cycle
  [Figure: single-cycle and multi-cycle timelines for Fetch, Decode, Calc Addr, annotated “Speed diff” and “Wasted time!”]

Performance Summary
  Which CPU implementation is faster?
    LW: single-cycle is faster
    SW, R-type: about the same
    BEQ, J: multi-cycle is faster
  Real programs use a mix of these instructions
  Overall performance depends on instruction frequency!

Implementation Summary
  Single-cycle CPU
    1 instruction per cycle (eg, 1 MHz → 1 MIPS)
    No “wasted time” on most complex instruction
    Large wasted time on simpler instructions
    Simple controller (just a lookup table or memory)
    Simple instructions
  Multi-cycle CPU
    << 1 instruction per cycle (eg, 1 MHz → 0.2 MIPS)
    Small time wasted on most complex instruction
      Hence, this instruction always slower than single-cycle CPU
    Small time wasted on simple instructions
      Eliminates “large wasted time” by using fewer clock cycles
    Complex controller (FSM)
    Potential to create complex instructions
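
The per-instruction comparison in this summary can be reproduced with a small Python calculation. The step delays below are hypothetical numbers chosen only to show the effect of unequal step lengths; they are not from the lecture. The sketch assumes the single-cycle clock period must cover the slowest instruction (the sum of its step delays) and the multi-cycle clock period must cover the slowest single step.

```python
# Hypothetical step delays in nanoseconds (assumed for illustration only).
STEP_NS = {"Fetch": 2.0, "Decode": 1.0, "Calc": 2.0, "Memory": 2.0, "Write": 1.0}

# Steps used by each instruction class: 5, 4, 4 and 3 cycles as in the slides.
STEPS = {
    "LW": ["Fetch", "Decode", "Calc", "Memory", "Write"],
    "SW": ["Fetch", "Decode", "Calc", "Memory"],
    "R-type": ["Fetch", "Decode", "Calc", "Write"],
    "BEQ": ["Fetch", "Decode", "Calc"],
}


def single_cycle_period():
    """One period for all instructions: the worst-case sum of step delays."""
    return max(sum(STEP_NS[s] for s in steps) for steps in STEPS.values())


def multi_cycle_period():
    """Every step must fit in one cycle: the slowest single step."""
    return max(STEP_NS.values())


def compare():
    """Map instruction class -> (single-cycle ns, multi-cycle ns)."""
    single = single_cycle_period()
    multi = multi_cycle_period()
    return {name: (single, len(steps) * multi) for name, steps in STEPS.items()}


for name, (single, multi) in compare().items():
    print(f"{name:7s} single-cycle {single:4.1f} ns   multi-cycle {multi:4.1f} ns")
```

With these assumed delays, LW takes 8 ns on the single-cycle CPU and 10 ns on the multi-cycle CPU, SW and R-type take 8 ns on both, and BEQ drops to 6 ns on the multi-cycle CPU, which is the same ordering as the Performance Summary.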