ISA Requirements CPU Organization (Design) • Datapath Design: Components & their connections needed by ISA instructions – Capabilities & performance characteristics of principal Functional Units (FUs) needed by ISA instructions – (e.g., Registers, ALU, Shifters, Logic Units, ...) Components – Ways in which these components are interconnected (buses connections, multiplexors, etc.). Connections – How information flows between components. • Control Unit Design: Control/sequencing of operations of datapath components to realize ISA instructions – Logic and means by which such information flow is controlled. – Control and coordination of FUs operation to realize the targeted Instruction Set Architecture to be implemented (can either be implemented using a finite state machine or a microprogram). • Hardware description with a suitable language, possibly using Register Transfer Notation (RTN). 4th Edition Chapter 4.1-4.4 - 3rd Edition Chapter 5.1-5.4 EECC550 - Shaaban #1 Lec # 4 Winter 2012 12-11-2012 Major CPU Design Steps 1 Analyze instruction set to get datapath requirements: – Using independent RTN, write the micro-operations required for target ISA instructions. 1 2 • This provides the the required datapath components and how they are connected. 2 Select set of datapath components and establish clocking methodology (defines when storage or state elements can read and when they can be written, e.g clock edge-triggered) e.g Flip-Flops 3 Assemble datapath meeting the requirements. 4 Identify and define the function of all control lines, points or signals needed by the datapath. – Analyze implementation of each instruction to determine setting of control points that affects its operations. 5 Control unit design, based on micro-operation timing and control signals identified: – Combinational logic: For single cycle CPU. e.g Any instruction completed in one cycle i.e CPI = 1 – Hard-Wired: Finite-state machine implementation. – Microprogrammed. ISA Requirements CPU Design EECC550 - Shaaban #2 Lec # 4 Winter 2012 12-11-2012 CPU Design & Implantation Process • Top-down Design: – Specify component behavior from high-level requirements (ISA). • Bottom-up Design: – Assemble components in target technology to establish critical timing (hardware delays, critical path timing). • Iterative refinement: – Establish a partial solution, expand and improve. Instruction Set Architecture (ISA): Provides Requirements Reg. File Mux Processor Datapath ALU Target VLSI implementation Technology Reg Cells ISA Requirements CPU Design Control Mem Decoder Sequencer Gates EECC550 - Shaaban #3 Lec # 4 Winter 2012 12-11-2012 Datapath Design Steps • Write the micro-operation sequences required for a number of representative target ISA instructions using independent RTN. • Independent RTN statements specify: the required datapath components and how they are connected. 1 2 • From the above, create an initial datapath by determining possible destinations for each data source (i.e registers, ALU). – This establishes connectivity requirements (data paths, or connections) for datapath components. – Whenever multiple sources are connected to a single input, a multiplexor of appropriate size is added. (or destination) • Find the worst-time propagation delay (critical path) in the datapath to determine the datapath clock cycle (CPU clock cycle, C). • Complete the micro-operation sequences for all remaining instructions adding datapath components + connections/multiplexors as needed. EECC550 - Shaaban #4 Lec # 4 Winter 2012 12-11-2012 MIPS Instruction Formats 31 R-Type I-Type: ALU 26 op • • • • • • rt 11 6 0 rd shamt funct 5 bits 5 bits 5 bits 5 bits 6 bits [31:26] [25:21] [20:16] [15:11] [10:6] [5:0] 26 op 21 rs 16 0 Immediate (imm16) rt 6 bits 5 bits 5 bits 16 bits [31:26] [25:21] [20:16] [15:0] 31 J-Type: Jumps rs 16 6 bits 31 Load/Store, Branch 21 26 op Or address offset 0 target address 6 bits 26 bits [31:26] [25:0] op: Opcode, operation of the instruction. rs, rt, rd: The source and destination register specifiers. shamt: Shift amount. funct: Selects the variant of the operation in the “op” field. address / immediate: Address offset or immediate value. target address: Target address of the jump instruction. EECC550 - Shaaban #5 Lec # 4 Winter 2012 12-11-2012 MIPS R-Type (ALU) Instruction Fields R-Type: All ALU instructions that use three registers 1st operand Opcode for R-Type= 0 OP rs 6 bits [31:26] • • • • • • 2nd operand Destination rt rd shamt funct 5 bits 5 bits 5 bits 5 bits 6 bits [25:21] [20:16] [15:11] [10:6] [5:0] Function Field Rs, rt , rd op: Opcode, basic operation of the instruction. are register specifier fields – For R-Type op = 0 Independent RTN: rs: The first register source operand. Instruction Word Mem[PC] rt: The second register source operand. R[rd] R[rs] funct R[rt] rd: The register destination operand. PC PC + 4 shamt: Shift amount used in constant shift operations. funct: Function, selects the specific variant of operation in the op field. Funct field value examples: Add = 32 Sub = 34 AND = 36 OR =37 NOR = 39 Operand register in rs Destination register in rd Examples: add $1,$2,$3 sub $1,$2,$3 R-Type = Register Type Register Addressing used (Mode 1) Operand register in rt and $1,$2,$3 or $1,$2,$3 EECC550 - Shaaban #6 Lec # 4 Winter 2012 12-11-2012 MIPS ALU I-Type Instruction Fields I-Type ALU instructions that use two registers and an immediate value Loads/stores, conditional branches. 1st operand • • • • Destination OP rs rt 6 bits 5 bits 5 bits [31:26] [25:21] [20:16] 2nd operand Immediate (imm16) 16 bits imm16 [15:0] Independent RTN for addi: op: Opcode, operation of the instruction. Instruction Word Mem[PC] rs: The register source operand. R[rt] R[rs] + imm16 PC PC + 4 rt: The result destination register. immediate: Constant second operand for ALU instruction. imm16 Source operand register in rs OP = 8 Examples: OP = 12 Result register in rt add immediate: addi $1,$2,100 and immediate andi $1,$2,10 I-Type = Immediate Type Immediate Addressing used (Mode 2) Constant operand in immediate EECC550 - Shaaban imm16 = 16 bit immediate field #7 Lec # 4 Winter 2012 12-11-2012 MIPS Load/Store I-Type Instruction Fields Base • • • • Src./Dest. OP rs rt address (e.g. offset) 6 bits 5 bits 5 bits [31:26] [25:21] [20:16] 16 bits imm16 [15:0] Signed address offset in bytes op: Opcode, operation of the instruction. – For load word op = 35, for store word op = 43. rs: The register containing memory base address. rt: For loads, the destination register. For stores, the source register of value to be stored. address: 16-bit memory address offset in bytes added to base imm16 register. Sign Extended base register in rs Examples: source register in rt Store word: Load word: Destination register in rt Base or Displacement Addressing used (Mode 3) Offset sw $3, 500($4) lw $1, 32($2) Offset Instruction Word Mem[PC] Mem[R[rs] + imm16] R[rt] PC PC + 4 Instruction Word Mem[PC] R[rt] Mem[R[rs] + imm16] PC PC + 4 base register in rs EECC550 - Shaaban imm16 = 16 bit immediate field #8 Lec # 4 Winter 2012 12-11-2012 MIPS Branch I-Type Instruction Fields • • • • OP rs rt address (e.g. offset) 6 bits 5 bits 5 bits 16 bits imm16 [31:26] [25:21] [20:16] [15:0] Signed address offset in words op: Opcode, operation of the instruction. Word = 4 bytes rs: The first register being compared rt: The second register being compared. address: 16-bit memory address branch target offset in words imm16 added to PC to form branch address. Register in rt Register in rs OP = 4 Examples: OP = 5 offset in bytes equal to instruction address field x 4 Imm16 x 4 Branch on equal beq $1,$2,100 Branch on not equal bne $1,$2,100 Independent RTN for beq: Added to PC+4 to form branch target Sign extended Instruction Word Mem[PC] R[rs] = R[rt] : PC PC + 4 + imm16 x 4 R[rs] R[rt] : PC PC + 4 PC-Relative Addressing used (Mode 4) EECC550 - Shaaban imm16 = 16 bit immediate field #9 Lec # 4 Winter 2012 12-11-2012 MIPS J-Type Instruction Fields J-Type: Include jump j, jump and link jal • • OP jump target 6 bits 26 bits [31:26] [25:0] Word = 4 bytes op: Opcode, operation of the instruction. – Jump j op = 2 – Jump and link jal op = 3 jump target: jump memory address in words. Examples: Jump memory address in bytes equal to instruction field jump target x 4 Jump j 10000 Jump and link jal 10000 Effective 32-bit jump address: PC(31-28) From PC+4 4 bits Independent RTN for j: Jump target in words PC(31-28),jump_target,00 jump target = 2500 26 bits 0 0 2 bits Instruction Word Mem[PC] PC PC + 4 PC PC(31-28),jump_target,00 J-Type = Jump Type Pseudodirect Addressing used (Mode 5) EECC550 - Shaaban #10 Lec # 4 Winter 2012 12-11-2012 A Subset of MIPS Instructions ADD and SUB: add rd, rs, rt R sub rd, rs, rt 32 = add 34 = sub 0 31 26 op I OR Immediate: rs I BRANCH: rd shamt funct 5 bits 6 bits [31:26] [25:21] [20:16] [15:11] [10:6] [5:0] 26 21 rs 16 rt 0 Immediate (imm16) 6 bits 5 bits 5 bits 16 bits [31:26] [25:21] [20:16] [15:0] 21 rs 16 rt 0 Immediate (imm16) 6 bits 5 bits 5 bits 16 bits [31:26] [25:21] [20:16] [15:0] 26 op 4 0 5 bits 31 beq rs, rt, imm16 6 5 bits LOAD and STORE Word I lw rt, rs, imm16 31 26 sw rt, rs, imm16 op 35 = lw 43 = sw rt 11 5 bits op 13 16 6 bits 31 ori rt, rs, imm16 21 6 bits [31:26] 21 rs 5 bits [25:21] Offset in bytes 16 rt 5 bits [20:16] 0 Immediate (imm16) 16 bits Offset in words [15:0] EECC550 - Shaaban #11 Lec # 4 Winter 2012 12-11-2012 Basic MIPS Instruction Processing Steps Instruction/program Memory Instruction Fetch Obtain instruction from program storage Instruction Mem[PC] Next Update program counter to address PC PC + 4 Instruction of next instruction Instruction Determine instruction type Decode Obtain operands from registers Execute Compute result value or status } Done by Control Unit (Based on Opcode) Result Store result in register/memory if needed Store (usually called Write Back). T = I x CPI x C Common steps for all instructions EECC550 - Shaaban #12 Lec # 4 Winter 2012 12-11-2012 Overview of MIPS Instruction Micro-operations • Common Steps • All instructions go through these common steps: – Send program counter to instruction memory and fetch the instruction. (fetch) Instruction Mem[PC] – Update the program counter to point to next instruction PC PC + 4 – Read one or two registers, using instruction fields. (decode) • Load reads one register only. Additional instruction execution actions (execution) depend on the instruction in question, but similarities exist: – All instruction classes (except J type) use the ALU after reading the registers: • Memory reference instructions use it for effective address calculation. • Arithmetic and logic instructions (R-Type), use it for the specified operation. • Branches use it for comparison. • Additional execution steps where instruction classes differ: – Memory reference instructions: Access memory for a load or store. – Arithmetic and logic instructions: Write ALU result back in register. – Branch instructions: Possibly change next instruction address (update PC) based on comparison (i.e if branch is taken). EECC550 - Shaaban #13 Lec # 4 Winter 2012 12-11-2012 A Single Cycle MIPS CPU Design Design target: A single-cycle per instruction MIPS CPU design All micro-operations of an instruction are to be carried out in a single CPU clock cycle. Cycles Per Instruction = CPI = 1 CPU Performance Equation: T = I x CPI x C 4 Abstract view of single cycle MIPS CPU showing major functional units (components) and major connections between them CPI = 1 Ad d Ad d 32 32 32 32 Data Register # PC A ddress Instruction ISA Re g ist e r s ALU A ddress Register # In st r uct ion m e m or y Dat a m e m or y Register # 32 4th Edition Figure 4.1 page 302 - 3rd Edition Figure 5.1 page 287 Data EECC550 - Shaaban #14 Lec # 4 Winter 2012 12-11-2012 R-Type Example: Micro-Operation Sequence For ADD add rd, rs, rt 0 32 = add 34 = sub 0 OP 6 bits [31:26] rs rt rd shamt funct 5 bits 5 bits 5 bits 5 bits 6 bits [25:21] [20:16] [15:11] [10:6] [5:0] Instruction Word Mem[PC] Fetch the instruction PC PC + 4 Increment PC R[rd] R[rs] + R[rt] Program Memory Common Steps Add register rs to register rt result in register rd i.e Funct =add = 32 Independent RTN ? EECC550 - Shaaban #15 Lec # 4 Winter 2012 12-11-2012 Initial Datapath Components Instruction Mem[PC] Three components needed by: Instruction Fetch: Program Counter Update: PC PC + 4 32 I nstruction address 32 I nstruction 32 In st r uct ion m e m or y a. Instruction memory Add Sum PC 32 32 32 Instruction Word b. Program counter 32 32-bit c. Adder Two state elements (memory) needed to store and access instructions: 1 Instruction memory: • Only read access (by user code). No read control signal needed. 2 Program counter (PC): 32-bit register. • Written at end of every clock cycle (edge-triggered) : No write control signal. 3 32-bit Adder: To compute the the next instruction address (PC + 4). 4th Edition Figure 4.5, page 308 - 3rd Edition Figure 5.5, page 293 + Basics of logic design/logic building blocks review in Appendix C in Book CD (4 th Edition Appendix B) EECC550 - Shaaban #16 Lec # 4 Winter 2012 12-11-2012 Building The Datapath Instruction Fetch & PC Update: 2 1 Instruction Mem[PC] PC PC + 4 32 Add 32 4 32 2 PC PC + 4 PC 32 Read address I nstruction Portion of the datapath used for fetching instructions and incrementing the program counter (PC). 32 In st r uct ion m e m or y PC write or update is edge triggered at the end of the cycle 4th Edition Figure 4.6 page 309 - 3rd Edition Figure 5.6 page 293 1 Instruction Mem[PC] Clock input to PC, memory not shown EECC550 - Shaaban #17 Lec # 4 Winter 2012 12-11-2012 More Datapath Components ISA Register File 5 Register numbers 5 5 Data e.g add = 0010 Main 32-bit ALU Read register 1 Read data 1 4 R[rs] 32 Read register 2 Re g ist e r s Write R[rt] register Read data 2 32 Write ALU operation (Function) 32 Zero Data ALU ALU result 32 32 Data RegWrite 32-bit Arithmetic and Logic Unit (ALU) a. Registers Register File: b. ALU Zero = Zero flag = 1 When ALU result equals zero • Contains all ISA registers. • Two read ports and one write port. • Register writes by asserting write control signal • Clocking Methodology: Writes are edge-triggered. • Thus can read and write to the same register in the same clock cycle. 4th Edition Figure 4.7, page 310 - 3rd Edition Figure 5.7, page 295 + Basics of logic design/logic building blocks review in Appendix C in Book CD (4 th Edition Appendix B) EECC550 - Shaaban #18 Lec # 4 Winter 2012 12-11-2012 Register File Details RW RA RB Write Enable 5 5 5 • Register File consists of 32 registers: busA Write Data – Two 32-bit output busses: busW 32 32 32-bit busA and busB 32 Registers busB Clk – One 32-bit input bus: busW 32 • Register is selected by: – RA (number) selects the register to put on busA (data): busA = R[RA] – RB (number) selects the register to put on busB (data): busB = R[RB] – RW (number) selects the register to be written via busW (data) when Write Enable is 1 Write Enable: R[RW] busW • Clock input (CLK) – The CLK input is a factor ONLY during write operations. – During read operation, it behaves as a combinational logic block: • RA or RB valid => busA or busB valid after “access time.” EECC550 - Shaaban #19 Lec # 4 Winter 2012 12-11-2012 A Possible Register File Implementation Register Write Enable (RegWrite) . Write Port Write Write Register RW 5 rd? rt? 5-to-32 Decoder 30 31 Write . .. . Register 1 Data In .. . . Write . . Data In Data Out Data Out Register 30 Register 31 32 . . 2 Read Ports 0 1 .. . R[rs] 32 32-to-1 MUX Register Read Data 1 (Bus A) 30 31 5 Read Register 1 (RA) Data Out Data Out rs 32 0 1 32 Register Write Data (Bus W) Clock input to registers not shown in diagram .. 32 .. . Data In Write 32 Register 0 Data In . 0 1 Each Register contains 32 edge triggered D-Flip Flops RW RA RB Write Enable 5 5 5 busA busW 32 32 32-bit 32 Registers busB Clk 32 Also see Appendix C in Book CD (3rd Edition Appendix B) - The Basics of Logic Design .. . 32-to-1 32 R[rt] MUX Register Read Data 2 (Bus B) 30 31 5 Read Register 2 (RB) rt EECC550 - Shaaban #20 Lec # 4 Winter 2012 12-11-2012 Idealized Memory Write Enable Address Data In DataOut • Memory (idealized) 32 32 – One input bus: Data In. Clk – One output bus: Data Out. Read Enable • Memory word is selected by: – Address selects the word to put on Data Out bus. – Write Enable = 1: address selects the memory word to be written via the Data In bus. • Clock input (CLK): – The CLK input is a factor ONLY during write operation, – During read operation, this memory behaves as a combinational logic block: • Address valid => Data Out valid after “access time.” • Ideal Memory = Short access time. EECC550 - Shaaban Compared to other components in CPU datapath #21 Lec # 4 Winter 2012 12-11-2012 Clocking Methodology Used: Edge Triggered Writes Clk Setup Hold Setup Hold Don’t Care CLK-to-Q . . . CLK-to-Q . . . . . . . . . Critical Path (Longest delay path) Clock • • Clock All storage element (e.g Flip-Flops, Registers, Data Memory) writes are triggered by the same clock edge. Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew Here writes are triggered on the rising edge of the clock EECC550 - Shaaban #22 Lec # 4 Winter 2012 12-11-2012 Simplified Datapath For MIPS R-Type Instructions From Instruction Memory [25:21] rs 4 R[rs] [20:16] (Function) e.g add = 0010 rt 32 [15:11] rd R[rt] 32 32 Components and connections as specified by RTN statement Clock input to register bank not shown 32 R[rd] R[rs] + R[rt] i.e Funct = function =add Destination register R[rd] write or update is edge triggered at the end of the cycle EECC550 - Shaaban #23 Lec # 4 Winter 2012 12-11-2012 More Detailed Datapath For R-Type Instructions With Control Points Identified Rd Rs RegWr 5 5 Rw 32 Clk Ra Rb 32 32-bit Registers ALUctr Function =Add, Subtract … 4 5 busA R[rs] 32 busB R[rt] ALU busW Rt Result 32 32 R[rd] R[rs] + R[rt] i.e Funct = function =add EECC550 - Shaaban #24 Lec # 4 Winter 2012 12-11-2012 R-Type Register-Register Timing PC+4 Clk Old Value Rs, Rt, Rd, Op, Func PC Clk-to-Q New Value Old Value ALUctr Old Value RegWr Old Value busA, B Old Value busW Old Value Instruction Memory Access Time New Value Delay through Control Logic New Value New Value Register File Access Time New Value ALU Delay New Value Rd Rs Rt RegWr 5 5 5 busA R[rs] 32 busB 32 R[rt] ALU busW 32 Clk Rw Ra Rb 32 32-bit Registers Register Write Occurs Here ALUct r Result 32 All register writes occur on falling edge of clock (clocking methodology) EECC550 - Shaaban #25 Lec # 4 Winter 2012 12-11-2012 Logical Operations with Immediate Example: Micro-Operation Sequence For ORI ori rt, rs, imm16 31 26 op 13 21 16 rs rt 0 Immediate (imm16) 6 bits 5 bits 5 bits 16 bits [31:26] [25:21] [20:16] [15:0] Instruction Word Mem[PC] Fetch the instruction PC PC + 4 Increment PC R[rt] R[rs] OR ZeroExt[imm16] OR register rs with immediate field zero extended to 32 bits, result in register rt Common Steps Done by Main ALU 000 ….. 000 Not in book version Imm16 EECC550 - Shaaban #26 Lec # 4 Winter 2012 12-11-2012 Datapath For Logical Instructions With Immediate Rd RegDst 1 Rt Mux RegWr 5 Rw busW 2x1 MUX (width 5 bits) 0 Rs Rt 5 5 ALUctr R[rt] busB ZeroExt 16 Result 32 0 Mux 32 imm16 ALU 32 32 32-bit Registers 32 Clk R[rs] busA Ra Rb Function = OR 2x1 MUX (width 32 bits) 1 32 ALUSrc R[rt] R[rs] OR ZeroExt[imm16] 000 ….. 000 Imm16 EECC550 - Shaaban #27 Lec # 4 Winter 2012 12-11-2012 Load Operations Example: Micro-Operation Sequence For LW lw rt, rs, imm16 31 26 op 35 21 rs 16 rt 0 Immediate (imm16) 6 bits 5 bits 5 bits 16 bits [31:26] [25:21] [20:16] [15:0] Signed Address offset in bytes Instruction Word Mem[PC] Fetch the instruction PC PC + 4 Increment PC Instruction Memory R[rt] Mem[R[rs] + SignExt[imm16]] Effective Address Data Memory To load from Common Steps Immediate field sign extended to 32 bits and added to register rs to form memory load address, write word at load effective address to register rt EECC550 - Shaaban #28 Lec # 4 Winter 2012 12-11-2012 Additional Datapath Components For Loads & Stores 4th Edition Figure 4.8, page 311 3rd Edition Figure 5.8, page 296 M emWrite Address 32 32 Write data Read data 32 Dat a m e m or y M emRead a. Data memory unit Inputs: for address and write (store) data Output for read (load) data Data memory write or update is edge triggered at the end of the cycle (clocking methodology) 16 Si g n ext end 32 For SignExt[imm16] b. Sign-extension unit 16-bit input sign-extended into a 32-bit value at the output EECC550 - Shaaban #29 Lec # 4 Winter 2012 12-11-2012 Datapath For Loads Rd RegDst 1 32 Clk Rs 5 5 ALUctr Function = add Base Address register Rw Ra Rb 32 32-bit Registers 32 busB R[rt] 0 1 32 ALUSrc 0 32 MemWr 32 bit 2X1 Mux Mu x Extender 16 Offset Effective Address Mux 32 imm16 MemtoReg R[rs] busA ALU busW Rt Mux 0 RegWr 5 LW WrEn Adr Data In 32 Clk ExtOp Data Memory 32 1 MemRd R[rt] Mem[R[rs] + SignExt[imm16]] Data Memory Effective Address EECC550 - Shaaban #30 Lec # 4 Winter 2012 12-11-2012 Store Operations Example: Micro-Operation Sequence For SW sw rt, rs, imm16 31 26 op 43 21 rs 16 rt 0 Immediate (imm16) 6 bits 5 bits 5 bits 16 bits [31:26] [25:21] [20:16] [15:0] Signed Address offset in bytes Instruction Word Mem[PC] Fetch the instruction PC PC + 4 Increment PC Mem[R[rs] + SignExt[imm16]] R[rt] Immediate field sign extended to 32 bits and added to register rs to form memory store effective address, register rt written to memory at store effective address. Effective Address To store at Data Memory Common Steps EECC550 - Shaaban #31 Lec # 4 Winter 2012 12-11-2012 Datapath For Stores Rd RegDst 1 Rt Mux 32 Clk 5 Rt 5 Base Address register Rw Ra Rb 32 32-bit Registers 32 busB R[rt] R[rt] WrEn Adr Data In 32 1 32 ExtOp 0 32 Clk ALUSrc Mu x Extender 16 Offset Effective Address 0 Mux 32 imm16 R[rs] busA ALU busW MemtoReg MemWr 0 Rs RegWr 5 ALUctr Add = SW Data Memory 32 1 MemRd Mem[R[rs] + SignExt[imm16]] R[rt] Data Memory Effective Address EECC550 - Shaaban #32 Lec # 4 Winter 2012 12-11-2012 Conditional Branch Example: Micro-Operation Sequence For BEQ beq rs, rt, imm16 31 26 op 4 21 rs 16 rt 0 immediate 6 bits 5 bits 5 bits 16 bits [31:26] [25:21] [20:16] [15:0] PC Offset in words Instruction Word Mem[PC] Fetch the instruction PC PC + 4 Increment PC Zero R[rs] - R[rt] Calculate the branch condition R[rs] == R[rt] Condition Action (i.e Common Steps R[rs] - R[rt] = 0 ) Then Zero = 1 Zero : PC PC + ( SignExt(imm16) x 4 ) Calculate the next instruction’s PC address Branch Target “Zero” is zero flag of main ALU EECC550 - Shaaban #33 Lec # 4 Winter 2012 12-11-2012 Main ALU evaluates branch condition New adder to compute branch target: • Sum of incremented PC and Datapath For Branch Instructions sign-extended lower 16-bits on the instruction. PC4 from instruction datapath New 32-bit Adder (Third ALU) for Branch Target Sh if t le f t 2 PC + 4 + ( SignExt(imm16) x 4 [25:21] rs Read I nstruction [20:16] rt register 1 Read data 1 Read register 2 Re g is t e r s Write register Read 4 R[rs] ISA ALU operation = Subtract To branch control logic ALU Zero R[rt] data 2 Write data (Main ALU) RegWrite [15:0] imm16 Branch target Add Sum SignExt(imm16) x 4 16 Zero flag =1 if R[rs] - R[rt] = 0 (i.e R[rs] = R[rt]) Zero R[rs] - R[rt] 32 Sig n ex t e n d SignExt(imm16) 4th Edition Figure 4.9, page 312 - 3rd Edition Figure 5.9, page 297 Main ALU Evaluates Branch Condition (subtract) EECC550 - Shaaban #34 Lec # 4 Winter 2012 12-11-2012 More Detailed Datapath For Branch Operations Branch Zero Zero Instruction Address 32 0 PC Mux Adder PC Ext Sign extend shift left 2 busW 00 Adder 32 PC+4 5 R[rs] busA 32 R[rt] busB 32 Clk 1 Main ALU (subtract) Branch Target Zero R[rs] - R[rt] Clk New Third ALU (adder) 5 Rw Ra Rb 32 32-bit Registers PC Branch Target ALU Rt Equal? 4 imm16 Rs RegWr 5 New 2X1 32-bit MUX to select next PC value EECC550 - Shaaban #35 Lec # 4 Winter 2012 12-11-2012 Combining The Datapaths For Memory Instructions and R-Type Instructions 4 [25:21] rs R[rs] 32 [20:16] rt R[rt] 32 0 1 32 1 0 R[rt] 32 rt/rd MUX not shown SignExt(imm16) [15:0] imm16 32 Highlighted muliplexors and connections added to combine the datapaths of memory and R-Type instructions into one datapath This is book version ORI not supported 4th Edition Figure 4.10 Page 314 - 3rd Edition Figure 5.10 Page 299 EECC550 - Shaaban #36 Lec # 4 Winter 2012 12-11-2012 Instruction Fetch Datapath Added to ALU R-Type and Memory Instructions Datapath PC+ 4 32 32 Combination of Figure 4.10 (p. 314) and Figure 4.6 (p. 309) [3rd Edition Figure 5.10 (p. 299) and Figure 5.6 (p. 293)] PC Add 4 PC Read address rs rt I nstruction In st r uct ion m e m or y Read register 1 Read register 2 Write data Re g ist e r s 16 0 M u x 1 A LU ALU result Address Read data 1 M u x 0 Dat a m e m or y R[rt] Sig n ex t e n d M emtoReg Zero 32 RegWrite rt/rd MUX not shown ALU operation M emWrite ALUSrc Read R[rt] data 2 Write register 4 Read R[rs] data 1 32 Write data 32 M emRead 32 This is book version ORI not supported, no zero extend of immediate needed EECC550 - Shaaban #37 Lec # 4 Winter 2012 12-11-2012 A Simple Datapath For The MIPS Architecture Datapath of branches and a program counter multiplexor are added. Resulting datapath can execute in a single cycle the basic MIPS instruction: - load/store word - ALU operations - Branches Zero 32 32 PC +4 Branch 32 32 32 Branch Target 4 rs R[rs] rt R[rt] 0 1 rt/rd MUX not shown 1 32 0 32 32 This is book version ORI not supported, no zero extend of immediate needed 4th Edition Figure 4.11 page 315 - 3rd Edition Figure 5.11 page 300 EECC550 - Shaaban #38 Lec # 4 Winter 2012 12-11-2012 Main ALU Control 3rd Edition • The main ALU has four control lines (detailed design in Appendix B) with the following functions: th ALU Control Lines 4 Edition Appendix C ALU Function 0000 AND 0001 OR 0010 add 0110 subtract 0111 1100 Set-on-less-than NOR Not Used • For our current subset of MIPS instructions only the top five functions will be used (thus only three control lines will be used) • For R-type instruction the ALU function depends on both the opcode and the 6-bit “funct” function field • For other instructions the ALU function depends on the opcode only. • A local ALU control unit can be designed to accept 2-bit ALUop control lines (from main control unit) and the 6-bit function field and generate the correct 4-bit ALU control lines. Or 3 bits depending on number functions actually used EECC550 - Shaaban #39 Lec # 4 Winter 2012 12-11-2012 Local ALU Decoding of “func” Field func op 6 Main Control 6 ALUop Subtract = 01 Add = 00 Instruction Operation LW SW Branch Equal R-Type R-Type R-Type R-Type R-Type Load word Store word branch equal add subtract AND OR set on less than 4 Or 3 bits Opcode Instruction Opcode ALUctr ALU 2 ALU Control (Local) Desired ALUOp Funct Field ALU Action 00 00 01 10 10 10 10 10 R-Type = 10 XXXXXX XXXXXX XXXXXX 100000 100010 100100 100101 101010 add add subtract add subtract and or set on less than ALU Control Lines 0010 0010 0110 0010 0110 0000 0001 0111 EECC550 - Shaaban #40 Lec # 4 Winter 2012 12-11-2012 Local ALU Control Unit Add = 00 Subtract = 01 R-type =10 Add Subtract Add Subtract AND OR Set-On-less-Than { Page 302 (2 lines From main control unit) 2 Function Field 3 ALU Control Lines 4th line = 0 More details found in Appendix D in Book CD – (3rd Edition Appendix C) EECC550 - Shaaban #41 Lec # 4 Winter 2012 12-11-2012 Single Cycle MIPS Datapath Necessary multiplexors and control lines are identified here and local ALU control added: 32 PCSrc 32 PC +4 32 Add Add 4 Shift left 2 RegWrite Instruction [25:21] PC Read address Read register 1 rs Instruction [20:16] Instruction [31:0] Instruction memory rt Read register 2 0 Instruction [15:11] 1 rd M u x RegDst Instruction [15:0] imm16 Write register Write data 16 ALUSrc R[rt] 0 1 M u x MemtoReg Zero ALU ALU result Registers Sign extend 1 MemWrite R[rs] Read data 1 M u x Branch Target 32 Read data 2 32 ALU result Branch 0 PC +4 32 Zero Address Read data 32 R[rt] 1 M u x 0 Data Write memory data ALUOp (2-bits) 00 = add 01 = subtract 10 = R-Type 32 ALU control Function Field Instruction [5:0] ALUOp MemRead 32 This is book version ORI not supported, no zero extend of immediate needed 4th Edition Figure 4.15 page 320 - 3rd Edition Figure 5.15 page 305 EECC550 - Shaaban #42 Lec # 4 Winter 2012 12-11-2012 Putting It All Together: A Single Cycle Datapath PCSrc Branch Zero PC+4 Zero Function Field 0 32 imm16 16 (Includes ORI not in book version) 1 = 32 Data In 32 ExtOp ALUSrc MemWr MemtoReg Clk 32 0 Mux Clk Extender Clk 00 = add 01 = subtract 10 = R-Type Main ALU ALU 1 busW Mux PC Mux Adder Rs Rt 5 5 R[rs] busA Rw Ra Rb 32 32 32-bit R[rt] Registers busB 0 32 ALU Control RegWr 5 Branch Target e.g Sign Extend + Shift Left 2 ALUop (2-bits) Imm16 Rd Rt 0 1 Adder PC Ext imm16 Rd RegDst 00 4 Rt Instruction<31:0> <0:15> Rs <11:15> Adr <16:20> <21:25> Inst Memory WrEn Adr 1 Data Memory MemRd EECC550 - Shaaban #43 Lec # 4 Winter 2012 12-11-2012 Instruction<31:0> Rd <0:25> Rs <0:15> Rt <11:15> Op Fun <16:20> Adr <21:25> <21:25> Instruction Memory Imm16 Jump_target Control Unit Control Lines RegDst ALUSrc MemtoReg RegWrite Mem Read Mem Write Branch ALOp (2-bits) DATA PATH EECC550 - Shaaban #44 Lec # 4 Winter 2012 12-11-2012 The Effect of The Control Signals Signal Name Effect when deasserted (=0) Effect when asserted (=1) RegDst The register destination number for the write register comes from the rt field (instruction bits 20:16). The register destination number for the write register comes from the rd field (instruction bits 15:11). RegWrite None The register on the write register input is written with the value on the Write data input. ALUSrc The second main ALU operand comes from the second register file output (Read data 2) R[rt] The second main ALU operand is the sign-extended lower 16 bits on the instruction (imm16) Branch The PC is replaced by the output of the adder that computes PC + 4 (BEQ) If Zero =1 The PC is replaced by the output of the adder that computes the branch target. MemRead None Data memory contents designated by the address input are put on the Read data output. MemWrite None MemtoReg The value fed to the register write data input comes from the main ALU. Data memory contents designated by the address input are replaced by the value on the Write data input. The value fed to the register write data input comes from data memory. EECC550 - Shaaban #45 Lec # 4 Winter 2012 12-11-2012 Control Line Settings Instruction R-Format RegDst ALUSrc Memto- Reg Mem Reg Write Read Control Lines Mem Branch ALUOp1 ALUOp0 Write 1 0 0 1 0 0 0 1 0 lw 0 1 1 1 1 0 0 0 0 sw X 1 X 0 0 1 0 0 0 beq X 0 X 0 0 0 1 0 1 ALUOp (2-bits) 4th Edition Figure 4.18 page 323 3rd Edition Figure 5.18 page 308 00 = add 01 = subtract 10 = R-Type EECC550 - Shaaban #46 Lec # 4 Winter 2012 12-11-2012 The Truth Table For The Main Control (Opcode) Similar to Figure 4.22 Page 327 (3rd Edition Figure 5.22 Page 312) EECC550 - Shaaban #47 Lec # 4 Winter 2012 12-11-2012 PLA Implementation of the Main Control Control Lines To Datapath Figure D.2.5 in Appendix D (3rd Edition Figure C.2.5 in Appendix C) PLA = Programmable Logic Array - Appendix C (3rd Edition Appendix B) EECC550 - Shaaban #48 Lec # 4 Winter 2012 12-11-2012 Single Cycle MIPS Datapath Control Unit Added 32 32 32 PC +4 Add 32 PC +4 M u x ALU Branch 1 Add Target result PC +4 32 4 Control Opcode PC Read address Instruction [25–21] rs Read register 1 Instruction [20–16] rt Read register 2 Instruction [31–0] Instruction memory 0 Instruction [15–11] rd Instruction [15–0] imm16 4th Edition Figure 4.21, page 326 3rd Edition Figure 5.17, page 307 1 M u x Shift left 2 RegDst Branch MemRead MemtoReg ALUOp MemWrite ALUSrc RegWrite Instruction [31–26] Write register Write data 16 0 Read data 1 R[rs] Zero R[rt] Read data 2 Registers Sign extend 0 1 M u x ALU ALU result Address Read data 1 M u 0x Data Write memory data 32 ALU control Function Field Instruction [5–0] In this book version, ORI is not supported—no zero extend of immediate needed. ALUOp (2-bits) 00 = add 01 = subtract 10 = R-Type 32 EECC550 - Shaaban #49 Lec # 4 Winter 2012 12-11-2012 Adding Support For Jump: Micro-Operation Sequence For Jump: J j jump_target 2 OP Jump_target 6 bits 26 bits [31:26] [25:0] Jump address in words Instruction Word Mem[PC] Fetch the instruction PC PC + 4 Increment PC PC PC(31-28),jump_target,00 Update PC with jump address Common Steps PC(31-28) Jump Address jump target 4 bits 4 highest bits from PC + 4 26 bits 0 0 2 bits EECC550 - Shaaban #50 Lec # 4 Winter 2012 12-11-2012 Datapath For Jump Branch Zero Next Instruction Address 32 4 PCSrc PC+4 32 00 Adder e.g Sign Extend + Shift Left 2 PC+4(31-28) jump_target PC(31-28) 4 bits PC 1 Branch Target Jump Address Clk PC(31-28),jump_target,00 Instruction(25-0) 26 0 PC 4 32 Mux Mux Adder imm16 PC Ext Instruction(15-0) JUMP Shift left 2 jump target 26 bits 28 32 Jump Address 0 0 EECC550 2 bits - Shaaban #51 Lec # 4 Winter 2012 12-11-2012 Single Cycle MIPS Datapath Extended To Handle Jump with Control Unit Added 32 Instruction [25–0] 32 Jump address [31–0] Shift left 2 26 28 PC + 4 [31–28] 4 Add PC +4 32 PC +4 32 0 M u x PC +4 Add 4 ALU result Branch Target 1 1 32 M u x 0 Shift left 2 RegDst Jump Branch Opcode MemRead Instruction [31–26] MemtoReg Control ALUOp MemWrite ALUSrc RegWrite Instruction [25–21] PC Read address Instruction [20–16] Instruction [31–0] Instruction memory Instruction [15–11] Instruction [15–0] imm16 3rd rt Read register 1 Read data 1 Read register 2 Edition Figure 4.24 page 329 Edition Figure 5.24 page 314 1 M u x Read data 2 Write register Write data 16 R[rs] Zero 0 rd 4th rs ALU R[rt] 0 M u x ALU result Data memory 1 Registers Sign extend Address R[rt] Write data Read data 1 0 M u x 32 32 ALU control Function Field Instruction [5–0] In this book version, ORI is not supported—no zero extend of immediate needed. ALUOp (2-bits) 00 = add 01 = subtract 10 = R-Type EECC550 - Shaaban #52 Lec # 4 Winter 2012 12-11-2012 Control Line Settings (with jump instruction, j added) Instruction RegDst ALUSrc Memto- Reg Mem Reg Write Read Mem Write Branch ALUOp1 ALUOp0 Jump 1 0 0 1 0 0 0 1 0 0 lw 0 1 1 1 1 0 0 0 0 0 sw X 1 X 0 0 1 0 0 0 0 beq X 0 X 0 0 0 1 0 1 0 j X X X 0 0 0 X X X 1 R-Format Figure 4.18 page 323 (3rd Edition Figure 5.18 page 308) modified to include j EECC550 - Shaaban #53 Lec # 4 Winter 2012 12-11-2012 Worst Case Timing (Load) Clk PC Old Value Clk-to-Q New Value Instruction Memoey Access Time New Value Rs, Rt, Rd, Op, Func Old Value ALUctr Old Value ExtOp Old Value New Value ALUSrc Old Value New Value MemtoReg Old Value New Value RegWr Old Value New Value busA busB Delay through Control Logic New Value Register Write Occurs Register File Access Time New Value Old Value Delay through Extender & Mux Old Value New Value ALU Delay Address Old Value New Value Data Memory Access Time busW Old Value New EECC550 - Shaaban #54 Lec # 4 Winter 2012 12-11-2012 Instruction Timing Comparison Arithmetic & Logical PC Inst Memory Reg File mux ALU mux setup Load PC Inst Memory Reg File mux Critical Path ALU Data Mem Store PC Inst Memory Reg File ALU Data Mem Branch PC Inst Memory Reg File Jump PC Inst Memory mux cmp mux setup mux mux EECC550 - Shaaban #55 Lec # 4 Winter 2012 12-11-2012 Simplified Single Cycle Datapath Timing • Assuming the following datapath/control hardware components delays: – – – – Memory Units: 2 ns ALU and adders: 2 ns Register File: 1 ns Control Unit < 1 ns } Obtained from low-level target VLSI implementation technology of components • Ignoring Mux and clk-to-Q delays, critical path analysis: 1 ns Control Unit 2 ns 2 ns Instruction Memory Main ALU Register Read Critical Path PC + 4 ALU 2 ns 1 ns Data Memory Register Write Critical Path = 8 ns (LW) (Load) Branch Target ALU Time 2 ns 0 2ns 3ns 4ns 5ns 7ns 8ns EECC550 - Shaaban ns = nanosecond = 10-9 second #56 Lec # 4 Winter 2012 12-11-2012 Performance of Single-Cycle (CPI=1) CPU • Assuming the following datapath hardware components delays: – Memory Units: 2 ns – ALU and adders: 2 ns – Register File: 1 ns • Nanosecond, ns = 10-9 second The delays needed for each instruction type can be found : Instruction Class Instruction Memory Register Read ALU Operation Data Memory ALU 2 ns 1 ns 2 ns Load 2 ns 1 ns 2 ns 2 ns Store 2 ns 1 ns 2 ns 2 ns Branch 2 ns 1 ns 2 ns Jump 2 ns Register Write Total Delay 1 ns 6 ns 1 ns 8 ns Load has longest delay of 8 ns thus determining the clock cycle of the CPU to be 8ns 7 ns 5 ns 2 ns C = 8 ns • The clock cycle is determined by the instruction with longest delay: The load in this case which is 8 ns. Clock rate = 1 / 8 ns = 125 MHz • A program with I = 1,000,000 instructions executed takes: Execution Time = T = I x CPI x C = 106 x 1 x 8x10-9 = 0.008 s = 8 msec T = I x CPI x C EECC550 - Shaaban #57 Lec # 4 Winter 2012 12-11-2012 Adding Support for jal to Single Cycle Datapath • The MIPS jump and link instruction, jal is used to support procedure calls by jumping to jump address (similar to j ) and saving the address of the following instruction PC+4 in register $ra ($31) i.e. Return Address R[31] PC + 4 jal Address PC Jump Address • jal uses the j instruction format: op (6 bits) Target address (26 bits) • We wish to add jal to the single cycle datapath in Figure 4.24 page 329 (3rd Edition Figure 5.24 page 314) . Add any necessary datapaths and control signals to the single-clock datapath and justify the need for the modifications, if any. • Specify control line values for this instruction. EECC550 - Shaaban #58 Lec # 4 Winter 2012 12-11-2012 jump and link, jal support to Single Cycle Datapath Instruction Word Mem[PC] R[31] PC + 4 PC Jump Address Jump Address PC + 4 PC + 4 Branch Target PC + 4 rs R[rs] rt R[rt] 31 2 2 rd imm16 1. Expand the multiplexor controlled by RegDst to include the value 31 as a new input 2. 2. Expand the multiplexor controlled by MemtoReg to have PC+4 as new input 2. EECC550 - Shaaban #59 Lec # 4 Winter 2012 12-11-2012 jump and link, jal support to Single Cycle Datapath Adding Control Lines Settings for jal (For Textbook Single Cycle Datapath including Jump) RegDst Is now 2 bits RegDst MemtoReg Is now 2 bits ALUSrc MemtoReg Reg Write Mem Mem Read Write Branch ALUOp1 ALUOp0 Jump R-format 01 0 00 1 0 0 0 1 0 0 lw 00 1 01 1 1 0 0 0 0 0 sw xx 1 xx 0 0 1 0 0 0 0 beq xx 0 xx 0 0 0 1 0 1 0 J xx x xx 0 0 0 x x x 1 JAL 10 x 10 1 0 0 x x x 1 R[31] PC+ 4 PC Jump Address Instruction Word Mem[PC] R[31] PC + 4 PC Jump Address EECC550 - Shaaban #60 Lec # 4 Winter 2012 12-11-2012 Load Word Register Adding Support for LWR to Single Cycle Datapath • We wish to add a variant of lw (load word) let’s call it LWR to the single cycle datapath in Figure 4.24 page 329 (3rd Edition Figure 5.24 page 314). LWR $rd, $rs, $rt • The LWR instruction is similar to lw but it sums two registers (specified by $rs, $rt) to obtain the effective load address and Loaded word from memory written to register rd uses the R-Type format • Add any necessary datapaths and control signals to the single cycle datapath and justify the need for the modifications, if any. • Specify control line values for this instruction. EECC550 - Shaaban #61 Lec # 4 Winter 2012 12-11-2012 LWR (R-format LW) support to Single Cycle Datapath Instruction Word Mem[PC] PC PC + 4 R[rd] Mem[ R[rs] + R[rt] ] No new components or connections are needed for the datapath just the proper control line settings Adding Control Lines Settings for LWR (For Textbook Single Cycle Datapath including Jump) MemtoReg Reg Write Mem Mem Read Write Branch ALUOp1 ALUOp0 Jump RegDst ALUSrc R-format 1 0 0 1 0 0 0 1 0 0 lw 0 1 1 1 1 0 0 0 0 0 sw x 1 x 0 0 1 0 0 0 0 beq x 0 x 0 0 0 1 0 1 0 J x x x 0 0 0 x x x 1 LWR 1 0 1 1 1 0 0 0 0 0 Add rd R[rt] EECC550 - Shaaban #62 Lec # 4 Winter 2012 12-11-2012 Jump Memory Adding Support for jm to Single Cycle Datapath • We wish to add a new instruction jm (jump memory) to the single cycle datapath in Figure 4.24 page 329 (3rd Edition Figure 5.24 page 314). jm offset($rs) • The jm instruction loads a word from effective address (R[rs] + offset), this is similar to lw except the loaded word is put in the PC instead of register $rt. • Jm used the I-format with field rt not used. OP rs rt 6 bits 5 bits 5 bits address (imm16) Not Used 16 bits • Add any necessary datapaths and control signals to the single cycle datapath and justify the need for the modifications, if any. • Specify control line values for this instruction. EECC550 - Shaaban #63 Lec # 4 Winter 2012 12-11-2012 Adding jump memory, jm support to Single Cycle Datapath Instruction Word Mem[PC] PC Mem[R[rs] + SignExt[imm16]] 1. Expand the multiplexor controlled by Jump to include the Read Data (data memory output) as new input 2. The Jump control signal is now 2 bits Jump 2 2 Jump PC + 4 2 Branch Target rs R[rs] rt R[rt] rd imm16 EECC550 - Shaaban #64 Lec # 4 Winter 2012 12-11-2012 Adding jm support to Single Cycle Datapath Adding Control Lines Settings for jm (For Textbook Single Cycle Datapath including Jump) Jump is now 2 bits RegDst ALUSrc MemtoReg Reg Write Mem Mem Read Write Branch ALUOp1 ALUOp0 Jump R-format 1 0 0 1 0 0 0 1 0 00 lw 0 1 1 1 1 0 0 0 0 00 sw x 1 x 0 0 1 0 0 0 00 beq x 0 x 0 0 0 1 0 1 00 J x x x 0 0 0 x x x 01 Jm x 1 x 0 1 0 x 0 0 10 add PC Mem[R[rs] + SignExt[imm16]] EECC550 - Shaaban #65 Lec # 4 Winter 2012 12-11-2012 Drawbacks of Single Cycle Processor 1. Long cycle time: – All instructions must take as much time as the slowest • Here, cycle time for load is longer than needed for all other instructions. – Cycle time must be long enough for the load instruction: PC’s Clock -to-Q + Instruction Memory Access Time + Register File Access Time + ALU Delay (address calculation) + Data Memory Access Time + Register File Setup Time + Clock Skew – Real memory is not as well-behaved as idealized memory • Cannot always complete data access in one (short) cycle. 2. Impossible to implement complex, variable-length instructions and complex addressing modes in a single cycle. – e.g indirect memory addressing. e.g R[$1] Mem[ Mem[$2] ] 3. High and duplicate hardware resource requirements – Any hardware functional unit cannot be used more than once in a single cycle (e.g. ALUs). 4. Does not allow overlap of instruction processing (instruction pipelining, chapter 6). EECC550 - Shaaban #66 Lec # 4 Winter 2012 12-11-2012