CprE 381 Computer Organization and Assembly Level Programming, Fall 2013
Chapter 4 The Processor
Zhao Zhang, Iowa State University
Revised from original slides provided by MKP

Week 8 Overview
- CPU design overview
- Datapath and Control
- Control Unit
- ALU Control Unit

Chapter 1 — Computer Abstractions and Technology — 2

Announcements
- Mini-project B starts in week 9
- Mini-projects B and C will be revised
- The grading scale will be discussed by Friday (week 8)

§4.1 Introduction

Introduction
- CPU performance factors
  - Instruction count: determined by ISA and compiler
  - CPI and cycle time: determined by CPU hardware
- We will examine two MIPS implementations
  - A simplified, single-cycle version
  - A more realistic, pipelined version

Chapter 4 — The Processor — 4

Nine-Instruction MIPS
- We will first use a MIPS subset of nine instructions, then extend the subset
- It’s enough to illustrate most aspects of CPU design, particularly datapath and control design
- Memory reference: LW and SW
- Arithmetic/logic: ADD, SUB, AND, OR, SLT
- Branch: BEQ, BNE

Instruction Execution
- PC → instruction memory, fetch instruction
- Register numbers → register file, read registers
- Then, depending on instruction class
  - Execute: use ALU to calculate
    - Arithmetic result
    - Memory address for load/store
    - Branch target address
  - Memory access: access data memory for load/store
  - Register writeback: write data back to registers
  - PC update (for all): PC ← target address or PC + 4

CPU Overview
- A sketchy view
- Next sequential PC = PC + 4
- Branch target = (PC + 4) + offset
- An instruction may change
  1. PC (all instructions)
  2. Some register (arithmetic/logic, load)
  3. Some memory word/halfword/byte (store)

Multiplexers
- Can’t just join wires together
  - Use multiplexers
- What would happen if you just join signals in VHDL?
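
As a hedged illustration of the question above: in VHDL, two concurrent assignments to the same std_logic signal create two drivers, and the resolution function yields 'X' whenever the drivers disagree (for an unresolved type such as std_ulogic, multiple drivers are an error). A multiplexer avoids the conflict by letting a select line pick exactly one source. The sketch below is a minimal 2-to-1, 32-bit mux; the entity name mux2_32 and the port names are assumptions for illustration, not names from the course support files.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 2-to-1 multiplexer, 32 bits wide (names are illustrative).
entity mux2_32 is
  port (
    i0 : in  std_logic_vector(31 downto 0);  -- selected when s = '0'
    i1 : in  std_logic_vector(31 downto 0);  -- selected when s = '1'
    s  : in  std_logic;                      -- select line
    y  : out std_logic_vector(31 downto 0));
end mux2_32;

architecture dataflow of mux2_32 is
begin
  -- A single driver for y: exactly one source is passed through,
  -- so the two inputs never fight over the output wire.
  y <= i1 when s = '1' else i0;
end dataflow;
```

In the datapath, each place where two sources feed one input (for example the second ALU operand) would get one such mux, with its select line driven by the control unit.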

Control
- Control signals: mux select, read/write enable, ALU opcode, etc.

§4.2 Logic Design Conventions

Logic Design Basics
- Combinational elements
  - Operate on data
  - Output is a function of input
- State (sequential) elements
  - Store information
  - Output is a function of internal state and input

Combinational Elements
- AND-gate: Y = A & B
- Adder: Y = A + B
- Multiplexer: Y = S ? I1 : I0
- Arithmetic/Logic Unit: Y = F(A, B)

Sequential Elements
- Register: stores data in a circuit
  - Uses a clock signal to determine when to update the stored value
  - Edge-triggered: update when Clk changes from 0 to 1
  - Data output Q is stable for a clock cycle
[Figure: register with inputs D, Clk and output Q]

Sequential Elements
- Register with write control
  - Only updates on clock edge when write control input is 1
  - VHDL: rising_edge(Clk) AND Write
  - Used when stored value is required later
[Figure: register with inputs D, Write, Clk and output Q]

Clocking Methodology
- Combinational logic transforms data during clock cycles
  - Input from state elements
  - Output must stabilize within one cycle
  - Longest delay determines clock period
  - Output to state element at the next rising edge

Clocking Methodology
- Processor is a big state machine
  - Works like a Moore machine in non-I/O phase
  - Output is a function of the state
  - States include PC, all registers and memory contents

§4.3 Building a Datapath

Building a Datapath
- Datapath elements
  - Elements that process data and addresses in the CPU
  - Registers, ALUs, mux’s, memories, …
- We will build a MIPS datapath incrementally
  - Refining the overview design

Instruction Fetch
- PC: 32-bit register
- Adder: increment by 4 for next instruction
- Datapath elements: PC register, instruction memory, 32-bit adder

R-Format
Instructions
- Read two register operands
- Perform arithmetic/logical operation
- Write register result
- Datapath elements: register file, ALU

Load/Store Instructions
- Read register operands
- Calculate address using 16-bit offset
  - Use ALU, but sign-extend offset
- Load: read memory and update register
- Store: write register value to memory
- Datapath elements: data memory, sign extender

Branch Instructions
- Read register operands
- Compare operands
  - Use ALU, subtract and check Zero output
- Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4
    - Already calculated by instruction fetch

Branch Instructions
- Shifter: just re-routes wires
- Sign extender: sign-bit wire replicated
- New: shifter, 2nd 32-bit adder

Composing the Elements
- First-cut datapath does an instruction in one clock cycle
  - Each datapath element can only do one function at a time
  - Hence, we need separate instruction and data memories
- Use multiplexers where alternate data sources are used for different instructions

R-Type/Load/Store Datapath
[Figure: datapath diagram]

Full Datapath
[Figure: datapath diagram]

Performance Issues
- Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory → register file → ALU → data memory → register file
- Not every instruction requires the same time

Performance Issues
- Some instructions may take substantially longer time, e.g.
  multiply/division
- Not feasible to vary clock cycle for different instructions
  - Must use the worst-case delay as the clock cycle
- Violates the design principle of making the common case fast
- We will improve performance by pipelining

§4.4 A Simple Implementation Scheme

ALU Control
- ALU used for
  - Load/Store: F = add
  - Branch: F = subtract
  - R-type: F depends on funct field

  ALU control   Function
  0000          AND
  0001          OR
  0010          add
  0110          subtract
  0111          set-on-less-than
  1100          NOR

ALU Control
- Assume 2-bit ALUOp derived from opcode
  - Combinational logic derives ALU control

  opcode  ALUOp  Operation         funct   ALU function      ALU control
  lw      00     load word         XXXXXX  add               0010
  sw      00     store word        XXXXXX  add               0010
  beq     01     branch equal      XXXXXX  subtract          0110
  R-type  10     add               100000  add               0010
                 subtract          100010  subtract          0110
                 AND               100100  AND               0000
                 OR                100101  OR                0001
                 set-on-less-than  101010  set-on-less-than  0111

VHDL Notes
- How to program the ALU control?

    -- Behavior style
    process (alu_op, funct)
    begin
      case alu_op is
        when "00" => alu_code <= "0010";
        when "01" => …
      end case;
    end process;

The Main Control Unit
- Control signals derived from instruction

  R-type       0          rs      rt      rd      shamt   funct
               31:26      25:21   20:16   15:11   10:6    5:0
  Load/Store   35 or 43   rs      rt      address
               31:26      25:21   20:16   15:0
  Branch       4          rs      rt      address
               31:26      25:21   20:16   15:0

  - opcode: bits 31:26
  - rs: always read
  - rt: read, except for load
  - rd (R-type) or rt (load): write for R-type and load
  - address: sign-extend and add

Datapath With Control
[Figure: datapath diagram]

Summary of Control Signals
- RegDst: Write to register rt or rd?
- ALUSrc: Immediate to ALU?
- MemtoReg: Write memory or ALU output?
- RegWrite: Write to regfile at all?
- MemRead: Read from Data Memory?
- MemWrite: Write to the Data Memory?
- Branch: Is it a branch instruction?
- ALUOp[1:0]: ALU control field

R-Type Instruction
[Figure: datapath diagram]

R-Type: Control Signals
  RegDst      1   (write to rd)
  ALUSrc      0   (no immediate)
  MemtoReg    0   (write data not from memory)
  RegWrite    1   (does write regfile)
  MemRead     0   (no memory read)
  MemWrite    0   (no memory write)
  Branch      0   (not a branch)
  ALUOp[1:0]  10  (R-type ALU op)

Load Instruction
[Figure: datapath diagram]

Load: Control Signals
  RegDst      0
  ALUSrc      1
  MemtoReg    1
  RegWrite    1
  MemRead     1
  MemWrite    0
  Branch      0
  ALUOp[1:0]  00

Store: Control Signals
  RegDst      X
  ALUSrc      1
  MemtoReg    X
  RegWrite    0
  MemRead     0
  MemWrite    1
  Branch      0
  ALUOp[1:0]  00

Branch-on-Equal Instruction
[Figure: datapath diagram]

BEQ: Control Signals
  RegDst      X
  ALUSrc      0
  MemtoReg    X
  RegWrite    0
  MemRead     0
  MemWrite    0
  Branch      1
  ALUOp[1:0]  01

Control Signal Setting
- What are the control signal values for each instruction or instruction type?

  Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
  R-    1       0       0         1         0        0         0       1       0
  lw    0       1       1         1         1        0         0       0       0
  sw    X       1       X         0         0        1         0       0       0
  beq   X       0       X         0         0        0         1       0       1

  Note: “R-” means R-format

VHDL Notes
- How to program the control?

    entity control is
      port (op_code    : in  m32_6bits;
            reg_dst    : out m32_1bit;
            alu_src    : out m32_1bit;
            mem_to_reg : out m32_1bit;
            reg_write  : out m32_1bit;
            mem_read   : out m32_1bit;
            mem_write  : out m32_1bit;
            branch     : out m32_1bit;
            alu_op     : out m32_2bits);
    end control;

VHDL Notes

    architecture rom of control is
      subtype code_t is m32_vector(8 downto 0);
      type rom_t is array (0 to 63) of code_t;
      -- The ROM content for control signals
      signal rom : rom_t := (
        00 => "100100010",  -- R-type
        35 => "011110000",  -- LW
        …                   -- More for other instructions
        others => "000000000");
    begin
      (reg_dst, alu_src, mem_to_reg, reg_write, mem_read, mem_write,
       branch, alu_op(1), alu_op(0)) <= rom(to_integer(unsigned(op_code)));
    end rom;

Implementing Jumps

  Jump   2       address
         31:26   25:0

- Jump uses word address
- Update PC with concatenation of
  - Top 4 bits of old PC
  - 26-bit jump address
  - 00
- Need an extra control signal decoded from opcode

Datapath With Jumps Added
[Figure: datapath diagram]

Grading Scale
- Tentative grading scale
  - A: 90, A-: 87
  - B+: 84, B: 80, B-: 75
  - C+: 70, C: 65, C-: 60
  - D: 50
- There will be a bonus in lab projects

Mini-Project B, Tentative
- Implement single-cycle processor (SCP). There will be three parts
  - Part 1, SCPv1: Implement the nine-instruction ISA plus the J instruction
  - Part 2, SCPv2a: Support all the instructions needed to run bubble sorting
  - Part 3, SCPv2b: Detailed modeling of data elements

Mini-Project B
- Bonus part, SCPv3: Support all integer instructions on the green sheet, due in the last lab
- Some support files will be provided
  - High-level modeling of Register File, ALU, Adder, to be used in Parts 1 and 2
  - Partial sample VHDL code will be provided

Mini-Project B
- The CPU composition must be strongly structural
- Parts 1 and 2 may use behavior/dataflow modeling for data elements
- Part 3 must use detailed modeling for data elements
  - Reuse your VHDL code in the labs

Extend Single-Cycle MIPS
- Consider the following instructions
  - addi: add immediate
  - sll: shift left logical by a constant
  - bne: branch if not equal
  - jal: jump and link
  - jr: jump register

SCPv0: R-Format, LW/SW, BEQ
[Figure: datapath diagram]

SCPv1: R-Format, LW/SW, BEQ, J
[Figure: datapath diagram]

SCPv1: Control Signals
- What are the control signal values for each instruction or instruction type?

  Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump
  R-    1       0       0         1         0        0         0       1       0       0
  lw    0       1       1         1         1        0         0       0       0       0
  sw    X       1       X         0         0        1         0       0       0       0
  beq   X       0       X         0         0        0         1       0       1       0
  j     X       X       X         0         0        0         0       X       X       1

  Note: “R-” means R-format

Extend the Single-Cycle Processor
- For each instruction, do we need
  1. Any new or revised datapath element(s)?
  2. Any new control signal(s)?
- Then revise, if necessary,
  - Datapath: add new elements or revise existing ones, add new connections
  - Control Unit: add/extend control signals, extend the truth table
  - ALU Control: extend the truth table

SCPv0 + ADDI

  addi rt, rs, immediate

  001000   rs      rt      immediate
  31:26    25:21   20:16   15:0

  R[rt] = R[rs] + SignExtImm

- Read register operands (only one is used)
- Sign extend the immediate (in parallel)
- Perform arithmetic/logical operation
- Write register result

SCPv0 + ADDI
- What changes to this baseline?
[Figure: datapath diagram]

SCPv0 + ADDI
- Do we need new or revised datapath elements?
[Figure: datapath diagram]

SCPv0 + ADDI
- Do we need new or revised datapath elements?
- Do we need new control signal(s)?

  Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
  R-    1       0       0         1         0        0         0       1       0
  lw    0       1       1         1         1        0         0       0       0
  sw    X       1       X         0         0        1         0       0       0
  beq   X       0       X         0         0        0         1       0       1
  addi

SCPv0 + ADDI
- Like LW
  - I-format instruction
  - Write to register[rt]
  - Use add operation
- Like R-format arithmetic
  - Write ALU result to register file

  Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
  R-    1       0       0         1         0        0         0       1       0
  lw    0       1       1         1         1        0         0       0       0
  sw    X       1       X         0         0        1         0       0       0
  beq   X       0       X         0         0        0         1       0       1
  addi  0       1       0         1         0        0         0       0       0

SCPv0 + SLL

  sll rd, rt, shamt

  000000   rs      rt      rd      shamt   000000
  31:26    25:21   20:16   15:11   10:6    5:0

  R[rd] = R[rt] << shamt

- Read register operands (only one is used)
- Perform shift operation
- Write register result
- Note: sllv rd, rt, rs for shift left logical variable

SCPv0 + SLL
- What changes to the datapath elements?
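
The sll semantics given above, R[rd] = R[rt] << shamt, can be sketched behaviorally in VHDL as follows. This is only a minimal sketch under stated assumptions: the entity name sll_sketch and its port names are hypothetical, it uses std_logic_vector rather than the course's m32 types, and it models the shift alone rather than a full ALU.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical stand-alone shifter (names are illustrative).
entity sll_sketch is
  port (
    rt_data : in  std_logic_vector(31 downto 0);  -- R[rt], the value to shift
    shamt   : in  std_logic_vector(4 downto 0);   -- Instruction[10:6]
    result  : out std_logic_vector(31 downto 0)); -- value written to R[rd]
end sll_sketch;

architecture behavior of sll_sketch is
begin
  -- shamt is treated as an unsigned amount in 0..31; vacated low bits
  -- are filled with zeros by shift_left on an unsigned operand.
  result <= std_logic_vector(
              shift_left(unsigned(rt_data), to_integer(unsigned(shamt))));
end behavior;
```

For example, with rt_data = x"00000001" and shamt = "00100", result would be x"00000010".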

SCPv0 + SLL
- ALU needs to do the shift operation
- ALU 1st input needs another source: shamt extended to 32-bit

SCPv0 + SLL
- Add another source to the 1st input of ALU
  - Add a mux and ALUSrc1 control line
    - 0: R[rs]
    - 1: shamt (zero-extended)
  - Shamt: Instruction[10-6]
  - Rename ALUSrc to ALUSrc2
- Extend ALU control
  - Add an ALU control code for SLL

SCPv0 + SLL
- Extend ALU control: choose a code of your choice (kkkk shown in the table)

  opcode  ALUOp  Operation         funct   ALU function      ALU control
  lw      00     load word         XXXXXX  add               0010
  sw      00     store word        XXXXXX  add               0010
  beq     01     branch equal      XXXXXX  subtract          0110
  R-type  10     add               100000  add               0010
                 subtract          100010  subtract          0110
                 AND               100100  AND               0000
                 OR                100101  OR                0001
                 set-on-less-than  101010  set-on-less-than  0111
                 shift-left-logic  000000  shift-left-logic  kkkk

SCPv0 + SLL

  Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp
  R-    1       0       0         1         0        0         0       10
  lw    0       1       1         1         1        0         0       00
  sw    X       1       X         0         0        1         0       00
  beq   X       0       X         0         0        0         1       01
  sll

  Inst  RegDst  ALUSrc1  ALUSrc2  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp
  R-    1       0        0        0         1         0        0         0       10
  lw    0       0        1        1         1         1        0         0       00
  sw    X       0        1        X         0         0        1         0       00
  beq   X       0        0        X         0         0        0         1       01
  sll   1       1        0        0         1         0        0         0       10

SCPv0 + BNE

  bne rs, rt, label

  000101   rs      rt      offset
  31:26    25:21   20:16   15:0

  PC = (R[rs] != R[rt]) ? PC+4+(SignExtImm<<2) : PC+4

- Read register operands
- Compare operands
  - Use ALU, subtract and check Zero output
- Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4
    - Already calculated by instruction fetch

SCPv0 + BNE
- What changes should be made to the datapath?
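
The bne next-PC rule above, PC = (R[rs] != R[rt]) ? PC+4+(SignExtImm<<2) : PC+4, can be written out as a small behavioral sketch. The entity name bne_next_pc and all port names are assumptions for illustration. Note that the sketch compares the two register values directly, whereas the datapath in the slides would reuse the ALU subtract and its Zero output; the two are equivalent for deciding equality.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical next-PC logic for bne only (names are illustrative).
entity bne_next_pc is
  port (
    pc      : in  std_logic_vector(31 downto 0);
    rs_data : in  std_logic_vector(31 downto 0);  -- R[rs]
    rt_data : in  std_logic_vector(31 downto 0);  -- R[rt]
    imm16   : in  std_logic_vector(15 downto 0);  -- Instruction[15:0]
    next_pc : out std_logic_vector(31 downto 0));
end bne_next_pc;

architecture behavior of bne_next_pc is
  signal pc_plus_4 : unsigned(31 downto 0);
  signal offset    : unsigned(31 downto 0);
begin
  -- Sequential address, already produced by instruction fetch.
  pc_plus_4 <= unsigned(pc) + 4;

  -- Sign-extend the 16-bit displacement to 32 bits, then shift left
  -- by 2 to turn the word displacement into a byte offset.
  offset <= shift_left(unsigned(resize(signed(imm16), 32)), 2);

  -- Branch taken when the operands differ; otherwise fall through.
  next_pc <= std_logic_vector(pc_plus_4 + offset) when rs_data /= rt_data
             else std_logic_vector(pc_plus_4);
end behavior;
```

Swapping the condition to rs_data = rt_data would give the corresponding beq behavior.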

SCPv0 + BNE
- Extend Branch to two bits
  - 10: Branch-Equal
  - 11: Branch-Not-Equal
- Replace the AND gate with the following logic
  - Can use a different truth table

  Branch     Zero  Branch taken?
  10         1     1
  11         0     1
  otherwise        0

SCPv0 + BNE

  Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp
  R-    1       0       0         1         0        0         0       10
  lw    0       1       1         1         1        0         0       00
  sw    X       1       X         0         0        1         0       00
  beq   X       0       X         0         0        0         1       01
  bne

  Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch[1:0]  ALUOp[1:0]
  R-    1       0       0         1         0        0         00           10
  lw    0       1       1         1         1        0         00           00
  sw    X       1       X         0         0        1         00           00
  beq   X       0       X         0         0        0         10           01
  bne   X       0       X         0         0        0         11           01

SCPv1 + JAL

  jal target

  000011   address
  31:26    25:0

  PC = JumpAddr
  R[31] = PC+4

- Jump uses word address
- Update PC with JumpAddr: concatenation of top 4 bits of old PC, 26-bit jump address, and 00 (called pseudo-direct)
- Save PC+4 to $ra

SCPv1 + JAL
- What changes should be made to the datapath?
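
As a starting point for the question above, the two values that jal needs, JumpAddr and the link value PC+4, can be sketched as dataflow VHDL. The entity name jal_sketch and its ports are hypothetical. One assumption to flag: the sketch takes the top 4 bits from PC+4, following the MIPS definition of pseudo-direct addressing, while the slides say "old PC"; the two differ only when PC and PC+4 fall in different 256 MB regions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical jal address logic (names are illustrative).
entity jal_sketch is
  port (
    pc        : in  std_logic_vector(31 downto 0);
    instr     : in  std_logic_vector(31 downto 0);
    jump_addr : out std_logic_vector(31 downto 0);  -- new PC
    link_data : out std_logic_vector(31 downto 0)); -- value written to R[31]
end jal_sketch;

architecture dataflow of jal_sketch is
  signal pc_plus_4 : std_logic_vector(31 downto 0);
begin
  pc_plus_4 <= std_logic_vector(unsigned(pc) + 4);

  -- 4 upper bits of PC+4, then the 26-bit word address, then "00"
  -- to convert the word address into a byte address (4 + 26 + 2 = 32).
  jump_addr <= pc_plus_4(31 downto 28) & instr(25 downto 0) & "00";

  -- Return address saved in $ra.
  link_data <= pc_plus_4;
end dataflow;
```

In a full datapath, link_data would need a path to the register file write data input, and the write register number would need a way to select register 31; how to add those muxes and control lines is left to the design.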